59447 – Allow users to view word-breaks created by ICU Breakiterator

Summary: Allow users to view word-breaks created by ICU Breakiterator

Status:	NEW

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	Writer
Version: (earliest affected)	3.6.4.3 release
Hardware:	Other All

Importance:	medium enhancement
Assignee:	Not Assigned

URL:
Whiteboard:	BSA
Keywords:

Depends on:
Blocks:	Formatting-Mark ICU Word-Line-Break

Reported:	2013-01-16 04:17 UTC by Nathan Wells
Modified:	2024-05-18 16:30 UTC
CC List:	0 users

See Also:	59448 52020
Crash report or crash signature:

Description Nathan Wells 2013-01-16 04:17:55 UTC

Problem description: 
Currently, for languages that do not put spaces between words (such as Thai and Khmer), the ICU BreakIterator is used to find word breaks, but end users cannot see where it breaks the words. This is a problem because the BreakIterator is not 100% accurate for Thai or Khmer, and without visible breaks users cannot easily add Unicode joiners by hand to re-join words that were split incorrectly.


Desired behavior:
I suggest that the zero-width spaces added by the ICU BreakIterator be made visible when a user has turned on View->Field Shadings.
This way users can see exactly what is happening and easily correct any problems with the automatic word-breaker.
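
To illustrate the idea only (this is not LibreOffice code), here is a minimal Python sketch of what "making the breaks visible" could mean: invisible break controls are swapped for visible marks in a display copy of the text. The function name `reveal_breaks` and the `|` and `+` marks are assumptions chosen for the example. It assumes break opportunities are stored as U+200B ZERO WIDTH SPACE and manual corrections as U+2060 WORD JOINER.

```python
ZWSP = "\u200b"         # ZERO WIDTH SPACE: permits a line break here
WORD_JOINER = "\u2060"  # WORD JOINER: forbids a line break here


def reveal_breaks(text, zwsp_mark="|", joiner_mark="+"):
    """Return a display copy of text with invisible break controls shown.

    The marks are hypothetical stand-ins for whatever shading or glyph
    a real editor would draw; the stored text itself is not changed.
    """
    return text.replace(ZWSP, zwsp_mark).replace(WORD_JOINER, joiner_mark)


# ASCII placeholders stand in for Thai or Khmer words.
sample = "word1" + ZWSP + "word2" + WORD_JOINER + "word3"
print(reveal_breaks(sample))
```

With the marks shown, a wrongly split word is easy to spot, and the user can replace that break with a joiner.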
              
Operating System: All
Version: 3.6.4.3 release

Comment 1 Florian Reisinger 2013-04-21 14:52:25 UTC

Seems to be a valid enhancement

Comment 2 EricP 2013-10-16 06:33:25 UTC

I'm not certain, but I don't think the ICU BreakIterator actually adds ZWSP to the text. Rather, it simply decides where to break the text when outputting, without actually inserting new characters into the text stream.

But this has gotten me thinking:
I'm unhappy with the ICU BreakIterator for Khmer because it creates chaos when non-Khmer (i.e. minority) languages are written in Khmer script. Before the ICU BreakIterator was enabled for Khmer in LO 3.6, minority languages carefully typed with ZWSP between words did line breaking perfectly.

It's understandable that Cambodians don't like typing ZWSP, but what if we inserted ZWSP automatically, using an interface similar to the predictive text input on an iPhone? If this were done, then line breaking, spell checking, word counts, etc. would be greatly simplified.
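
A rough Python sketch of that idea, under stated assumptions: the break offsets are hard-coded here as a stand-in for whatever a word breaker (or a user confirming a suggestion) would supply, and the helper names `insert_zwsp` and `count_words` are hypothetical. Offsets are Python string indices, which may differ from the offsets a real break iterator reports.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE


def insert_zwsp(text, offsets):
    """Return text with a ZWSP inserted at each interior offset.

    Offsets index into the original text; offsets at the very start
    or end (or out of range) are ignored.
    """
    pieces = []
    previous = 0
    for offset in sorted(set(offsets)):
        if 0 < offset < len(text):
            pieces.append(text[previous:offset])
            previous = offset
    pieces.append(text[previous:])
    return ZWSP.join(pieces)


def count_words(text):
    """Count words by treating ZWSP like an ordinary space."""
    return len(text.replace(ZWSP, " ").split())


# ASCII placeholder for unspaced text; [2, 4] are assumed boundaries.
stored = insert_zwsp("abcdef", [2, 4])
print(count_words(stored))
```

Once the boundaries live in the text itself, word counting reduces to splitting on a known character, with no dictionary lookup at display time.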

Comment 3 Nathan Wells 2015-08-17 02:59:16 UTC

@livingfield
Something like that might work, but for now I would ask that this feature go ahead, as the years pass quickly without any resolution :)