Bug 82689 - VIEWING: U+3000 IDEOGRAPHIC SPACE (CJK full width space) and other spaces should be rendered as non-printing characters in Writer
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 4.3.0.4 release (earliest affected)
Hardware: Other All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: needsUXEval
Depends on:
Blocks: CJK
Reported: 2014-08-16 05:32 UTC by Matthew Francis
Modified: 2016-09-16 08:28 UTC (History)
CC List: 3 users

See Also:
Crash report or crash signature:


Attachments
Sample document with spaces (25.60 KB, application/vnd.oasis.opendocument.text)
2014-08-16 05:34 UTC, Matthew Francis
Document rendered without non-printing characters enabled (94.71 KB, image/png)
2014-08-16 05:35 UTC, Matthew Francis
Document rendered with non-printing characters enabled (106.12 KB, image/png)
2014-08-16 05:35 UTC, Matthew Francis

Description Matthew Francis 2014-08-16 05:32:36 UTC
U+3000 IDEOGRAPHIC SPACE, which is a wide space used in CJK text, does not show visibly as a non-printing character when View -> Non-printing Characters is enabled in Writer.

Please see the attached document, which contains various sorts of space (ensure that View -> Non-printing Characters is enabled).

Currently, U+0020 SPACE and U+00A0 NO-BREAK SPACE are rendered correctly, but various other Unicode space characters are not. While U+3000 IDEOGRAPHIC SPACE is almost certainly the most used of these, perhaps consideration should be given to making all of the space characters on this list visible:

Non-zero-width spaces

U+2000 EN QUAD
U+2001 EM QUAD
U+2002 EN SPACE
U+2003 EM SPACE
U+2004 THREE-PER-EM SPACE
U+2005 FOUR-PER-EM SPACE
U+2006 SIX-PER-EM SPACE
U+2007 FIGURE SPACE
U+2008 PUNCTUATION SPACE
U+2009 THIN SPACE
U+200A HAIR SPACE
U+202F NARROW NO-BREAK SPACE
U+205F MEDIUM MATHEMATICAL SPACE
U+3000 IDEOGRAPHIC SPACE

Zero-width spaces

U+200B ZERO WIDTH SPACE
U+FEFF ZERO WIDTH NO-BREAK SPACE

(Interestingly, U+200B ZERO WIDTH SPACE shows as a sort of visible space whether or not View -> Non-printing Characters is enabled. Perhaps the handling of this should be unified with other non-printing characters?)
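
For anyone who wants to check a piece of text for the characters listed above, here is a minimal Python sketch. It is only an illustration: the names `NON_ZERO_WIDTH_SPACES`, `ZERO_WIDTH_SPACES` and `find_special_spaces` are made up for this example, and it says nothing about how Writer itself handles non-printing characters.

```python
import unicodedata

# Code points copied from the two lists in the description above.
# U+2000..U+200A are contiguous, so a range covers them.
NON_ZERO_WIDTH_SPACES = [chr(cp) for cp in range(0x2000, 0x200B)] + [
    "\u202F",  # NARROW NO-BREAK SPACE
    "\u205F",  # MEDIUM MATHEMATICAL SPACE
    "\u3000",  # IDEOGRAPHIC SPACE
]
ZERO_WIDTH_SPACES = ["\u200B", "\uFEFF"]


def find_special_spaces(text):
    """Return (index, 'U+XXXX', Unicode name) for each listed space in text.

    Hypothetical helper for illustration; U+0020 and U+00A0 are deliberately
    not reported because Writer already displays them.
    """
    targets = set(NON_ZERO_WIDTH_SPACES) | set(ZERO_WIDTH_SPACES)
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ch in targets
    ]


if __name__ == "__main__":
    for hit in find_special_spaces("a\u3000b \u200bc"):
        print(hit)
```

Running it over the text extracted from the attached sample document should list every occurrence that is currently invisible in Writer.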
Comment 1 Matthew Francis 2014-08-16 05:34:30 UTC
Created attachment 104700
Sample document with spaces
Comment 2 Matthew Francis 2014-08-16 05:35:35 UTC
Created attachment 104701
Document rendered without non-printing characters enabled
Comment 3 Matthew Francis 2014-08-16 05:35:57 UTC
Created attachment 104702
Document rendered with non-printing characters enabled
Comment 4 Owen Genat (retired) 2014-08-23 15:32:38 UTC
(In reply to comment #0)
> U+3000 IDEOGRAPHIC SPACE, which is a wide space used in CJK text, does not
> show visibly as a non-printing character when View -> Non-printing
> Characters is enabled in Writer.

There is certainly no Interpunct character displayed over the Ideographic Space (U+3000) when Non-printing characters are displayed. There are possibly cultural reasons for this, given that the Middle Dot (U+00B7), which is used for Space (U+0020) and No-break Space (U+00A0), is in the Latin-1 Supplement block and some Asian scripts use a centralised dot for a full stop.

According to http://en.wikipedia.org/wiki/Interpunct these are the main Asian language preferences:

Chinese: "In Taiwan the Unicode code point U+2027, Hyphenation Point, is recommended by government as a fullwidth punctuation to separate the given name and the family name of non-Chinese." and "In Chinese, the middle dot is also fullwidth in printed matter, but the regular middle dot (·) is used in computer input, which is then rendered as fullwidth in Chinese-language fonts."

Japanese: "Interpuncts are often used to separate transcribed foreign words written in katakana. [...] the Japanese writing system usually does not use space or punctuation to separate words." and "U+30FB ・ katakana middle dot" and "U+FF65 ・ halfwidth katakana middle dot."

Korean: "Interpuncts are used in written Korean to denote a list of two or more words, more or less in the same way a slash (/) is used to juxtapose words in many other languages." and "The use of interpuncts has declined in years of digital typography and especially in place of slashes, but, in the strictest sense, a slash cannot replace a middle dot in Korean typography." and "U+318D ㆍ hangul letter araea (아래아) is used more than a middle dot when a interpunct is to be used in Korean typography."

In accordance with this I am setting the status to NEEDINFO as Asian language (l10n) experts are required to comment further on what would be considered acceptable practice.

> U+FEFF ZERO WIDTH NO-BREAK SPACE

Please note that use of U+FEFF as ZWNBSP has been deprecated since 2002 (Unicode v3.2), and the Word Joiner (U+2060) is recommended in its place.
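
As a rough sketch of that recommendation, a text-processing script could swap interior U+FEFF characters for U+2060. The function name `replace_zwnbsp` is hypothetical, and the choice to keep a leading U+FEFF (where it normally acts as a byte order mark) is an assumption of this example rather than anything Writer does.

```python
def replace_zwnbsp(text):
    """Replace U+FEFF used as ZERO WIDTH NO-BREAK SPACE with U+2060 WORD JOINER.

    Assumption: a U+FEFF at the very start of the text is a byte order mark
    and is left untouched; only later occurrences are replaced.
    """
    if text.startswith("\ufeff"):
        head, rest = text[:1], text[1:]
    else:
        head, rest = "", text
    return head + rest.replace("\ufeff", "\u2060")


if __name__ == "__main__":
    sample = "\ufeffword\ufeffjoin"
    # Show the code points so the invisible characters can be inspected.
    print([f"U+{ord(c):04X}" for c in replace_zwnbsp(sample)])
```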
Comment 5 Matthew Francis 2014-08-24 05:05:49 UTC
Thanks for the above comment.
Note that one mitigating factor for the other uses of the middle dot (·) in CJK text is that, as of current master (4.4), the non-printing characters are displayed in blue text rather than black, so there is some contrast there by default.

For comparison, Word for Mac 2011 appears to use a rectangle the width of the ideographic space for this case. This might be a reasonable model to follow.
Comment 6 QA Administrators 2015-04-01 14:47:43 UTC Comment hidden (obsolete)
Comment 7 Matthew Francis 2015-04-08 06:48:07 UTC
I think this has all the information it needs - passing to ux-advise.

Could the UX team please evaluate this? Thanks

-> Status: NEW
-> Severity: enhancement
-> Component: ux-advise
Comment 8 Heiko Tietze 2015-04-08 07:38:39 UTC
The purpose of showing non-printable characters is to manage the text, e.g. to distinguish between repeated carriage return and paragraph space, to discriminate between spaces and tabs, or to identify multiple spaces. 

However, if the formatting information is shown directly by WYSIWYG means, it makes no sense to clutter the document. In the case of zero width non-joiners in Farsi, I understand the interaction as entering a character plus a ZWNJ, which leads to a different letter - but I may be wrong. And according to Owen's reply there might be some other reasons not to show special spaces. So why not have a configuration switch?

But we should confirm this by native speakers rather than UX. So I add Kevin Suo from the LO China Blog to the CC list.
Comment 9 Kevin Suo 2015-04-08 08:14:59 UTC
(In reply to Heiko Tietze from comment #8)
Sorry, I don't have much of an idea on this issue. The only thing I can be sure of is that U+3000 (full-width space) is seldom used in Simplified Chinese. In contrast, we use the normal space (U+0020) a lot.
Comment 10 Matthew Francis 2015-04-08 08:24:08 UTC
In my experience of Japanese documents, full width spaces are used with some regularity for formatting.

In translation (from Japanese), a frequent demand is to ensure that no full width characters remain in the target text - so being able to identify full width spaces visibly would be an advantage there.
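
To make that translation check concrete, here is a small Python sketch that flags full width characters in a target text. The helper name `find_fullwidth` is hypothetical, and treating both East Asian Width classes "F" (Fullwidth) and "W" (Wide) as "full width" is an assumption; a stricter check could test for "F" only.

```python
import unicodedata


def find_fullwidth(text):
    """Return (index, character, 'U+XXXX') for each full width character.

    Assumption: "full width" means East Asian Width class F or W as reported
    by unicodedata.east_asian_width().
    """
    return [
        (i, ch, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.east_asian_width(ch) in ("F", "W")
    ]


if __name__ == "__main__":
    # A translated sentence with a leftover ideographic space.
    for hit in find_fullwidth("Hello\u3000world"):
        print(hit)
```

A script like this only helps outside the editor, of course; a visible marker in Writer would let the same check be done by eye.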
Comment 11 Robinson Tryon (qubit) 2016-08-25 05:39:25 UTC Comment hidden (obsolete)