Bug 82689 - VIEWING: U+3000 IDEOGRAPHIC SPACE (CJK full width space) and other spaces should be rendered as non-printing characters in Writer
Status: RESOLVED INSUFFICIENTDATA
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 4.3.0.4 release (earliest affected)
Hardware: Other All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: CJK
Reported: 2014-08-16 05:32 UTC by Matthew Francis
Modified: 2020-12-09 03:41 UTC (History)

See Also:
Crash report or crash signature:


Attachments
Sample document with spaces (25.60 KB, application/vnd.oasis.opendocument.text)
2014-08-16 05:34 UTC, Matthew Francis
Document rendered without non-printing characters enabled (94.71 KB, image/png)
2014-08-16 05:35 UTC, Matthew Francis
Document rendered with non-printing characters enabled (106.12 KB, image/png)
2014-08-16 05:35 UTC, Matthew Francis

Description Matthew Francis 2014-08-16 05:32:36 UTC
U+3000 IDEOGRAPHIC SPACE, which is a wide space used in CJK text, does not show visibly as a non-printing character when View -> Non-printing Characters is enabled in Writer.

Please see the attached document, which contains various sorts of space (ensure that View -> Non-printing Characters is enabled).

Currently, U+0020 SPACE and U+00A0 NO-BREAK SPACE are rendered correctly, but there are various other sorts of Unicode space which are not. While U+3000 IDEOGRAPHIC SPACE is almost certainly the most used of these, perhaps consideration should be given to making all on this list of space characters visible:

Non-zero-width spaces

U+2000 EN QUAD
U+2001 EM QUAD
U+2002 EN SPACE
U+2003 EM SPACE
U+2004 THREE-PER-EM SPACE
U+2005 FOUR-PER-EM SPACE
U+2006 SIX-PER-EM SPACE
U+2007 FIGURE SPACE
U+2008 PUNCTUATION SPACE
U+2009 THIN SPACE
U+200A HAIR SPACE
U+202F NARROW NO-BREAK SPACE
U+205F MEDIUM MATHEMATICAL SPACE
U+3000 IDEOGRAPHIC SPACE

Zero-width spaces

U+200B ZERO WIDTH SPACE
U+FEFF ZERO WIDTH NO-BREAK SPACE

(Interestingly, U+200B ZERO WIDTH SPACE shows as a sort of visible space whether or not View -> Non-printing Characters is enabled. Perhaps the handling of this should be unified with other non-printing characters?)
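
To check which of the space characters listed above a given piece of text actually contains, the list can be cross-checked against Unicode properties. The following is a minimal Python sketch using the standard `unicodedata` module. The names `classify_space` and `report_spaces` and the labels they return are illustrative assumptions, not anything taken from LibreOffice.

```python
import unicodedata

# Zero-width characters from the list above. Unicode classifies them as
# format characters (Cf), not space separators, so they are listed by hand.
ZERO_WIDTH = {"\u200b", "\ufeff"}

def classify_space(ch):
    """Return 'zero-width', 'space', or None for a single character."""
    if ch in ZERO_WIDTH:
        return "zero-width"
    # General category Zs covers U+0020, U+00A0, U+2000..U+200A,
    # U+202F, U+205F and U+3000, among others.
    if unicodedata.category(ch) == "Zs":
        return "space"
    return None

def report_spaces(text):
    """List (offset, 'U+XXXX NAME', kind) for every space-like character."""
    found = []
    for offset, ch in enumerate(text):
        kind = classify_space(ch)
        if kind is not None:
            name = unicodedata.name(ch, "UNNAMED")
            found.append((offset, "U+%04X %s" % (ord(ch), name), kind))
    return found
```

Running `report_spaces` over text copied from the sample document should show which code points are present, independently of how Writer renders them.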
Comment 1 Matthew Francis 2014-08-16 05:34:30 UTC
Created attachment 104700 [details]
Sample document with spaces
Comment 2 Matthew Francis 2014-08-16 05:35:35 UTC
Created attachment 104701 [details]
Document rendered without non-printing characters enabled
Comment 3 Matthew Francis 2014-08-16 05:35:57 UTC
Created attachment 104702 [details]
Document rendered with non-printing characters enabled
Comment 4 Owen Genat (retired) 2014-08-23 15:32:38 UTC
(In reply to comment #0)
> U+3000 IDEOGRAPHIC SPACE, which is a wide space used in CJK text, does not
> show visibly as a non-printing character when View -> Non-printing
> Characters is enabled in Writer.

There is certainly no Interpunct character displayed over the Ideographic Space (U+3000) when Non-printing characters are displayed. There are possibly cultural reasons for this, given that the Middle Dot (U+00B7), which is used for Space (U+0020) and No-break Space (U+00A0), is in the Latin-1 Supplement block and some Asian scripts use a centralised dot for a full stop.

According to http://en.wikipedia.org/wiki/Interpunct these are the main Asian language preferences:

Chinese: "In Taiwan the Unicode code point U+2027, Hyphenation Point, is recommended by government as a fullwidth punctuation to separate the given name and the family name of non-Chinese." and "In Chinese, the middle dot is also fullwidth in printed matter, but the regular middle dot (·) is used in computer input, which is then rendered as fullwidth in Chinese-language fonts."

Japanese: "Interpuncts are often used to separate transcribed foreign words written in katakana. [...] the Japanese writing system usually does not use space or punctuation to separate words." and "U+30FB ・ katakana middle dot" and "U+FF65 ・ halfwidth katakana middle dot."

Korean: "Interpuncts are used in written Korean to denote a list of two or more words, more or less in the same way a slash (/) is used to juxtapose words in many other languages." and "The use of interpuncts has declined in years of digital typography and especially in place of slashes, but, in the strictest sense, a slash cannot replace a middle dot in Korean typography." and "U+318D ㆍ hangul letter araea (아래아) is used more than a middle dot when a interpunct is to be used in Korean typography."

In accordance with this I am setting the status to NEEDINFO as Asian language (l10n) experts are required to comment further on what would be considered acceptable practice.

> U+FEFF ZERO WIDTH NO-BREAK SPACE

Please note that use of U+FEFF as ZWNBSP is deprecated since 2002 (Unicode v3.2) and the Word Joiner (U+2060) is recommended to be used in its place.
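
As an illustration of that recommendation, existing text can be migrated by swapping U+FEFF for U+2060. This is a minimal Python sketch; the function name `replace_zwnbsp` is hypothetical, and the choice to leave a leading U+FEFF alone (on the assumption that it is a byte order mark rather than a joiner) is an assumption of this example.

```python
ZWNBSP = "\ufeff"       # ZERO WIDTH NO-BREAK SPACE, also used as the byte order mark
WORD_JOINER = "\u2060"  # WORD JOINER, the recommended replacement

def replace_zwnbsp(text):
    """Replace U+FEFF with U+2060, leaving a leading U+FEFF untouched."""
    if text.startswith(ZWNBSP):
        # Assumption: a U+FEFF at offset 0 is a byte order mark, so keep it.
        return ZWNBSP + text[1:].replace(ZWNBSP, WORD_JOINER)
    return text.replace(ZWNBSP, WORD_JOINER)
```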
Comment 5 Matthew Francis 2014-08-24 05:05:49 UTC
Thanks for the above comment.
Note that one mitigating factor to the other uses for • in CJK text is that, as of current master (4.4), the non-printing characters are displayed in blue text, rather than black, so there is some contrast there by default.

For comparison, Word for Mac 2011 appears to use a rectangle the width of the ideographic space for this case. This might be a reasonable model to follow.
Comment 6 QA Administrators 2015-04-01 14:47:43 UTC Comment hidden (obsolete)
Comment 7 Matthew Francis 2015-04-08 06:48:07 UTC
I think this has all the information it needs - passing to ux-advise.

Could the UX team please evaluate this? Thanks

-> Status: NEW
-> Severity: enhancement
-> Component: ux-advise
Comment 8 Heiko Tietze 2015-04-08 07:38:39 UTC
The purpose of showing non-printable characters is to manage the text, e.g. to distinguish between repeated carriage return and paragraph space, to discriminate between spaces and tabs, or to identify multiple spaces. 

However, if the formatting information is already shown directly by WYSIWYG means, it makes no sense to clutter the document. In the case of zero-width non-joiners in Farsi, I understand the interaction as entering a character plus a ZWNJ, which leads to a different letter, but I may be wrong. And according to Owen's reply there might be other reasons not to show special spaces. So why not have a configuration switch?

But we should confirm this by native speakers rather than UX. So I add Kevin Suo from the LO China Blog to the CC list.
Comment 9 Kevin Suo 2015-04-08 08:14:59 UTC
(In reply to Heiko Tietze from comment #8)
Sorry, I don't have much of an idea on this issue. The only thing I can be sure of is that U+3000 (full-width space) is seldom used in Simplified Chinese. In contrast, we use the normal space (U+0020) a lot.
Comment 10 Matthew Francis 2015-04-08 08:24:08 UTC
In my experience of Japanese documents, full width spaces are used with some regularity for formatting.

In translation (from Japanese), a frequent demand is to ensure that no full width characters remain in the target text - so being able to identify full width spaces visibly would be an advantage there.
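
Until Writer shows these characters, the translation check described here can be done outside the editor. This is a minimal Python sketch, assuming the text has been exported as a plain string. It relies on the standard `unicodedata.east_asian_width` property, where `F` means fullwidth and `W` means wide. The function name `find_fullwidth` is hypothetical.

```python
import unicodedata

def find_fullwidth(text):
    """Return (offset, character) pairs whose East Asian Width is F or W."""
    return [
        (offset, ch)
        for offset, ch in enumerate(text)
        if unicodedata.east_asian_width(ch) in ("F", "W")
    ]
```

U+3000 IDEOGRAPHIC SPACE has East Asian Width `F`, so it is reported along with any remaining fullwidth letters, kana, or ideographs.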
Comment 11 Robinson Tryon (qubit) 2016-08-25 05:39:25 UTC Comment hidden (obsolete)
Comment 12 Jun Nogata 2020-05-08 10:50:27 UTC
Hello. I am Japanese.

(In reply to Heiko Tietze from comment #8)
> But we should confirm this by native speakers rather than UX. So I add Kevin
> Suo from the LO China Blog to the CC list.

I want full-width spaces to be displayed.

In Japanese, full-width spaces are used very often.

In Japanese paragraphs, a space is inserted at the beginning of the first line.[1] Strictly speaking, this should be done with indentation. However, there are users who type a full-width space instead, following Japanese convention.

* [1] https://en.wikipedia.org/wiki/Japanese_punctuation#Space

In a sentence that mixes full-width spaces and ordinary spaces, it is very difficult to tell them apart.
I'd like full-width spaces to be shown visibly so that the two can be distinguished.

Microsoft Word Japanese version displays full-width spaces.

Tweets from Japanese users who have had trouble with this:

* https://twitter.com/gootalasavolsky/status/1145638413065842689
* https://twitter.com/Rarry_/status/844949678131101700
Comment 13 Heiko Tietze 2020-05-11 13:03:24 UTC
(In reply to nogajun from comment #12)
> In Japanese, full-width spaces are used very often.

Is the special character dialog/widget with the list of recently used characters a solution for you? Would be nice to have shortcuts, see bug 109215.

We introduced U+200A with bug 121596 but I'm a bit afraid of too many space variants. If we implement U+3000 space, it should be for Japanese only.
Comment 14 Tomaz Vajngerl 2020-05-11 13:24:21 UTC
(In reply to Heiko Tietze from comment #13)
> (In reply to nogajun from comment #12)
> > In Japanese, full-width spaces are used very often.
> 
> Is the special character dialog/widget with the list of recently used
> characters a solution for you? Would be nice to have shortcuts, see bug
> 109215.
> 
> We introduced U+200A with bug 121596 but I'm a bit afraid of too many space
> variants. If we implement U+3000 space, it should be for Japanese only.

This is about showing non-printing characters with the "Formatting Marks" (Ctrl+F10) functionality. Currently we don't show any space other than the "ASCII" 0x20 space as a dot.

I looked into this a bit and it is not trivial. Currently we substitute spaces with dots as they are the same or similar width, but for other spaces, we will probably need to do some custom drawing.
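
The substitution idea can be illustrated at the plain-text level. This is only a rough Python sketch and not how Writer renders formatting marks. The `MARKERS` table and the choice of U+25A1 WHITE SQUARE for the ideographic space (loosely following the rectangle that comment 5 reports for Word) are assumptions made for the example.

```python
# Marker choices are illustrative assumptions, not LibreOffice behaviour.
MARKERS = {
    "\u0020": "\u00b7",  # SPACE -> MIDDLE DOT (similar width in most fonts)
    "\u3000": "\u25a1",  # IDEOGRAPHIC SPACE -> WHITE SQUARE
}

def show_spaces(text):
    """Return text with each mapped space replaced by its visible marker."""
    return "".join(MARKERS.get(ch, ch) for ch in text)
```

A one-for-one character swap like this only works when the marker glyph has the same advance width as the space it replaces. That is why narrower or wider spaces would likely need the custom drawing mentioned above rather than another substitute character.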
Comment 15 QA Administrators 2020-11-08 04:19:33 UTC Comment hidden (obsolete)
Comment 16 QA Administrators 2020-12-09 03:40:57 UTC
Dear Matthew Francis,

Please read this message in its entirety before proceeding.

Your bug report is being closed as INSUFFICIENTDATA due to inactivity and
a lack of information which is needed in order to accurately
reproduce and confirm the problem. We encourage you to retest
your bug against the latest release. If the issue is still
present in the latest stable release, we need the following
information (please ignore any that you've already provided):

a) Provide details of your system including your operating
   system and the latest version of LibreOffice that you have
   confirmed the bug to be present

b) Provide easy to reproduce steps – the simpler the better

c) Provide any test case(s) which will help us confirm the problem

d) Provide screenshots of the problem if you think it might help

e) Read all comments and provide any requested information

Once all of this is done, please set the bug back to UNCONFIRMED
and we will attempt to reproduce the issue. Please do not:

a) respond via email 

b) update the version field in the bug or any of the other details
   on the top section of our bug tracker

Warm Regards,
QA Team

MassPing-NeedInfo-FollowUp