Bug 140796 - [UI]Writer: Wrong English string for U+2060 character
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Localization
Version: 7.0.4.2 release (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Julien Nabet
Whiteboard: target:7.2.0
Reported: 2021-03-04 13:00 UTC by pierre-yves samyn
Modified: 2021-03-12 08:18 UTC
CC List: 8 users

Description pierre-yves samyn 2021-03-04 13:00:56 UTC
Description:
Writer menu (French UI): Insertion > Marque de formatage > Espace sans chasse insécable [literally "zero-width non-breaking space"]

In older versions the entry was "Ligature sans chasse" [literally "zero-width ligature"] and not "Espace sans chasse insécable", which was correct (no space is inserted).

Steps to Reproduce:
1.

Actual Results:
Espace sans chasse insécable

Expected Results:
Ligature sans chasse


Reproducible: Always


User Profile Reset: No



Additional Info:
Cf. https://translations.documentfoundation.org/translate/libo_ui-7-0/officecfgregistrydataorgopenofficeofficeui/fr/?q=chasse&sort_by=-priority%2Cposition&offset=3
Comment 1 Julien Nabet 2021-03-04 18:55:53 UTC
Reading a bit:
https://listarchives.libreoffice.org/fr/qa/2018/msg00140.html
not sure there's a bug here.

Sophie/Jean-Baptiste/Ysabeau: any thoughts here?
Comment 2 pierre-yves samyn 2021-03-05 09:17:43 UTC
Hi

(In reply to Julien Nabet from comment #1)
> Reading a bit:
> https://listarchives.libreoffice.org/fr/qa/2018/msg00140.html
> not sure there's a bug here.
> 

There's quite a bit of confusion here in my opinion:

- this menu entry did not first appear in version 6 (it was inherited from OOo)
- no thin space is inserted
- the purpose is indeed to join, as used in this FAQ:

https://wiki.documentfoundation.org/Faq/Writer/147

- I wonder about translating "no-width" as "space"

Best regards
Comment 3 sophie 2021-03-05 10:00:00 UTC
Hi Pierre-Yves, when you look at the Unicode table, the translation looks correct:
https://unicode-table.com/fr/FEFF/. That is, unless I'm confusing the characters or the English string (No-width No Break) is not correct. There is a complete table of Unicode characters here:
http://unicode.org/L2/L2002/02368-default-ignorable.html
and I took this one: FEFF ZERO WIDTH NO-BREAK SPACE
Comment 4 pierre-yves samyn 2021-03-05 10:54:36 UTC
Hi Sophie


(In reply to sophie from comment #3)
> when you look at the Unicode table, the translation looks
> correct:
> https://unicode-table.com/fr/FEFF/. Unless I'm confusing the characters...

I didn't search in the code but I don't have the impression that it's the U+FEFF character that is used.

It seems that it is rather U+2060 (WORD JOINER)

To check it:

1. type in a document CU+FEFF<Alt>+<x> (of course <Alt>+<x> is here the keystroke)

The character is not recognized and displayed in gray. 

2. type in a document CU+2060<Alt>+<x> 

The character is recognized and displayed in gray and the ligature works as if it was inserted via the menu.
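
The two code points tested in steps 1 and 2 can also be compared outside Writer. The following is a minimal sketch, assuming Python 3 and its bundled Unicode database; the helper name `describe` is hypothetical and nothing here comes from the LibreOffice code base:

```python
import unicodedata


def describe(code_point: int) -> str:
    """Return 'U+XXXX NAME' using the Unicode database shipped with Python."""
    return f"U+{code_point:04X} {unicodedata.name(chr(code_point))}"


# The two candidates discussed in this bug.
for cp in (0xFEFF, 0x2060):
    # Both characters are invisible format characters (general category Cf),
    # so only the formal name tells them apart.
    print(describe(cp), unicodedata.category(chr(cp)))

# prints:
# U+FEFF ZERO WIDTH NO-BREAK SPACE Cf
# U+2060 WORD JOINER Cf
```

This only reports what each code point is called; it says nothing about which one Writer inserts, which is what the Alt+X test above checks.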

Best regards
Comment 5 pierre-yves samyn 2021-03-05 11:00:24 UTC
Verification done, it is this code that is used: U+2060 (WORD JOINER)

https://unicode-table.com/fr/2060/

ATB
Comment 6 sophie 2021-03-05 11:21:13 UTC
(In reply to pierre-yves samyn from comment #5)
> Verification done, it is this code that is used: U+2060 (WORD JOINER)
> 
> https://unicode-table.com/fr/2060/
> 
> ATB

Thanks for your research :) in that case, the English string is wrong as it suggests the wrong character. I'll change the subject of the issue and will retain 'Gluon de mots' for the FR translation.
Comment 7 pierre-yves samyn 2021-03-05 14:41:02 UTC
(In reply to sophie from comment #6)

>I'll change the subject of the issue and will
> retain 'Gluon de mots' for the FR translation.

Yes!

May I also suggest that the help be revised accordingly?

Furthermore, the help is now also incorrect, as it says "Available when complex text layout (CTL) is enabled", which no longer seems to be necessary.

Thank you very much...

Best regards
Pierre-Yves
Comment 8 Julien Nabet 2021-03-05 21:27:26 UTC
Running git grep on the code shows that this feature is tightly bound to identifiers containing ZWNBSP in their names:
CHAR_ZWNBSP
SID_INSERT_ZWNBSP
InsertZWNBSP
etc.
So just changing the English string would be wrong.

Considering the only definitions that I find are:
sc/inc/global.hxx:71:const sal_Unicode CHAR_ZWNBSP   = 0x2060;
sd/source/ui/func/fubullet.cxx:53:const sal_Unicode CHAR_ZWNBSP       =   u'\x2060';
sw/inc/swtypes.hxx:172:#define CHAR_ZWNBSP         u'\x2060'
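
The mismatch between the identifier name and the value in these three definitions can be cross-checked mechanically. This is a rough sketch, assuming Python 3; the grep lines are copied from above, while `HEX_LITERAL` and `defined_code_points` are hypothetical helper names, not part of any LibreOffice tooling:

```python
import re
import unicodedata

# The three definitions quoted above, kept verbatim in a raw string.
GREP_OUTPUT = r"""
sc/inc/global.hxx:71:const sal_Unicode CHAR_ZWNBSP   = 0x2060;
sd/source/ui/func/fubullet.cxx:53:const sal_Unicode CHAR_ZWNBSP       =   u'\x2060';
sw/inc/swtypes.hxx:172:#define CHAR_ZWNBSP         u'\x2060'
"""

# Matches either a 0xNNNN integer literal or a \xNNNN character escape.
HEX_LITERAL = re.compile(r"(?:0x|\\x)([0-9A-Fa-f]{4})")


def defined_code_points(grep_output: str) -> dict:
    """Map each file path to the code point its CHAR_ZWNBSP definition holds."""
    result = {}
    for line in grep_output.strip().splitlines():
        path = line.split(":", 1)[0]
        match = HEX_LITERAL.search(line)
        if match:
            result[path] = int(match.group(1), 16)
    return result


values = defined_code_points(GREP_OUTPUT)
for path, cp in values.items():
    print(f"{path}: U+{cp:04X} {unicodedata.name(chr(cp))}")
```

All three files resolve to U+2060 WORD JOINER, so the constant name says ZWNBSP while the value is the word joiner.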

I wonder if it's not the unicode which should be changed in these for u'\xFEFF'

IMHO I think new variables should be created for u'\x2060'

That said, I can't gauge the impact of all these changes.

The other alternative would be to change all variable names containing ZWNBSP, which may in the end be less complicated.

Xisco/Heiko: any thoughts here?
Comment 9 pierre-yves samyn 2021-03-06 06:36:13 UTC
Hi Julien

(In reply to Julien Nabet from comment #8)
> I wonder if it's not the unicode which should be changed in these for
> u'\xFEFF'

I'm not sure I understand your comment, but please don't!

The Unicode character used (U+2060) is the right one. I checked here:

https://opengrok.libreoffice.org/xref/core/sd/source/ui/func/fubullet.cxx?r=430b3f4d

and rechecked with an HTML export, which correctly produces &#8288;
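
The HTML export check can be reproduced by decoding the numeric character reference. A minimal sketch, assuming Python 3 and its standard `html` module (the variable names are illustrative only):

```python
import html

# Numeric character reference found in the exported HTML.
entity = "&#8288;"

decoded = html.unescape(entity)   # the single character the entity stands for
code_point = ord(decoded)

print(f"{entity} -> U+{code_point:04X}")  # prints: &#8288; -> U+2060

# For comparison, U+FEFF would have been exported as decimal 65279.
print(0xFEFF)  # prints: 65279
```

Decimal 8288 is hexadecimal 2060, so the export confirms that the inserted character is U+2060 and not U+FEFF.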

This character corresponds to the expected functionality: Word joiner. (see the faq https://wiki.documentfoundation.org/Faq/Writer/147)

It is the label in the Formatting Mark menu (No-width No-Break) that does not match. You should use "Word Joiner".

Best regards
Pierre-Yves
Comment 10 Julien Nabet 2021-03-06 07:39:37 UTC
(In reply to pierre-yves samyn from comment #9)
> It is the label in the Formatting Mark menu (No-width No-Break) that does
> not match. You should use "Word Joiner".
In this case I suppose all the variable names containing "ZWNBSP" should be changed.
Comment 11 Ming Hua 2021-03-06 08:09:06 UTC
(In reply to Julien Nabet from comment #8)
> I wonder if it's not the unicode which should be changed in these for
> u'\xFEFF'
> 
> IMHO I think new variables should be created for u'\x2060'
In addition to Pierre-Yves's objection based on actual usage, here is some background information:

According to Wikipedia [1] and Unicode's FAQ [2], before Unicode version 3.2 (released in 2002), U+FEFF ZERO WIDTH NO-BREAK SPACE was used both at the beginning of a data stream to indicate byte order and in the middle of a data stream to adjust line breaking.

Unicode 3.2 deprecated the latter usage of U+FEFF, leaving it to serve as the BYTE ORDER MARK (its formal character name remains ZERO WIDTH NO-BREAK SPACE, because Unicode character names are never changed). It also created U+2060 WORD JOINER for the latter usage and encouraged people to use it instead of U+FEFF in the middle of a data stream.

So it seems LibreOffice is just using a deprecated nomenclature in both code and UI.  The most likely scenario is that the character inserted into documents was changed from U+FEFF to U+2060 some time after 2002, but the variable names and UI string were not.

1. https://en.wikipedia.org/wiki/Word_joiner and 
   https://en.wikipedia.org/wiki/Byte_order_mark
2. https://www.unicode.org/faq/utf_bom.html#bom6
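
The two roles of U+FEFF described above (byte order mark at the start of a stream, joiner in the middle) can be illustrated with Python's standard codecs. This is only a sketch under the assumption of Python 3; the final replacement step is one possible way to follow the Unicode recommendation and is not taken from LibreOffice:

```python
import codecs

BOM = "\ufeff"          # U+FEFF
WORD_JOINER = "\u2060"  # U+2060

# At the start of a UTF-8 stream, U+FEFF is the byte order mark.
assert BOM.encode("utf-8") == codecs.BOM_UTF8

# A stream with U+FEFF both at the start and in the middle.
raw = (BOM + "a" + BOM + "b").encode("utf-8")

# The "utf-8-sig" codec strips only the leading BOM; the interior
# U+FEFF survives as an ordinary character.
text = raw.decode("utf-8-sig")
print([f"U+{ord(c):04X}" for c in text])
# prints: ['U+0061', 'U+FEFF', 'U+0062']

# Replace the deprecated mid-stream use with U+2060.
modern = text.replace(BOM, WORD_JOINER)
print([f"U+{ord(c):04X}" for c in modern])
# prints: ['U+0061', 'U+2060', 'U+0062']
```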
Comment 12 Julien Nabet 2021-03-06 11:05:13 UTC
I gave it a try with:
https://gerrit.libreoffice.org/c/core/+/112055

I'm waiting for the Jenkins build to finish and, of course, for some reviews.
Comment 13 Commit Notification 2021-03-08 07:58:49 UTC
Julien Nabet committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/b1cbc2e27a5a5434ca8097fa7586912cf0b857c4

tdf#140796: Wrong English string for U+2060 character

It will be available in 7.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 14 Julien Nabet 2021-03-08 07:59:03 UTC
Sophie: since it changes a string, I suppose we can't cherry-pick this one on 7.1 branch (even less in 7.0 branch) so let's put this one to FIXED.
Comment 15 Commit Notification 2021-03-08 08:55:20 UTC
Seth Chaiklin committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/help/commit/ead459c15f4eb1095d0dff0cb76611bff31d12f8

Related to: tdf#140796  No-width no break -> Word Joiner
Comment 16 sophie 2021-03-08 10:59:41 UTC
(In reply to Julien Nabet from comment #14)
> Sophie: since it changes a string, I suppose we can't cherry-pick this one
> on 7.1 branch (even less in 7.0 branch) so let's put this one to FIXED.

Yes, please don't cherry-pick it; there is no need to break the string freeze. And thanks a lot for your fix! I'm adding Olivier so he can take care of the help files.
Comment 17 pierre-yves samyn 2021-03-08 15:22:23 UTC
Hi

Wow! So quickly fixed. Many thanks to all of you for your participation. You guys rock!

Best regards
Pierre-Yves
Comment 18 Heiko Tietze 2021-03-11 15:10:56 UTC
(In reply to Commit Notification from comment #13)
> Julien Nabet committed a patch related to this issue.
> It has been pushed to "master":

Julien, please don't forget to mention this on the release notes.
Comment 19 Julien Nabet 2021-03-11 17:43:48 UTC
(In reply to Heiko Tietze from comment #18)
> ...
> Julien, please don't forget to mention this on the release notes.

=> https://wiki.documentfoundation.org/ReleaseNotes/7.2#General_improvements
Comment 20 Heiko Tietze 2021-03-12 08:18:54 UTC
(In reply to Julien Nabet from comment #19)
> => https://wiki.documentfoundation.org/ReleaseNotes/7.2#General_improvements

Thank you! Just moved it down to UNO API changes.