Bug 129709 - Incorrect hyphenation of Marathi words having Zero Width Joiner
Status: RESOLVED NOTOURBUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Linguistic
Version: 6.4.0.1 rc (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
Reported: 2019-12-31 13:35 UTC by Shantanu
Modified: 2020-08-31 04:11 UTC


Attachments
zero width joiner does not allow correct hyphenation (7.78 KB, image/png)
2019-12-31 13:38 UTC, Shantanu

Description Shantanu 2019-12-31 13:35:22 UTC
Description:
LibreOffice Writer does not hyphenate words correctly when a zero-width joiner (U+200D) is used anywhere in a word written in Marathi. According to the validation script, the word "फक्‍तवित्‍तवान" can be broken at 3 places, but Writer breaks it at an unexpected place, as shown in the attached image.

Steps to Reproduce:
1) Check the config rules for the joiner characters in the hyphenation dictionary:
# docker run  --disable-content-trust  -it shantanuo/hyphenate grep -iA1 'joiner' ../dicts/hyph_mr_IN.dic

% Do not break either side of ZERO-WIDTH JOINER  (U+200D)
2‍2
% Break after ZERO-WIDTH NON JOINER  (U+200C)
1

2) Run this docker command and check the output. The validation script does not break at the joiner, but it does break after the non-joiner, as expected (compare the last 2 lines):
# docker run -it shantanuo/hyphenate ./example ../dicts/hyph_mr_IN.dic /hi_sample.text
वि=त्त
वि=त्‍त
वि=त्‌त
उप=क्र=मा=वर=च्या=हीच
उप=क्र=मा=वर=च्या=ही=बद्दल=चा
वि=त्‌=त=वान
वि=त्‍त=वान

3) Download and install the Marathi spell check extension for LibreOffice from:

https://extensions.libreoffice.org/extensions/marathi-spellchecker/1.8

Start Writer, type the word वित्‍तवान at the end of a line, and notice the break at वित्‌-तवान.

Actual Results:
Probably wrong:
वित्‌=तवान

Expected Results:
वि=त्‍तवान
OR
वित्‍त=वान
OR
वित्‍तवा=न
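
Because the joiner and the non-joiner are both invisible, it can help to list the code points of a word before comparing actual and expected breaks. The following is a minimal Python sketch and not part of the original report: the helper names `describe` and `joiners` are hypothetical, and the sample word is built from escapes on the assumption that it matches the word typed in step 3 (with U+200D after the virama).

```python
import unicodedata

ZWJ = "\u200d"   # ZERO WIDTH JOINER
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}" for ch in text]

def joiners(text):
    """Return (index, label) for every ZWJ or ZWNJ found in text."""
    labels = {ZWJ: "ZWJ", ZWNJ: "ZWNJ"}
    return [(i, labels[ch]) for i, ch in enumerate(text) if ch in labels]

# Assumed spelling of the test word: va, vowel sign i, ta, virama, ZWJ, ta, va, vowel sign aa, na
word = "\u0935\u093f\u0924\u094d" + ZWJ + "\u0924\u0935\u093e\u0928"

if __name__ == "__main__":
    for line in describe(word):
        print(line)
    print(joiners(word))
```

Running the same helpers on text copied back out of Writer would show whether the document really contains U+200D or U+200C at the point where the line breaks.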


Reproducible: Always


User Profile Reset: Yes



Comment 1 Shantanu 2019-12-31 13:38:38 UTC
Created attachment 156862
zero width joiner does not allow correct hyphenation

The word is incorrectly broken at क् and त्.
Comment 2 Shantanu 2020-04-15 12:53:09 UTC
Let me put it another way. Note only these 4 lines from this page:

https://gitlab.com/smc/hyphenation/blob/master/mr_IN/hyph_mr_IN.dic

% Do not break either side of ZERO-WIDTH JOINER  (U+200D)
2‍2
% Break after ZERO-WIDTH NON JOINER  (U+200C)
‌1

Is the rule shown above set correctly?
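
For context on how such rules are normally read: in Liang-style hyphenation patterns (the scheme used by TeX, which hyph_*.dic files follow), a digit gives a weight to the gap at that position, the highest weight from all matching patterns wins, and an odd result allows a break while an even one forbids it. The Python sketch below applies only the two patterns quoted above, under that reading. It is a simplified illustration, not the Hyphen library's implementation: it ignores word-boundary markers, minimum prefix and suffix lengths, and every other pattern in hyph_mr_IN.dic, and the function names are hypothetical.

```python
ZWJ = "\u200d"   # ZERO WIDTH JOINER
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# The two rules from the comment above, written with explicit escapes.
PATTERNS = ["2" + ZWJ + "2", ZWNJ + "1"]

def parse_pattern(pattern):
    """Split a pattern into its letters and the weight of each gap around them."""
    letters = []
    weights = [0]  # weights[i] is the gap before letters[i]; the last entry is the gap after
    for ch in pattern:
        if ch in "0123456789":
            weights[-1] = int(ch)
        else:
            letters.append(ch)
            weights.append(0)
    return "".join(letters), weights

def break_weights(word, patterns):
    """Return the highest weight seen for each gap; entry i is the gap before word[i]."""
    result = [0] * (len(word) + 1)
    for pattern in patterns:
        letters, weights = parse_pattern(pattern)
        start = word.find(letters)
        while start != -1:
            for offset, weight in enumerate(weights):
                result[start + offset] = max(result[start + offset], weight)
            start = word.find(letters, start + 1)
    return result

def allowed_breaks(word, patterns):
    """Return the indices i where a break before word[i] has an odd weight."""
    weights = break_weights(word, patterns)
    return [i for i in range(1, len(word)) if weights[i] % 2 == 1]

# Assumed spellings: the same word once with ZWJ and once with ZWNJ after the virama.
zwj_word = "\u0935\u093f\u0924\u094d" + ZWJ + "\u0924\u0935\u093e\u0928"
zwnj_word = "\u0935\u093f\u0924\u094d" + ZWNJ + "\u0924\u0935\u093e\u0928"

if __name__ == "__main__":
    print(break_weights(zwj_word, PATTERNS), allowed_breaks(zwj_word, PATTERNS))
    print(break_weights(zwnj_word, PATTERNS), allowed_breaks(zwnj_word, PATTERNS))
```

With only these two patterns, the gaps on both sides of the joiner get the even weight 2 and the gap after the non-joiner gets the odd weight 1, which matches the comments in the dictionary file. Whether the full dictionary and the extension shipped to Writer behave the same way is not something this sketch can confirm.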
Comment 3 Xisco Faulí 2020-05-13 11:46:38 UTC
@Julien, any idea about this issue?
Comment 4 Julien Nabet 2020-05-13 12:19:44 UTC
I know nothing about hyphenation => uncc myself.
Comment 5 Buovjaga 2020-08-30 16:05:14 UTC
Shantanu: isn't the problem outside of LibreOffice? Shouldn't you report it to the extension maintainer?
Comment 6 Shantanu 2020-08-31 04:11:00 UTC
I guess you are correct. The extension owner is the right person to resolve this. Closing.