Bug 155767 - Bad rendering of optional hyphen for Arabic/Persian text
Summary: Bad rendering of optional hyphen for Arabic/Persian text
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version (earliest affected): 7.6.0.0 alpha1+
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: RTL-CTL
Reported: 2023-06-10 12:31 UTC by Hossein
Modified: 2023-06-14 22:26 UTC
CC List: 1 user

See Also:
Crash report or crash signature:


Attachments
File containing Persian text and optional hyphen (11.53 KB, application/vnd.oasis.opendocument.text)
2023-06-10 12:31 UTC, Hossein
Rendering of optional hyphen for Persian text in MS Word (19.89 KB, image/png)
2023-06-10 12:51 UTC, Hossein
Rendering of optional hyphen for Persian text in LibreOffice (33.71 KB, image/png)
2023-06-10 12:52 UTC, Hossein
Illustration of behavior with Farsi, Hebrew, English text (25.95 KB, application/vnd.oasis.opendocument.text)
2023-06-14 22:26 UTC, Eyal Rozenberg

Description Hossein 2023-06-10 12:31:35 UTC
Created attachment 187821
File containing Persian text and optional hyphen

Description:
Hyphenation is not used in Arabic script, but in practice many Persian documents use the optional hyphen instead of ZWNJ, because it is easier to type (Ctrl+-) than ZWNJ (Ctrl+Shift+2) on a Persian keyboard.
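For reference, the two characters involved can be inspected with Python's standard `unicodedata` module. The snippet below is only an illustration (assuming Python 3); it prints each character's code point, Unicode name, and general category:

```python
import unicodedata

SHY = "\u00ad"   # optional (soft) hyphen, typed with Ctrl+-
ZWNJ = "\u200c"  # zero width non-joiner, typed with Ctrl+Shift+2 on a Persian keyboard

for ch in (SHY, ZWNJ):
    # Both are invisible format characters (general category "Cf"),
    # but they mean different things to line breaking and shaping.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} (category {unicodedata.category(ch)})")
```

This prints `U+00AD SOFT HYPHEN (category Cf)` and `U+200C ZERO WIDTH NON-JOINER (category Cf)`: the soft hyphen marks a possible line-break point, while ZWNJ stops the letters on either side from joining.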
In MS Word, the optional hyphen breaks the word (the letters on either side stop joining): if the word falls at the start or in the middle of a line, the optional hyphen behaves like ZWNJ. The difference appears when the word is broken at the end of a line and part of it falls onto the next line; in that case, Word adds a hyphen (-).
In LibreOffice, the optional hyphen is rendered badly in Arabic script: not only does the word not break, but an extra gray area plus a hyphen is always drawn over any word that contains an optional hyphen.
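To make the Word behavior concrete, here is a minimal Python sketch of what the report describes. It is based only on this description, not on Word's internals, and it models a single word whose line wrap, if any, falls at a known soft hyphen. The names `word_like_lines` and `break_at` are hypothetical.

```python
from typing import List, Optional

SHY = "\u00ad"   # SOFT HYPHEN, inserted by Ctrl+-
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def word_like_lines(text: str, break_at: Optional[int] = None) -> List[str]:
    """Split `text` into display lines as MS Word is described as doing here.

    text:     logical-order text that may contain soft hyphens
    break_at: 0-based index of the soft hyphen where the line wraps,
              or None if the whole text fits on one line
    """
    parts = text.split(SHY)
    if break_at is None:
        # Mid-line: every soft hyphen behaves like ZWNJ, so letters stop joining
        return [ZWNJ.join(parts)]
    if not 0 <= break_at < len(parts) - 1:
        raise ValueError("break_at must refer to an existing soft hyphen")
    # Soft hyphens that do not wrap still act like ZWNJ on both lines
    first = ZWNJ.join(parts[: break_at + 1])
    second = ZWNJ.join(parts[break_at + 1:])
    # The wrapping soft hyphen becomes a visible hyphen at the end of the line
    return [first + "-", second]
```

For example, with the word `"\u0645\u06cc\u00ad\u0631\u0648\u0645"` (mi + soft hyphen + ravam), `word_like_lines(word)` returns a single line with U+200C in place of the soft hyphen, and `word_like_lines(word, 0)` returns `["\u0645\u06cc-", "\u0631\u0648\u0645"]`.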

Steps to Reproduce:
1. Open the attached .odt file

Actual Results:
There are multiple problems:
1. Arabic words that contain an optional hyphen do not break in the middle of the line.
2. A gray vertical rectangle is drawn over such words.
3. A hyphen is drawn over the word in edit mode (not visible in read-only mode).
4. When a word is broken across two lines, the non-final forms of the Arabic characters are used.

Expected Results:
1. Arabic words that contain an optional hyphen should break at the hyphen wherever they appear. If part of the word falls onto the next line, a hyphen (-) should be visible.
2. The gray vertical rectangle should not be visible.
3. The hyphen should not be drawn over the word in the middle of the line.
4. When a word is broken across two lines, the final form of the Arabic characters should be used.


Reproducible: Always


User Profile Reset: No

Additional Info:
Version: 7.5.3.2 (X86_64) / LibreOffice Community
Build ID: 9f56dff12ba03b9acd7730a5a481eea045e468f3
CPU threads: 20; OS: Windows 10.0 Build 22621; UI render: Skia/Vulkan; VCL: win
Locale: en-US (en_DE); UI: en-GB
Calc: CL threaded
Comment 1 Hossein 2023-06-10 12:51:31 UTC
Created attachment 187822
Rendering of optional hyphen for Persian text in MS Word
Comment 2 Hossein 2023-06-10 12:52:04 UTC
Created attachment 187823
Rendering of optional hyphen for Persian text in LibreOffice
Comment 3 ⁨خالد حسني⁩ 2023-06-11 06:16:47 UTC
(In reply to Hossein from comment #0)
> Created attachment 187821
> File containing Persian text and optional hyphen
> 
> Description:
> Hyphenation is not used in Arabic script

Although most languages written in Arabic script indeed don’t use hyphenation, Uighur does (in its contemporary Arabic orthography).

https://www.w3.org/TR/arab-ug-gap/#hyphenation



> but in practice many Persian documents use the optional hyphen instead
> of ZWNJ, because it is easier to type (Ctrl+-) than ZWNJ (Ctrl+Shift+2)
> on a Persian keyboard.
> In MS Word, the optional hyphen breaks the word (the letters on either
> side stop joining): if the word falls at the start or in the middle of a
> line, the optional hyphen behaves like ZWNJ. The difference appears when
> the word is broken at the end of a line and part of it falls onto the next
> line; in that case, Word adds a hyphen (-).
> In LibreOffice, the optional hyphen is rendered badly in Arabic script: not
> only does the word not break, but an extra gray area plus a hyphen is
> always drawn over any word that contains an optional hyphen.

The LibreOffice behavior is the Unicode-compliant one. U+00AD is a control character, and should be ignored as if it weren’t there unless line breaking happens at its position (LibreOffice falls a bit short of this: if there is a ligature, the soft hyphen prevents its formation, but this is a bug). Also, when Arabic script is hyphenated, the positional forms of the characters should be kept at the hyphenation point:
https://unicode.org/reports/tr14/#SoftHyphen
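For comparison with the Word behavior described in the report, here is a minimal sketch of the Unicode-compliant behavior described in this comment, using the same simplified model (one word, a wrap only at a known soft hyphen). The function name and the `keep_joining` flag are hypothetical. Inserting U+200D ZERO WIDTH JOINER on both sides of the break is an assumed way to ask a text shaper to keep the joined positional forms; it is not taken from LibreOffice's code.

```python
from typing import List, Optional

SHY = "\u00ad"  # SOFT HYPHEN
ZWJ = "\u200d"  # ZERO WIDTH JOINER (join-causing in Arabic shaping)


def unicode_like_lines(text: str, break_at: Optional[int] = None,
                       keep_joining: bool = True) -> List[str]:
    """Render soft hyphens per the rule above: invisible unless a line wraps there.

    break_at:     0-based index of the soft hyphen where the line wraps, or None
    keep_joining: if True, add ZWJ around the break (an assumption, see above)
    """
    parts = text.split(SHY)
    if break_at is None:
        # No wrap: drop every soft hyphen as if it were never there,
        # so joining (and ligatures) are unaffected.
        return ["".join(parts)]
    if not 0 <= break_at < len(parts) - 1:
        raise ValueError("break_at must refer to an existing soft hyphen")
    first = "".join(parts[: break_at + 1])
    second = "".join(parts[break_at + 1:])
    if keep_joining:
        # Assumed technique: ZWJ makes the letters next to the break keep
        # their joined (non-final / non-initial) forms.
        first += ZWJ
        second = ZWJ + second
    # Only the soft hyphen at the wrap point becomes a visible hyphen
    return [first + "-", second]
```

Unlike the Word model, a soft hyphen that does not wrap leaves the word fully joined, which matches the LibreOffice behavior being defended here (apart from the ligature bug mentioned above).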

> 2. A gray vertical rectangle is drawn over such words.
> 3. A hyphen is drawn over the word in edit mode (not visible in read-only
> mode).

This is field shading; you can disable it with View → Field Shadings.

Overall, there is no bug here: MS Office behavior is not Unicode-compliant, and we shouldn’t follow it.
Comment 4 Hossein 2023-06-11 21:09:49 UTC
> Overall, there is no bug here: MS Office behavior is not
> Unicode-compliant, and we shouldn’t follow it.
Although MS Word's behavior may not be Unicode-compliant, I think it is worth implementing for compatibility with MS Word, at least for MS formats such as DOC/DOCX. Such a compatibility feature could be enabled or disabled via Tools > Options > LibreOffice Writer > Compatibility.
Comment 5 ⁨خالد حسني⁩ 2023-06-12 09:00:38 UTC
Confirming as a feature request.
Comment 6 Eyal Rozenberg 2023-06-14 19:37:52 UTC
(In reply to ⁨خالد حسني⁩ from comment #3)
> The LibreOffice behavior is the Unicode-compliant one. U+00AD is a control
> character, and should be ignored as if it weren’t there unless line breaking
> happens at its position

But it is not ignored: it is always rendered as a hyphen/minus sign, in other RTL languages too (e.g. Hebrew). It's true that you don't see it in Print Preview when it doesn't result in an actual hyphen, but we don't work on our documents in Print Preview: we edit them, or read them in editable form. So I would say it _is_ a bug to have our text filled with false hyphens, regardless of whether a specific language customarily uses hyphens.

> (LibreOffice falls a bit short of this: if there is
> a ligature, the soft hyphen prevents its formation, but this is a bug).

Do you believe that should be filed separately, or handled here?

> > 2. A gray vertical rectangle is drawn over such words.
> 
> This is field shading; you can disable it with View → Field Shadings.

Well, an optional hyphen is not a field... do we do this for other characters?
Comment 7 Eyal Rozenberg 2023-06-14 22:26:11 UTC
Created attachment 187917
Illustration of behavior with Farsi, Hebrew, English text

This file contains examples not only in 3 languages, but also of secondary issues, such as ligature breakage and the soft hyphen behaving differently from the hard hyphen/minus and from the upper hyphen (Maqaf) in Hebrew.