Bug 161303 - Bad positioning of letter-less Arabic & Hebrew diacritics
Summary: Bad positioning of letter-less Arabic & Hebrew diacritics
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version (earliest affected): 5.3.0.3 release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Arabic-and-Farsi Hebrew Diacritics RTL
Reported: 2024-05-28 11:35 UTC by Hossein
Modified: 2024-09-15 01:11 UTC

See Also:
Crash report or crash signature:


Attachments
Arabic harakat diacritics (10.99 KB, application/vnd.oasis.opendocument.text)
2024-05-28 11:35 UTC, Hossein
Arabic harakat diacritics (PNG) - LibreOffice 24.8 master (19.72 KB, image/png)
2024-05-28 11:48 UTC, Hossein
Arabic harakat diacritics (PNG) - Word (8.07 KB, image/png)
2024-05-28 11:59 UTC, Hossein
Arabic harakat diacritics (PNG) - OpenOffice.org 3.2.1 (8.72 KB, image/png)
2024-05-29 08:19 UTC, Hossein
Dotted circle kasra in iOS CoreText (5.64 KB, image/png)
2024-05-29 09:05 UTC, Jonathan Clark
Dotted circle kasra in Windows DirectWrite (3.20 KB, image/png)
2024-05-29 09:06 UTC, Jonathan Clark
Dotted circle kasra in Linux Firefox+HB (10.84 KB, image/png)
2024-05-29 09:07 UTC, Jonathan Clark
Dotted circle Kasra in Windows Firefox+HB+DirectWrite (9.90 KB, image/png)
2024-05-29 09:08 UTC, Jonathan Clark

Description Hossein 2024-05-28 11:35:09 UTC
Created attachment 194397 [details]
Arabic harakat diacritics

Description:
Arabic diacritics are sometimes drawn incorrectly. As an obvious example, when writing Arabic Harakat diacritics separately, they are positioned incorrectly.

Steps to Reproduce:
1. Open attachment.
Alternatively, write kasrah ِ  with Arabic keyboard in a blank Writer document. Make sure that the paragraph is RTL and the font is "DejaVu Sans". Read more about kasrah here:
https://en.wikipedia.org/wiki/Arabic_diacritics#Kasrah

Actual Results:
In line 1, kasrah falls outside the margin, which is an incorrect position. "DejaVu Sans" font is used in line 1.
In line 2, kasrah falls in the correct position with "Noto Sans Arabic" font.
In line 3, kasrah falls outside the margin, in the wrong position with additional character -. 
In line 4, kasrah falls in the correct position with additional character _.
In line 5, kasrah falls outside the margin, which is incorrect.

Expected Results:
Kasrah should fall inside the margin on every line, on the right side of the screen.

Reproducible: Always


User Profile Reset: No

Additional Info:
Version: 24.8.0.0.alpha1+ (X86_64) / LibreOffice Community
Build ID: af24f8df41a4159d16a6beb3f58b0f1cfc84c6ea
CPU threads: 12; OS: Linux 6.2; UI render: default; VCL: gtk3
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: CL threaded
Comment 1 Hossein 2024-05-28 11:48:34 UTC
Created attachment 194398 [details]
Arabic harakat diacritics (PNG) - LibreOffice 24.8 master

This is what is displayed on the screen with the latest LibreOffice 24.8 dev master.
Each line is marked for being correct (✅) or incorrect (❌).
Comment 2 Hossein 2024-05-28 11:59:18 UTC
Created attachment 194399 [details]
Arabic harakat diacritics (PNG) - Word

All 5 lines are rendered correctly.
Comment 3 Hossein 2024-05-28 12:14:57 UTC
@Jonathan:
I tried to check whether this is a bug in LibreOffice or in HarfBuzz.

See these outputs from "pango-view" utility:

$ pango-view --font="Noto Sans Arabic 256" --rtl --markup --text " -ِ  ب"

$ pango-view --font="Noto Sans Arabic 256" --rtl --markup --text " -ِ "

This is even more interesting! The color red is used in another character!

$ pango-view --font="Noto Sans Arabic 256" --rtl --markup --text '-<span color="red">ِ</span>.|'

With an additional Arabic character, it becomes OK:
$ pango-view --font="Noto Sans Arabic 256" --rtl --markup --text '-<span color="red">ِ</span> ب'

Also, compare these two outputs from "hb-view" utility:

h1) Extra Arabic character:
$ hb-view /usr/share/fonts/truetype/noto/NotoSansArabic-Regular.ttf " -ِ  ب"

h2) Direction set to auto:
$ hb-view /usr/share/fonts/truetype/noto/NotoSansArabic-Regular.ttf " -ِ "

h3) Output is similar to h2:
$ hb-view --direction=ltr /usr/share/fonts/truetype/noto/NotoSansArabic-Regular.ttf " -ِ "

I see the same issue in gedit. If you type kasra as a first character, it falls outside the screen, but when you type additional Arabic character, it comes inside the screen.
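
To make the test strings above easier to reason about, here is a minimal Python sketch that lists the Unicode properties of each character. It only uses the standard `unicodedata` module; the helper name `describe` is hypothetical, and the sample string is written with escapes as an approximation of the one passed to hb-view above.

```python
import unicodedata

def describe(text):
    """Return (code point, name, general category, bidi class) per character."""
    rows = []
    for ch in text:
        rows.append((
            f"U+{ord(ch):04X}",
            unicodedata.name(ch, "<unnamed>"),
            unicodedata.category(ch),       # "Mn" means non-spacing combining mark
            unicodedata.bidirectional(ch),  # e.g. "AL", "NSM", "ES", "WS"
        ))
    return rows

# Approximation of the sample: space, hyphen, kasra (U+0650), spaces, beh (U+0628)
for row in describe(" -\u0650  \u0628"):
    print(*row)
```

The kasra is a non-spacing mark with no strong direction of its own, and the hyphen is not a letter, so in the strings without the beh there is neither a letter to attach to nor a strongly RTL character.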
Comment 4 Hossein 2024-05-28 12:47:05 UTC
h3 should use rtl instead of ltr, but that does not change the output.

$ hb-view --direction=rtl /usr/share/fonts/truetype/noto/NotoSansArabic-Regular.ttf " -ِ "
Comment 5 Jonathan Clark 2024-05-29 04:06:13 UTC
(In reply to Hossein from comment #2)
> Created attachment 194399 [details]
> Arabic harakat diacritics (PNG) - Word
> 
> All 5 lines are rendered correctly.

Just making sure for completeness - the Word render used the exact same font?


I went through your examples, and it looks like Pango and HB are generating similar results. I think it's pretty unlikely that those two coincidentally implemented the same bug. On the other hand, a heuristic to override a font does seem like the sort of thing Microsoft would do.
Comment 6 Hossein 2024-05-29 08:19:33 UTC
Created attachment 194416 [details]
Arabic harakat diacritics (PNG) - OpenOffice.org 3.2.1

This bug is not present in OpenOffice:

Not reproducible with OpenOffice.org 3.2.1 (OOO320m18, Build: 9502)
Comment 7 Hossein 2024-05-29 08:20:50 UTC
(In reply to Jonathan Clark from comment #5)
> (In reply to Hossein from comment #2)
> > Created attachment 194399 [details]
> > Arabic harakat diacritics (PNG) - Word
> > 
> > All 5 lines are rendered correctly.
> 
> Just making sure for completeness - the Word render used the exact same font?
Yes, I have checked that again, and the same fonts are used in Word. You can directly open the same ODT file in Word. As I have also installed LibreOffice on Windows, it provides the required fonts, "DejaVu Sans" and "Noto Sans Arabic".
> I went through your examples, and it looks like Pango and HB are generating
> similar results. I think it's pretty unlikely that those two coincidentally
> implemented the same bug. On the other hand, a heuristic to override a font
> does seem like the sort of thing Microsoft would do.
I am not sure if Word uses a fallback font. I tried writing a few Arabic words, and it uses the same intended fonts. Therefore, I think Word is not using a fallback font.

What is your conclusion? An upstream bug from HarfBuzz?
Comment 8 Jonathan Clark 2024-05-29 09:05:41 UTC
Created attachment 194417 [details]
Dotted circle kasra in iOS CoreText
Comment 9 Jonathan Clark 2024-05-29 09:06:07 UTC
Created attachment 194418 [details]
Dotted circle kasra in Windows DirectWrite
Comment 10 Jonathan Clark 2024-05-29 09:07:38 UTC
Created attachment 194419 [details]
Dotted circle kasra in Linux Firefox+HB
Comment 11 Jonathan Clark 2024-05-29 09:08:07 UTC
Created attachment 194420 [details]
Dotted circle Kasra in Windows Firefox+HB+DirectWrite
Comment 12 Jonathan Clark 2024-05-29 09:30:02 UTC
(In reply to Hossein from comment #7)
> What is your conclusion? An upstream bug from HarfBuzz?

I'm not convinced this is a bug.

Just experimenting naively with the dotted circle kasra case, I could reproduce the same bad placement in 4 completely distinct shapers: DirectWrite, CoreText, HarfBuzz, and Pango.

I don't know why they're doing it, and I'm not saying this is what anyone should want. But with so many implementations doing the same thing, it's reasonable to guess this is intentional.
Comment 13 Hossein 2024-05-29 09:57:41 UTC
(In reply to Jonathan Clark from comment #12)
> (In reply to Hossein from comment #7)
> > What is your conclusion? An upstream bug from HarfBuzz?
> 
> I'm not convinced this is a bug.
I think this is indeed a bug, although the root cause needs to be discussed.

I can provide some reasons:

1) As an RTL/CTL user who uses Arabic script, I find it wrong.

2) Falling out of the margin/screen: When you type kasrah (or other harakat diacritics), it goes out of the margin (in case of gedit, it goes out of screen).

3) Glitch: When you type another Arabic character, it comes back to the margin (screen). This is not the kind of behavior I expect when I type some text.

4) Inconsistency: It can't be that both rendering alternatives are correct.

5) Causing issues with formatting: In this example, even color is attached incorrectly:
$ pango-view --font="Noto Sans Arabic 256" --rtl --markup --text '-<span color="red">ِ</span>.|'

> Just experimenting naively with the dotted circle kasra case, I could
> reproduce the same bad placement in 4 completely distinct shapers:
> DirectWrite, CoreText, HarfBuzz, and Pango.
> 
> I don't know why they're doing it, and I'm not saying this is what anyone
> should want. But with so many implementations doing the same thing, it's
> reasonable to guess this is intentional.
I suspect that the direction is calculated incorrectly.
Comment 14 Hossein 2024-05-29 12:20:22 UTC
> $ pango-view --font="Noto Sans Arabic 256" --rtl --markup --text '-<span color="red">ِ</span>.|'
> I suspect that the direction is calculated incorrectly.
If you try the above command without --rtl, the result is different, and the color is applied correctly.

$ pango-view --font="Noto Sans Arabic 256" --markup --text '-<span color="red">ِ</span>.|'

This is related to the base direction of text. From the pango-view man page:

  --rtl  Set base direction to right-to-left

Using fribidi utility, I get these outputs. Please note that the display here is a little different, so you have to run it in terminal:
$ fribidi 
-ِ ب
                                                                             ب -ِ
-ِ
-ِ

$ fribidi --rtl
-ِ 
                                                                             -ِ--
ِ
                                                                                ِ

Therefore, forcing the base direction to RTL when the user sets the RTL direction for the paragraph should fix the issue. The reason that the HarfBuzz hb-view utility (or the library itself) ignores --rtl in this case is unclear to me. This option works in some other cases:

For example, the outputs of these two commands are different:
$ ./util/hb-view --direction=rtl /usr/share/fonts/truetype/dejavu/DejaVuSans.ttf "a ب"

$ ./util/hb-view --direction=ltr /usr/share/fonts/truetype/dejavu/DejaVuSans.ttf "a ب"

This may also help in debugging:

$ fribidi --levels
ِ 
ِ
0 
-ِ
-ِ
0 0 

$ fribidi --rtl --levels
 ِ
                                                                                ِ
1 
-ِ
                                                                               -ِ
1 1 

It is worth mentioning that according to tdf#155470, in LibreOffice there are limitations in embedding directions, for example LTR inside RTL, and things like that.
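
The fribidi levels above can be cross-checked with a small Python sketch of how an automatic base direction is chosen. It is a simplification of rules P2 and P3 of the Unicode Bidirectional Algorithm (the first strong character decides, otherwise the level defaults to 0) and ignores isolates and explicit embeddings. The function name `auto_paragraph_level` is hypothetical, and this is not a description of what LibreOffice, fribidi, or HarfBuzz do internally.

```python
import unicodedata

STRONG_LTR = {"L"}
STRONG_RTL = {"R", "AL"}

def auto_paragraph_level(text):
    """Simplified UAX #9 P2/P3: the first strong character sets the level."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in STRONG_RTL:
            return 1  # right-to-left paragraph
        if bidi_class in STRONG_LTR:
            return 0  # left-to-right paragraph
    return 0  # no strong character at all: default to left-to-right

KASRA = "\u0650"  # ARABIC KASRA, bidi class NSM (not strong)
BEH = "\u0628"    # ARABIC LETTER BEH, bidi class AL (strong RTL)

for sample in ("-" + KASRA, "-" + KASRA + " " + BEH):
    classes = [unicodedata.bidirectional(ch) for ch in sample]
    print(classes, "-> auto paragraph level", auto_paragraph_level(sample))
```

Under this simplified rule, a hyphen plus kasra has no strong character and resolves to level 0, while adding the beh makes the paragraph resolve to level 1. That is consistent with the diacritic moving once an Arabic letter is typed, and with the levels changing when the base direction is forced with --rtl.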
Comment 15 Stéphane Guillou (stragu) 2024-06-12 12:36:06 UTC
Khaled, wondering if you have some insights on this?
Comment 16 Eyal Rozenberg 2024-07-15 18:18:41 UTC
First, let me add a(n unsurprising) data point: Seeing the same behavior with Hebrew diacritics (Nikkud), including magical diacritic movement when an additional character is typed. More about this later - perhaps even a video...

(In reply to Hossein from comment #13)

I agree with you, Hossein, on points 1 through 4, both for Arabic and for Hebrew. In point 5, I'm not exactly sure what the expected vs. actual result is.
Comment 17 Eyal Rozenberg 2024-08-19 13:28:00 UTC
Seeing this in textboxes and in Impress as well, so expanding scope.
Comment 18 Eyal Rozenberg 2024-09-13 19:55:53 UTC
(In reply to Hossein from comment #0)

But I must ask: Are we seeing this only in corner/weird cases, i.e. harakaat without a character, or placed onto a hyphen? Hossein, you said:

> Arabic diacritics are sometimes drawn incorrectly. As an obvious example,
> when writing Arabic Harakat diacritics separately

What are other scenarios in which we see this? Can we find one that's more common?
Comment 19 ⁨خالد حسني⁩ 2024-09-15 01:11:15 UTC
Combining marks need a base to be placed over. Inserting combining marks without such a base can lead to inconsistencies like what is reported here. If the combining mark is the very first character of the text, HarfBuzz will insert a dotted circle as a base (if the font has it). Space is also a valid base and can be used when marks need to be used standalone. The test should be repeated with a space inserted before the marks, but it is very likely that any mispositioning in this case will be due to font limitations (very few fonts intentionally support placing combining marks over non-letters).
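
As a rough way to prepare the repeated test suggested above, the following Python sketch inserts a space before any combining mark that has no usable base. The helper names `is_mark`, `has_base`, and `ensure_base` are hypothetical, and the rule for what counts as a base (a letter, an earlier mark, or the chosen base character) is an assumption made for illustration, not the logic used by HarfBuzz or LibreOffice.

```python
import unicodedata

def is_mark(ch):
    """True for combining marks (general categories Mn, Mc, Me)."""
    return unicodedata.category(ch) in ("Mn", "Mc", "Me")

def has_base(prev, base):
    """Assumed rule: a letter, an earlier mark, or the base character itself."""
    if prev is None:
        return False
    return prev == base or is_mark(prev) or unicodedata.category(prev).startswith("L")

def ensure_base(text, base=" "):
    """Insert `base` before every combining mark that lacks a base."""
    out = []
    prev = None
    for ch in text:
        if is_mark(ch) and not has_base(prev, base):
            out.append(base)
        out.append(ch)
        prev = ch
    return "".join(out)

KASRA = "\u0650"
print(repr(ensure_base(KASRA)))             # kasra at the very start of the text
print(repr(ensure_base("-" + KASRA)))       # kasra placed after a hyphen
print(repr(ensure_base("\u0628" + KASRA)))  # kasra on a letter is left unchanged
```

The resulting strings could then be pasted into the test document or passed to hb-view to check whether positioning still differs between fonts once every mark has a space as its base.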