Created attachment 194397 [details]
Arabic harakat diacritics

Description:
Arabic diacritics are sometimes drawn incorrectly. As an obvious example, when Arabic harakat diacritics are written on their own, they are positioned incorrectly.

Steps to Reproduce:
1. Open the attachment. Alternatively, type kasrah ِ with an Arabic keyboard in a blank Writer document. Make sure that the paragraph is RTL and the font is "DejaVu Sans".

Read more about kasrah here: https://en.wikipedia.org/wiki/Arabic_diacritics#Kasrah

Actual Results:
In line 1, kasrah falls outside the margin, which is an incorrect position. "DejaVu Sans" font is used in line 1.
In line 2, kasrah falls in the correct position with "Noto Sans Arabic" font.
In line 3, with the additional character -, kasrah falls outside the margin, in the wrong position.
In line 4, with the additional character _, kasrah falls in the correct position.
In line 5, kasrah falls outside the margin, which is incorrect.

Expected Results:
Kasrah should fall inside the margin on all lines, on the right side of the screen.

Reproducible: Always

User Profile Reset: No

Additional Info:
Version: 24.8.0.0.alpha1+ (X86_64) / LibreOffice Community
Build ID: af24f8df41a4159d16a6beb3f58b0f1cfc84c6ea
CPU threads: 12; OS: Linux 6.2; UI render: default; VCL: gtk3
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: CL threaded
Created attachment 194398 [details] Arabic harakat diacritics (PNG) - LibreOffice 24.8 master This is what is displayed on the screen with the latest LibreOffice 24.8 dev master. Each line is marked for being correct (✅) or incorrect (❌).
Created attachment 194399 [details] Arabic harakat diacritics (PNG) - Word All 5 lines are rendered correctly.
@Jonathan: I tried to check whether this is a bug in LibreOffice or in HarfBuzz. See these outputs from the "pango-view" utility:

$ pango-view --font="Noto Sans Arabic 256" --rtl --markup --text " -ِ ب"
$ pango-view --font="Noto Sans Arabic 256" --rtl --markup --text " -ِ "

This is even more interesting! The color red is applied to another character!

$ pango-view --font="Noto Sans Arabic 256" --rtl --markup --text '-<span color="red">ِ</span>.|'

With an additional Arabic character, it becomes OK:

$ pango-view --font="Noto Sans Arabic 256" --rtl --markup --text '-<span color="red">ِ</span> ب'

Also, compare these outputs from the "hb-view" utility:

h1) Extra Arabic character:
$ hb-view /usr/share/fonts/truetype/noto/NotoSansArabic-Regular.ttf " -ِ ب"

h2) Direction set to auto:
$ hb-view /usr/share/fonts/truetype/noto/NotoSansArabic-Regular.ttf " -ِ "

h3) Output is similar to h2:
$ hb-view --direction=ltr /usr/share/fonts/truetype/noto/NotoSansArabic-Regular.ttf " -ِ "

I see the same issue in gedit. If you type kasra as the first character, it falls outside the screen, but when you type an additional Arabic character, it comes back inside the screen.
h3 should use rtl instead of ltr, but that does not change the output.

$ hb-view --direction=rtl /usr/share/fonts/truetype/noto/NotoSansArabic-Regular.ttf " -ِ "
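Because the test strings are hard to read once rendered, it may help to list their code points before comparing shaper output. The following is a minimal sketch, assuming the strings in the commands above consist of SPACE, HYPHEN-MINUS, U+0650 (kasra) and U+0628 (beh); the helper name `describe` is made up for illustration and is not part of any tool mentioned in this report.

```python
import unicodedata


def describe(text):
    """Return one 'U+XXXX NAME' entry per code point, in logical order."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
    ]


# Assumed content of the h1 test string: space, hyphen, kasra, space, beh.
sample = " -\u0650 \u0628"
for entry in describe(sample):
    print(entry)
```

Running this on the exact text pasted into Writer, gedit, or the command line is a quick way to confirm that the same logical sequence is being fed to each renderer.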
(In reply to Hossein from comment #2) > Created attachment 194399 [details] > Arabic harakat diacritics (PNG) - Word > > All 5 lines are rendered correctly. Just making sure for completeness - the Word render used the exact same font? I went through your examples, and it looks like Pango and HB are generating similar results. I think it's pretty unlikely that those two coincidentally implemented the same bug. On the other hand, a heuristic to override a font does seem like the sort of thing Microsoft would do.
Created attachment 194416 [details] Arabic harakat diacritics (PNG) - OpenOffice.org 3.2.1 This bug is not present in OpenOffice: Not reproducible with OpenOffice.org 3.2.1 (OOO320m18, Build: 9502)
(In reply to Jonathan Clark from comment #5)
> (In reply to Hossein from comment #2)
> > Created attachment 194399 [details]
> > Arabic harakat diacritics (PNG) - Word
> >
> > All 5 lines are rendered correctly.
>
> Just making sure for completeness - the Word render used the exact same font?

Yes, I have checked that again, and the same fonts are used in Word. You can directly open the same ODT file in Word. As I have also installed LibreOffice on Windows, it provides the required fonts, "DejaVu Sans" and "Noto Sans Arabic".

> I went through your examples, and it looks like Pango and HB are generating
> similar results. I think it's pretty unlikely that those two coincidentally
> implemented the same bug. On the other hand, a heuristic to override a font
> does seem like the sort of thing Microsoft would do.

I am not sure whether Word uses a fallback font. I tried writing a few Arabic words, and it uses the same intended fonts. Therefore, I think Word is not using a fallback font.

What is your conclusion? An upstream bug from HarfBuzz?
Created attachment 194417 [details] Dotted circle kasra in iOS CoreText
Created attachment 194418 [details] Dotted circle kasra in Windows DirectWrite
Created attachment 194419 [details] Dotted circle kasra in Linux Firefox+HB
Created attachment 194420 [details] Dotted circle Kasra in Windows Firefox+HB+DirectWrite
(In reply to Hossein from comment #7) > What is your conclusion? An upstream bug from HarfBuzz? I'm not convinced this is a bug. Just experimenting naively with the dotted circle kasra case, I could reproduce the same bad placement in 4 completely distinct shapers: DirectWrite, CoreText, HarfBuzz, and Pango. I don't know why they're doing it, and I'm not saying this is what anyone should want. But with so many implementations doing the same thing, it's reasonable to guess this is intentional.
(In reply to Jonathan Clark from comment #12)
> (In reply to Hossein from comment #7)
> > What is your conclusion? An upstream bug from HarfBuzz?
>
> I'm not convinced this is a bug.

I think this is indeed a bug, although the root cause needs to be discussed. I can provide some reasons:

1) As an RTL/CTL user who uses Arabic script, I find it wrong.

2) Falling out of the margin/screen: When you type kasrah (or other harakat diacritics), it goes out of the margin (in the case of gedit, it goes off the screen).

3) Glitch: When you type another Arabic character, it comes back inside the margin (screen). This is not the kind of behavior I expect when I type some text.

4) Inconsistency: It can't be that both rendering alternatives are correct.

5) Causing issues with formatting: In this example, even the color is attached incorrectly:

$ pango-view --font="Noto Sans Arabic 256" --rtl --markup --text '-<span color="red">ِ</span>.|'

> Just experimenting naively with the dotted circle kasra case, I could
> reproduce the same bad placement in 4 completely distinct shapers:
> DirectWrite, CoreText, HarfBuzz, and Pango.
>
> I don't know why they're doing it, and I'm not saying this is what anyone
> should want. But with so many implementations doing the same thing, it's
> reasonable to guess this is intentional.

I suspect that the direction is calculated incorrectly.
> $ pango-view --font="Noto Sans Arabic 256" --rtl --markup --text '-<span color="red">ِ</span>.|'

> I suspect that the direction is calculated incorrectly.

If you try the above command without --rtl, the result is different, and the color is applied correctly.

$ pango-view --font="Noto Sans Arabic 256" --markup --text '-<span color="red">ِ</span>.|'

This is related to the base direction of the text. From the pango-view man page:

--rtl Set base direction to right-to-left

Using the fribidi utility, I get these outputs. Please note that the display here is a little different, so you have to run it in a terminal:

$ fribidi -ِ ب ب -ِ -ِ -ِ
$ fribidi --rtl -ِ -ِ-- ِ ِ

Therefore, forcing the base direction to RTL when the user sets the RTL direction for the paragraph should fix the issue.

The reason that the HarfBuzz hb-view utility (or the library itself) ignores the RTL direction in this case is unclear to me. The option works in some other cases. For example, the outputs of these two commands are different:

$ ./util/hb-view --direction=rtl /usr/share/fonts/truetype/dejavu/DejaVuSans.ttf "a ب"
$ ./util/hb-view --direction=ltr /usr/share/fonts/truetype/dejavu/DejaVuSans.ttf "a ب"

This may also help in debugging:

$ fribidi --levels ِ ِ 0 -ِ -ِ 0 0
$ fribidi --rtl --levels ِ ِ 1 -ِ -ِ 1 1

It is worth mentioning that according to tdf#155470, LibreOffice has limitations in embedding directions, for example LTR inside RTL, and things like that.
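The level 0 results from fribidi without --rtl are consistent with how the Unicode Bidirectional Algorithm picks a default paragraph direction: it looks for the first strong character (bidi class L, R or AL), and a hyphen followed by a kasra contains none. The following is a minimal sketch of that "first strong" rule, not the code used by fribidi, Pango, HarfBuzz or LibreOffice. It is simplified in that it does not skip text inside directional isolates, and the function name `first_strong_level` is made up for illustration.

```python
import unicodedata

# Bidi classes that count as "strong", mapped to the paragraph level they imply.
STRONG_LEVELS = {"L": 0, "R": 1, "AL": 1}


def first_strong_level(text, default=0):
    """Return the paragraph level implied by the first strong character.

    Simplified: directional isolates are not skipped. When no strong
    character is found, the default (0, i.e. left-to-right) is returned.
    """
    for ch in text:
        level = STRONG_LEVELS.get(unicodedata.bidirectional(ch))
        if level is not None:
            return level
    return default


KASRA = "\u0650"  # bidi class NSM (non-spacing mark), not strong
BEH = "\u0628"    # bidi class AL (Arabic letter), strong

for sample in ("-" + KASRA, "-" + KASRA + " " + BEH, "a " + BEH):
    classes = [unicodedata.bidirectional(ch) for ch in sample]
    print(classes, first_strong_level(sample))
```

Under this rule, a lone mark or a hyphen plus mark resolves to left-to-right unless the caller supplies the base direction explicitly, which fits the observation that typing one more Arabic letter changes the result.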
Khaled, wondering if you have some insights on this?
First, let me add a(n unsurprising) data point: I am seeing the same behavior with Hebrew diacritics (Nikkud), including the magical diacritic movement when an additional character is typed. More about this later - perhaps even a video...

(In reply to Hossein from comment #13)

I agree with you, Hossein, on points 1 through 4, both for Arabic and for Hebrew. For point 5, I'm not exactly sure what the expected versus actual behavior is.
Seeing this in textboxes and in Impress as well, so expanding scope.
(In reply to Hossein from comment #0) But I must ask: Are we seeing this only in corner/weird cases, i.e. harakaat without a character, or placed onto a hyphen? Hossein, you said: > Arabic diacritics are sometimes drawn incorrectly. As an obvious example, > when writing Arabic Harakat diacritics separately What are other scenarios in which we see this? Can we find one that's more common?
Combining marks need a base to be placed over. Inserting combining marks without such a base can lead to inconsistencies like what is reported here. If the combining mark is the very first character of the text, HarfBuzz will insert a dotted circle as a base (if the font has it). Space is also a valid base and can be used when marks need to be used standalone. The test should be repeated with a space inserted before the marks, but it is very likely that any mispositioning in that case will be due to font limitations (very few fonts intentionally support placing combining marks over non-letters).
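To repeat the test as suggested, it can be handy to detect marks that have no base and prepend a space to them. The following is a minimal sketch, assuming a mark counts as "baseless" when it starts the text or directly follows a control character or a line/paragraph separator. That heuristic and the helper names `orphan_mark_indices` and `add_base` are illustrative assumptions, not how HarfBuzz or LibreOffice decide this.

```python
import unicodedata

MARK_CATEGORIES = {"Mn", "Mc", "Me"}   # combining marks
BREAK_CATEGORIES = {"Cc", "Zl", "Zp"}  # controls (incl. newline), line/paragraph separators


def orphan_mark_indices(text):
    """Return indices of combining marks with no base character before them."""
    orphans = []
    for i, ch in enumerate(text):
        if unicodedata.category(ch) not in MARK_CATEGORIES:
            continue
        # Walk back over any preceding marks to find the candidate base.
        j = i - 1
        while j >= 0 and unicodedata.category(text[j]) in MARK_CATEGORIES:
            j -= 1
        if j < 0 or unicodedata.category(text[j]) in BREAK_CATEGORIES:
            orphans.append(i)
    return orphans


def add_base(text, base=" "):
    """Insert `base` before each run of baseless marks (space by default)."""
    orphans = set(orphan_mark_indices(text))
    out = []
    for i, ch in enumerate(text):
        # Only the first mark of a baseless run needs a base in front of it.
        if i in orphans and (i - 1) not in orphans:
            out.append(base)
        out.append(ch)
    return "".join(out)


KASRA = "\u0650"
print(orphan_mark_indices(KASRA))      # kasra alone has no base
print(repr(add_base(KASRA)))           # space inserted before the kasra
print(orphan_mark_indices("-" + KASRA))  # hyphen is treated as a base here
```

The `base` parameter is there so the same check can be rerun with another base character, for example U+00A0 NO-BREAK SPACE, to see whether a given font positions the mark differently.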