Bug 151664 - Rendered text shifted with some Hebrew fonts - but not in PDF output
Status: RESOLVED WORKSFORME
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 7.4.1.2 release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: bibisectRequest
Depends on:
Blocks: RTL-CTL
Reported: 2022-10-20 17:58 UTC by Eyal Rozenberg
Modified: 2022-12-28 21:05 UTC

See Also:
Crash report or crash signature:
Regression By:


Attachments
Document exhibiting the bug (13.09 KB, application/vnd.oasis.opendocument.text)
2022-10-20 17:58 UTC, Eyal Rozenberg
Document exhibiting the bug in Hebrew and Arabic (15.11 KB, application/vnd.oasis.opendocument.text)
2022-10-20 18:12 UTC, Eyal Rozenberg
Corruption with Arabic characters where there are no spaces between adjacent characters (77.50 KB, image/jpeg)
2022-10-20 22:16 UTC, u34
Does cursor movement, and selection, not affected by the shifting bug? (12.22 KB, image/jpeg)
2022-10-22 09:44 UTC, u34
Lines mark approximate cursor positions for middle hebrew paragraph (21.31 KB, image/jpeg)
2022-10-22 22:43 UTC, u34

Description Eyal Rozenberg 2022-10-20 17:58:48 UTC
Created attachment 183166
Document exhibiting the bug

Consider the attached document. It has two copies of the following "pyramid" of text:

א
א ב
א ב ג
א ב ג ד
א ב ג ד ה
א ב ג ד ה ו
א ב ג ד ה ו ז
א ב ג ד ה ו ז ח
א ב ג ד ה ו ז ח ט

... but in an RTL paragraph. This appears once in the document with the David CLM font, then again with the Noto Sans font.

Strangely, the Noto Sans lines are rendered with an increasing shift to the left from the beginning of the paragraph. That is the bug.

Now, you might say "oh, maybe it's some kind of problem with the Noto Sans font". Well, it's unlikely that should result in such a shift, but if you're not convinced...

* When we don't add a space after each letter, the lines are rendered without a shift.
* When we export the file to PDF, the shifting is gone.
* If we enter the same text in LO Impress, we don't see this shifting.

Seen with:
Version: 7.5.0.0.alpha0+ / LibreOffice Community
Build ID: a09c5c69e3b5fbf448cae1d6c476f39067e40023
CPU threads: 4; OS: Linux 5.19; UI render: default; VCL: gtk3
Locale: en-IL (en_IL); UI: en-US

and:
Version: 7.4.1.2 / LibreOffice Community
Build ID: 40(Build:2)
CPU threads: 4; OS: Linux 5.19; UI render: default; VCL: gtk3
Locale: en-IL (en_IL); UI: en-US
Debian package version: 1:7.4.1-1
Comment 1 Eyal Rozenberg 2022-10-20 17:59:27 UTC
Help would be appreciated in checking whether the same occurs in Arabic.
Comment 2 Eyal Rozenberg 2022-10-20 18:12:21 UTC
Created attachment 183167
Document exhibiting the bug in Hebrew and Arabic

This happens in Arabic as well... at least for Noto Sans.
Comment 3 Maxim Monastirsky 2022-10-20 19:13:38 UTC
I believe the "Noto Sans" font doesn't include Hebrew glyphs, and there is a separate "Noto Sans Hebrew" font for that. So it's very likely that the text is rendered with some fallback font, and that you might get similar results also when setting other fonts that don't support Hebrew.
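
To illustrate the fallback hypothesis above, here is a minimal Python sketch. It is not LibreOffice code: `PRIMARY_COVERAGE`, `covered` and `split_runs` are hypothetical names, and the coverage range is a toy stand-in for a font that has Basic Latin (including the space) but no Hebrew. Under that assumption, text with a space after each letter alternates between the primary and the fallback font, while unspaced text forms a single fallback run. That may be relevant to the observation in the description that the shift only appears when spaces are present, but a real layout engine may assign spaces to runs differently.

```python
from itertools import groupby

# Assumption: a toy stand-in for a font's character coverage. Real coverage
# comes from the font file itself; this range is illustrative only.
PRIMARY_COVERAGE = [(0x0020, 0x007E)]  # Basic Latin, including the space


def covered(ch, coverage=PRIMARY_COVERAGE):
    """Return True if the (hypothetical) primary font has this character."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in coverage)


def split_runs(text, coverage=PRIMARY_COVERAGE):
    """Split text into consecutive (font_label, substring) runs."""
    runs = []
    for is_covered, group in groupby(text, key=lambda ch: covered(ch, coverage)):
        label = "primary" if is_covered else "fallback"
        runs.append((label, "".join(group)))
    return runs


# Spaced text alternates between fallback (letters) and primary (spaces).
print(split_runs("א ב ג"))
# Unspaced text is a single fallback run.
print(split_runs("אבג"))
```

In this sketch the spaced line produces five runs and the unspaced line produces one, so any positioning error made at a run boundary would accumulate only in the spaced case.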
Comment 4 Eyal Rozenberg 2022-10-20 19:45:51 UTC
(In reply to Maxim Monastirsky from comment #3)
Hmm. bug 151121 comes to mind here.
Comment 5 Eyal Rozenberg 2022-10-20 20:04:14 UTC
(In reply to Maxim Monastirsky from comment #3)

So, I checked, and it seems the font used is DejaVu Sans.

However, if I just set the font directly to DejaVu Sans, I do not see the shifting.
Comment 6 u34 2022-10-20 22:06:37 UTC
1. I have downloaded attachment 183167. I can see the shifting that Eyal Rozenberg, the bug submitter, describes. I know nothing about fonts. Going by the Font Name box in the LibreOffice menus, I can tell the bug occurs with Noto Sans, but not with David CLM and not with Noto Naskh Arabic.

2. For those having difficulty seeing the shifting in the middle paragraphs (those in Noto Sans where there is a space between adjacent characters), I propose the following. Download attachment 183167 and open it. Place the cursor at the rightmost column of each paragraph, then move it one place to the left with the left arrow key. At the 7th line of the Noto Sans paragraphs, the distance of the cursor from the leftmost edge of the character, after the cursor has moved one place to the left, is different from what it was in the 1st line of the paragraph.

3. As far as I can tell, my Noto Sans Hebrew characters are from https://fonts.google.com/noto/fonts?noto.lang=he_Hebr&noto.continent=Asia&noto.region=IL&noto.script=Hebr. I don't know if these are DejaVu Sans or something else. I looked for DejaVu Sans in the LibreOffice Font Name drop-down menu and didn't see such a name. Could it be that I don't have that font installed? In any case, I cannot confirm Eyal Rozenberg's claim that there is no shift when setting the font directly to DejaVu Sans.

4. The Arabic characters in the Noto Sans paragraph where there are no spaces look corrupted to me. I will attach a screenshot of the 4 Noto Sans paragraphs, the way I see them. What I think is corrupted is the bottom left quarter. I have downloaded the file twice, and it looked the same with the 2nd download too. Once again, other than thinking my Noto Sans Arabic fonts are from https://fonts.google.com/noto/fonts?noto.lang=ar_Arab&noto.continent=Asia&noto.region=PS&noto.script=Arab, I don't know which fonts these are.

5. Copying from Help -> About LibreOffice -> Version Information
Version: 7.4.2.3 / LibreOffice Community
Build ID: 40(Build:3)
CPU threads: 4; OS: Linux 6.0; UI render: default; VCL: gtk3
Locale: en-US (en_US.UTF-8); UI: en-US
7.4.2-1
Calc: threaded
Comment 7 u34 2022-10-20 22:16:19 UTC
Created attachment 183177
Corruption with Arabic characters where there are no spaces between adjacent characters

This is how I see attachment 183167. The Arabic characters in the Noto Sans paragraph where there are no spaces look corrupted to me. What I think is corrupted is the bottom left quarter. I have downloaded attachment 183167 twice, and it looked the same with the 2nd download too. Once again, other than thinking my Noto Sans Arabic fonts are from https://fonts.google.com/noto/fonts?noto.lang=ar_Arab&noto.continent=Asia&noto.region=PS&noto.script=Arab, I don't know which fonts these are.
Comment 8 خالد حسني 2022-10-21 14:23:22 UTC
If specifying a font that has Arabic does not show the bug but a font without Arabic does, then it is likely a font fallback issue (VCL’s MultiSalLayout class, if one is looking for code pointers, but I still have PTSD from the last time I tried to debug this code, so I’m not volunteering).
Comment 9 Eyal Rozenberg 2022-10-21 17:26:55 UTC
(In reply to u34 from comment #7)
> Created attachment 183177
> Corruption with Arabic characters where there are no spaces between adjacent
> characters
> 
> This is how I see attachment 183167. The Arabic characters in the
> Noto Sans paragraph where there are no spaces look corrupted to me.

Yes, it seems they're corrupted, but only after the fourth line (aleef-ba-ta-tha). Is this also your impression?
Comment 10 u34 2022-10-21 22:34:42 UTC
(In reply to خالد حسني from comment #8)
Did you imply that the fonts I am using do not have Arabic, so they are fallback fonts? Do you, and Maxim Monastirsky in comment #3, claim that the Google Noto fonts have neither Arabic nor Hebrew glyphs? How can that be when the https://fonts.google.com/noto URLs I mentioned claim the opposite? Could there be a problem with the font package I am using?
Eyal Rozenberg, the bug submitter, mentioned Debian package version 1:7.4.1-1. That looks to me like the next-to-most-recent Debian Bookworm (testing) libreoffice package. Could it be that he is also using Noto fonts from Debian?

(In reply to Eyal Rozenberg from comment #9)
I believe attachment 183177 shows the problem starting at the 3rd line, aleef-ba-ta. But I don't read Arabic, so I could be wrong. What I can say more definitely is that after inserting a white space between adjacent characters in the 3rd Arabic paragraph, it looks identical to the 2nd Arabic paragraph to me, including the shifting, i.e. this bug.
Comment 11 Eyal Rozenberg 2022-10-22 08:26:56 UTC
(In reply to u34 from comment #10)

Noto Sans - on my system - indeed has neither Arabic nor Hebrew glyphs - check it on your system using "Insert | Special Character..." and you'll see. There are separate fonts named "Noto Sans Hebrew", "Noto Sans Arabic" etc.

> How can that be
> when the https://fonts.google.com/noto URLs I mentioned claim the opposite?

Just look at that page, you'll see that separate languages get a separate font, e.g. Noto Kufi Arabic.

> What I can say
> more definitely is that after inserting a white space between adjacent
> characters in the 3rd Arabic paragraph, it looks identical to the 2nd Arabic
> paragraph to me, including the shifting, i.e. this bug.

Yes, of course. With space = second paragraph, without space = 3rd paragraph.
Comment 12 u34 2022-10-22 09:44:15 UTC
Created attachment 183200
Does cursor movement, and selection, not affected by the shifting bug?

I don't know whether the following observation is meaningful. I would say the shifting bug does not affect cursor movement, and selection. Observe in the attachment that the cursor gets to the correct next-to-rightmost position, even though the glyph is not positioned correctly.
Comment 13 Eyal Rozenberg 2022-10-22 10:23:51 UTC
(In reply to u34 from comment #12)
> I would say the shifting bug does not affect cursor movement, 
> and selection.

The shifting affects it at least partially. Consider the second Arabic or Hebrew paragraph: If I move the cursor to the beginning of the paragraph, it's placed where we would expect the first letter to begin, i.e. unaffected; but if we move the cursor to the end of the paragraph, it's placed after (=to the right of) the last shifted character - so it does not ignore the shifting, i.e. it is affected.
Comment 14 u34 2022-10-22 22:43:16 UTC
Created attachment 183204
Lines mark approximate cursor positions for middle hebrew paragraph

Replying to Eyal Rozenberg from comment 13:
The shifting does not affect the cursor position, or does so only negligibly. What gets shifted is the glyph positions, not the cursor positions.

Consider the picture in the attachment. It shows the 2nd Hebrew paragraph, with some thin blue lines added by me. Those lines were drawn by hand, which is why they are only an approximation. They should have been vertical; they are only close to vertical. They should have been placed exactly at the cursor positions; they are not. I positioned the cursor, but in order to draw the line I had to open a menu, which made me use the mouse. That moved the cursor with the mouse and changed its shape. Nevertheless, I hope these lines are a good enough approximation for what follows.

Let us mark the longest blue line as line 0. It marks the cursor position when the cursor is at the right edge of the paragraph. The next blue line, line -1, is much shorter than line 0. It is positioned where the cursor gets to when I move it one place to the left from position 0. Line -2 is where the cursor gets to when it is moved another step to the left, and so on. As you can see, I had neither the patience nor the time to mark lines all the way. But I did move the cursor further, one step at a time, to see whether what follows fits.

What does all this add up to? If I could move the א in between line 0 and line -1, it looks as if its width would fill the distance between those two lines. Between line -1 and line -2 I would put the space that is supposedly the shift at line 0. I would then move the ב in between line -2 and line -3; it looks as if its width fills the gap between these two lines. Try it yourself: move the cursor slowly, one position to the left at a time, and try to estimate whether the gap between its current and previous positions is suitable for the next glyph. The way I see it, the cursor positions are correct. The glyphs should follow them, and they do not. It is mainly the glyph positions that get shifted, not the cursor positions.

As an aside, I tried to have the page use a built in grid. And failed. Is that a bug, or a wrong usage on my part?

As a 2nd aside, perhaps the gutter that was added to version 7.4 is the source of our trouble? This is just a guess. Just trying to see what has changed from earlier versions, where there was no such problem.
Comment 15 u34 2022-10-22 22:44:57 UTC
Replying to Eyal Rozenberg from comment 11:
In my system, Insert → Special Character... also has separate entries for fonts of different languages. Noto Sans looks to me like shorthand for Noto Sans English. That could be just out of necessity, in order not to have one extremely long entry that would confuse and discourage many users. It could be just a sorting aid.
Different countries seem to get their own page at fonts.google.com, but the fonts for each language remain the same. The Lebanon page, https://fonts.google.com/noto/fonts?noto.continent=Asia&noto.region=LB, and the Palestinian Territories page, https://fonts.google.com/noto/fonts?noto.continent=Asia&noto.region=PS, have the same Arabic fonts. And so do https://fonts.google.com/noto/fonts?noto.lang=en_Latn&noto.continent=Europe&noto.region=GB&noto.script=Latn and https://fonts.google.com/noto/fonts?noto.lang=en_Latn&noto.continent=Americas&noto.region=US&noto.script=Latn. I cannot tell why Google has so many duplicate pages.
Comment 16 u34 2022-10-23 12:18:02 UTC
The built-in grid is not perfect here, but it is easy to add and modify, and it still comes out useful. There are instructions for adding it at https://ask.libreoffice.org/t/creating-dot-grid-paper-in-writer/69539. Look at the 2nd answer, by EarnestAl, from Oct 2021.
This comment is a self reply to my own question at comment 14.
Comment 17 u34 2022-12-24 03:10:46 UTC
As of 

Version: 7.4.3.2 / LibreOffice Community
Build ID: 40(Build:2)
CPU threads: 2; OS: Linux 6.1; UI render: default; VCL: gtk3
Locale: en-US (en_US.UTF-8); UI: en-US
7.4.3-3
Calc: threaded

I do see the bug with attachment 183167, which, for me, uses the (plain) Noto Sans font. But when I switch the problematic paragraphs to Noto Sans Arabic or Noto Sans Hebrew, as appropriate, the bug disappears. I also seem to be able to switch the bug on and off by changing between the language-specific Noto Sans variant and the plain, non-language-specific Noto Sans font. Does that mean that with multi-language documents, the user will have to constantly change fonts?

For me, searching for a specific font in the font drop-down menu was tedious. Is it because I do not know a shorter way to search that menu?

As an aside, there does not seem to be an English Noto Sans, nor a Latin Noto Sans. Is what I called plain Noto Sans actually Latin Noto Sans?
Comment 18 Eyal Rozenberg 2022-12-24 09:01:05 UTC
I no longer see the bug with a recent nightly:

Version: 7.6.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: ad387d5b984c6666906505d25685065f710ed55d
CPU threads: 4; OS: Linux 6.0; UI render: default; VCL: gtk3
Locale: en-IL (en_IL); UI: en-US