Bug 146713 - RTL language overlap problem when mixed with LTR language
Summary: RTL language overlap problem when mixed with LTR language
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version (earliest affected): 4.1.2.3 release
Hardware: All, OS: All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: bibisected, regression
Depends on:
Blocks: RTL-Arabic-and-Farsi
 
Reported: 2022-01-12 09:08 UTC by mahdisarmadirad
Modified: 2023-06-05 17:45 UTC
CC List: 8 users

See Also:
Crash report or crash signature:


Attachments
latin words overlap with persian words (26.16 KB, application/vnd.oasis.opendocument.text)
2022-01-12 09:39 UTC, mahdisarmadirad

Description mahdisarmadirad 2022-01-12 09:08:05 UTC
Description:
LibreOffice can’t handle mixed RTL (Persian) and LTR (English) text. When a document containing mixed RTL and LTR text is saved as .odt or even .docx and then reopened, some of the LTR words are shifted and overlap the RTL words. Inserting and deleting a space resolves the problem, but after saving and reopening, the letters overlap and mix again.

Steps to Reproduce:
1. Write a paragraph of mixed Persian and English words separated by single spaces
2. Save the file as .odt
3. Close and reopen the file


Actual Results:
After the saved document is reopened, some Latin words are shifted and may even overlap and mix with Persian words.

Expected Results:
The reopened document should look the way it did when it was saved (Latin and Persian words next to each other, separated by single spaces).


Reproducible: Always


User Profile Reset: No



Additional Info:
Version: 6.4.7.2
Build ID: 1:6.4.7-0ubuntu0.20.04.2
CPU threads: 4; OS: Linux 5.11; UI render: default; VCL: gtk3; 
Locale: en-US (en_US.UTF-8); UI-Language: en-US
Calc: threaded
Comment 1 mahdisarmadirad 2022-01-12 09:22:51 UTC
To reproduce, the paragraph should be set to RTL.
Comment 2 Xisco Faulí 2022-01-12 09:34:50 UTC
Thank you for reporting the bug. Please attach a sample document, as this makes it easier for us to verify the bug. 
I have set the bug's status to 'NEEDINFO'. Please change it back to 'UNCONFIRMED' once the requested document is provided.
(Please note that the attachment will be public; remove any sensitive information before attaching it.
See https://wiki.documentfoundation.org/QA/FAQ#How_can_I_eliminate_confidential_data_from_a_sample_document.3F for help on how to do so.)
Comment 3 mahdisarmadirad 2022-01-12 09:39:05 UTC
Created attachment 177482
latin words overlap with persian words
Comment 4 mahdisarmadirad 2022-01-12 09:39:43 UTC
Sample document attached.
Comment 5 Hossein 2022-01-23 09:46:01 UTC
It looks like a layout glitch. Adding or removing characters such as a space fixes some of the defects but creates others elsewhere in the text. The problem is also present in the PDF output.

Reproduced with:

Version: 6.4.0.1
Build ID: 1b6477b31f0334bd8620a96f0aeeb449b587be9f
CPU threads: 8; OS: Linux 5.13; UI render: default; VCL: gtk3; 
Locale: en-US (en_US.UTF-8); UI-Language: en-US
Calc: threaded

Version: 7.2.4.1 / LibreOffice Community
Build ID: 27d75539669ac387bb498e35313b970b7fe9c4f9
CPU threads: 8; OS: Linux 5.13; UI render: default; VCL: gtk3
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: threaded

latest git master:

Version: 7.4.0.0.alpha0+ / LibreOffice Community
Build ID: 456ccf994d15a5af1ba2039cace8f0fdc886049d
CPU threads: 8; OS: Linux 5.13; UI render: default; VCL: gtk3
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: threaded

Version: 7.4.0.0.alpha0+ / LibreOffice Community
Build ID: 456ccf994d15a5af1ba2039cace8f0fdc886049d
CPU threads: 8; OS: Linux 5.13; UI render: default; VCL: x11
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: threaded

The output is OK with LO 3.5:

LibreOffice 3.5.0rc3 
Build ID: 7e68ba2-a744ebf-1f241b7-c506db1-7d53735

On Windows the bug is also reproducible, but the symptoms are different and less visible:

Version: 7.2.4.1 (x64) / LibreOffice Community
Build ID: 27d75539669ac387bb498e35313b970b7fe9c4f9
CPU threads: 32; OS: Windows 10.0 Build 19044; UI render: Skia/Raster; VCL: win
Locale: en-US (en_DE); UI: en-US
Calc: threaded

Version: 7.4.0.0.alpha0+ (x64) / LibreOffice Community
Build ID: a903d114f8d67f266e12d129333fb35c04861ae2
CPU threads: 32; OS: Windows 10.0 Build 19044; UI render: Skia/Raster; VCL: win
Locale: en-US (en_DE); UI: en-US
Calc: threaded
Comment 6 Hossein 2022-01-23 14:49:09 UTC
Not reproducible in 3.6.7.2
Reproduced in 4.1.2.3
Comment 7 Hossein 2022-01-23 16:47:13 UTC
Not reproducible in 4.0.0.1.
It must have been introduced somewhere between 4.0.0.1 and 4.1.2.3.

I think this is the source of the problem:

commit f0393d7ff69011a16b100541ef18e5090544e4a1
Author:     Khaled Hosny <khaledhosny@eglug.org>
AuthorDate: Mon May 6 16:54:53 2013 +0200
Commit:     Khaled Hosny <khaledhosny@eglug.org>
CommitDate: Mon May 6 17:22:30 2013 +0200

bad:  refs/tags/last41onmaster
good: refs/tags/last40onmaster

$ git bisect start last41onmaster last40onmaster

# bad: [c2069a369d738078124812312d51f21ea1ce2421] source-hash-f160e4935c474a5293b3d3c11b3d538efb4767a0
# good: [2e0fa432485d1db6abd355dad8ccb06f0b97e4fb] source-hash-ce90f99a2d66c2b998ad3f9f028e2ea623a757f5
git bisect start 'last41onmaster' 'last40onmaster'
# good: [c1631ee90606d0a7928496fb9548bcd0dbe69dbf] source-hash-28fb57daa77438f5e63132d3417062a11a44461e
git bisect good c1631ee90606d0a7928496fb9548bcd0dbe69dbf
# good: [0e1044821fb18012f74e2c8fd79abba13944ac9d] source-hash-11b851d2f2f8ee6230b8c732d5b157e472c21be2
git bisect good 0e1044821fb18012f74e2c8fd79abba13944ac9d
# good: [7130a0223e15d56e6ec1f0b6bb08e0793729cb4a] source-hash-eea39218ca7c22e40b997b5efd0d9ea08e01090b
git bisect good 7130a0223e15d56e6ec1f0b6bb08e0793729cb4a
# bad: [7735ac016add3b7e4b96d6017919a828e3271a0a] source-hash-923312f67fbf120158f01c2c0e588af38fc22364
git bisect bad 7735ac016add3b7e4b96d6017919a828e3271a0a
# good: [96f14f1bf7505b59a2731dcba1615568f03fd68f] source-hash-f0393d7ff69011a16b100541ef18e5090544e4a1
git bisect good 96f14f1bf7505b59a2731dcba1615568f03fd68f
# bad: [f1fda5341557321cf4953c0d5dc6ffe262f1b545] source-hash-ee8323e2280c72eb5cc9ec0257164154b2580a78
git bisect bad f1fda5341557321cf4953c0d5dc6ffe262f1b545
# bad: [59a69f706fe3c6e178a8ecda6fe82d72b0eef465] source-hash-48ad2f61fe71edc1a8967b322d3e0f368f4be06f
git bisect bad 59a69f706fe3c6e178a8ecda6fe82d72b0eef465
# first bad commit: [59a69f706fe3c6e178a8ecda6fe82d72b0eef465] source-hash-48ad2f61fe71edc1a8967b322d3e0f368f4be06f
Comment 8 Kevin Suo 2022-01-26 01:10:08 UTC
(In reply to Hossein from comment #7)
I set this to "bibisected" as you have already bibisected it.
Comment 9 niyumard 2022-02-15 04:48:08 UTC
Reproduced with several different fonts in:

Version: 7.2.5.2.0+ / LibreOffice Community
Build ID: 20(Build:2)
CPU threads: 8; OS: Linux 5.15; UI render: default; VCL: gtk3
Locale: fa-IR (en_US.UTF-8); UI: en-US
Gentoo official package
Calc: threaded
Comment 10 Hossein 2022-05-12 06:21:58 UTC
Luboš, could you please take a look at this regression? It is probably related to the previous regression you fixed (tdf#148954).
Comment 11 ⁨خالد حسني⁩ 2022-08-11 20:31:54 UTC
This seems similar to bug 138199 and bug 150286. Removing the two footnotes in the first paragraph fixes the overlap in that paragraph.

I’m still skeptical that this commit is what really broke it: I reverted the change in GenericSalLayout::GetTextWidth() and it made no difference.
Comment 12 Hossein 2022-08-12 08:13:53 UTC
(In reply to Khaled Hosny from comment #11)
> This seems similar to bug 138199, and bug 150286. Removing the two footnotes
> in the first paragraph fixes the overlap in this paragraph.
> 
> I’m still skeptical that this commit what really broke it, I reverted the
> change in GenericSalLayout::GetTextWidth() and it made no difference.

You're right. The problem started one commit after the commit mentioned above:

first bad commit: [bff8fa97e16f0f06fddc5545ea36c8bd2b18a580] Enable HarfBuzz by default

commit bff8fa97e16f0f06fddc5545ea36c8bd2b18a580
Author: Khaled Hosny <khaledhosny@eglug.org>
Date:   Mon May 6 11:08:29 2013 +0200

    Enable HarfBuzz by default
Comment 13 ⁨خالد حسني⁩ 2022-08-12 15:17:11 UTC
(In reply to Hossein from comment #12)
> (In reply to Khaled Hosny from comment #11)
> > This seems similar to bug 138199, and bug 150286. Removing the two footnotes
> > in the first paragraph fixes the overlap in this paragraph.
> > 
> > I’m still skeptical that this commit what really broke it, I reverted the
> > change in GenericSalLayout::GetTextWidth() and it made no difference.
> 
> You're right. The problem started with one commit after the above mentioned
> commit:
> 
> first bad commit: [bff8fa97e16f0f06fddc5545ea36c8bd2b18a580] Enable HarfBuzz
> by default
> 
> commit bff8fa97e16f0f06fddc5545ea36c8bd2b18a580
> Author: Khaled Hosny <khaledhosny@eglug.org>
> Date:   Mon May 6 11:08:29 2013 +0200
> 
>     Enable HarfBuzz by default

I see. That makes it harder to know what changed, then: this commit merely flips the default; the actual code was committed much earlier.
Comment 14 ⁨خالد حسني⁩ 2023-06-05 09:16:26 UTC
I spent some time debugging this but still have no clue. The overlap seems to be related to font fallback: if a font that supports both Arabic and Latin is used, there is no overlap, but the text still spills into the margin. The footnote seems to be the source of the trouble here, but I don’t know why. My current theory is that we are calculating the wrong width for the footnote mark and this is breaking all the text after it.
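To illustrate the theory in the comment above (this is not LibreOffice's actual layout code): if each portion of a line is placed at the running sum of the widths of the portions before it, a single mis-measured width shifts every later portion by the same error. The portion labels and widths below are hypothetical, and text direction is ignored, so the offsets are logical rather than visual.

```python
from itertools import accumulate


def start_positions(widths):
    """Start offset of each portion: 0, w0, w0+w1, ... (running sum)."""
    if not widths:
        return []
    # accumulate(..., initial=0) needs Python 3.8+; the last width is
    # dropped because it only determines where the line ends.
    return list(accumulate(widths[:-1], initial=0))


# Hypothetical portions of one line: (label, measured width, true width).
# Units are arbitrary; the footnote mark is assumed to be over-measured.
portions = [
    ("persian-word-1", 40, 40),
    ("footnote-mark", 12, 6),
    ("latin-word", 30, 30),
    ("persian-word-2", 35, 35),
]

measured = start_positions([m for _, m, _ in portions])
actual = start_positions([t for _, _, t in portions])

for (label, _, _), m, a in zip(portions, measured, actual):
    print(f"{label:15} measured x={m:3}  true x={a:3}  shift={m - a}")
```

With these made-up numbers the footnote mark is over-measured by 6 units, so both portions after it start 6 units too far along while the portions before it are unaffected. That matches the "breaking all the text after it" symptom only under the stated assumption.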
Comment 15 Hossein 2023-06-05 11:00:59 UTC
(In reply to ⁨خالد حسني⁩ from comment #14)
> I spent some time debugging this but still no clue. The overlap seems to be
> related to font fallback, if a font that supports both Arabic and Latin is
> used there is no overlap. But still, the text is spilled into the margin.
> The footnote seems to be the source of the trouble here, but I don’t know
> why. My current theory is that we are calculating the wrong width for the
> footnote mark and this is breaking all the text after it.
Thank you for working on this issue.
Although the "B ..." series of fonts is a problem here because they lack Latin characters, I also see the text overflowing the margins as you described. I remember this problem, in other forms, from very old versions of OpenOffice, but it may be that only the symptoms are similar in some cases.

One more thing: enabling "Toggle Formatting Marks" (Ctrl+F10) makes the problem go away. I think a good way to debug this would be to dump the Writer data and compare it before and after enabling the formatting marks. sw/qa/extras/README explains how to dump the relevant data. (Thanks to Caolán and Miklos for pointing this out.)
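The before/after comparison suggested above could be done with a plain line diff once the two dumps exist. A minimal Python sketch, assuming the dumps were already written to text files (the names before.xml and after.xml are hypothetical; how to produce the dumps is described in sw/qa/extras/README and is not reproduced here):

```python
import difflib
import sys
from pathlib import Path


def diff_dumps(before_path, after_path, context=3):
    """Return unified-diff lines between two text dumps (empty if identical)."""
    before = Path(before_path).read_text(encoding="utf-8").splitlines(keepends=True)
    after = Path(after_path).read_text(encoding="utf-8").splitlines(keepends=True)
    return list(
        difflib.unified_diff(
            before,
            after,
            fromfile=str(before_path),
            tofile=str(after_path),
            n=context,
        )
    )


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python diff_dumps.py before.xml after.xml
    sys.stdout.writelines(diff_dumps(sys.argv[1], sys.argv[2]))
```

A generic line diff is used here only because it needs no knowledge of the dump format; it will show which entries change when formatting marks are toggled, not why.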
Comment 16 Eyal Rozenberg 2023-06-05 17:45:26 UTC
(In reply to ⁨خالد حسني⁩ from comment #14)
> if a font that supports both Arabic and Latin is
> used there is no overlap.

I'm not sure I understand that sentence: LO uses different fonts for Arabic and English characters. Do you mean that if the RTL language-group font supports Arabic and the Western font supports English, there is no overlap? Or did you mean something else?


> The footnote seems to be the source of the trouble here, but I don’t know
> why. My current theory is that we are calculating the wrong width for the
> footnote mark and this is breaking all the text after it.

But the lines that spill into the margin are not (or not necessarily) the lines with the overlap and the footnote marks, at least when I open the file.