Bug 56408 - Writer always breaks lines at text direction change
Summary: Writer always breaks lines at text direction change
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: high major
Assignee: Not Assigned
Keywords: text:rtl
Duplicates: 65840 146710
Depends on:
Blocks: RTL-CTL
Reported: 2012-10-25 18:25 UTC by Lior Kaplan
Modified: 2022-09-25 21:38 UTC (History)
CC List: 10 users

See Also:
Crash report or crash signature:
Regression By:

Mixed text test document (15.82 KB, application/vnd.oasis.opendocument.text)
2012-10-25 18:25 UTC, Lior Kaplan
doc before the modification (112.52 KB, image/png)
2016-02-22 19:07 UTC, Nusaiba Al Kindi
doc after the modification (112.38 KB, image/png)
2016-02-22 19:08 UTC, Nusaiba Al Kindi
Bug 56408 still happens in version 5.4.2 (128.51 KB, image/png)
2017-11-02 08:27 UTC, Omer Zak

Description Lior Kaplan 2012-10-25 18:25:01 UTC
Created attachment 69087 [details]
Mixed text test document

When Latin-script text (e.g. English) is mixed with text in an RTL language (Hebrew or Arabic), brackets aren't handled correctly: the leading bracket isn't treated as part of the word it encloses. As a result, during word wrap the leading bracket stays on the current line while the word itself and the closing bracket are wrapped to the next line.

This is true both for a Hebrew word in an English text and for an English word in a Hebrew text.

The problem is solved if the paragraph directionality is changed to the opposite one.

See the attached document.
Comment 1 Lior Kaplan 2012-10-25 19:54:37 UTC Comment hidden (obsolete)
Comment 2 Roman Eisele 2012-10-27 18:47:59 UTC
Confirmed: REPRODUCIBLE with Lior’s sample document and LibreOffice or current master build (2012-10-27) on Mac OS X; so really a cross-platform issue.
Comment 3 Lior Kaplan 2013-03-24 23:29:57 UTC Comment hidden (obsolete)
Comment 4 Urmas 2013-03-27 23:43:08 UTC
This is caused by brackets taking the paragraph direction and staying with the text of same direction when linebreaking.
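The mechanism described in this comment can be illustrated with a small sketch. It is not LibreOffice code. It assumes a heavily simplified reading of the Unicode Bidirectional Algorithm (only the neutral-resolution idea behind rules N1/N2, ignoring the paired-bracket rule N0, numbers, and explicit embeddings), and `resolve_directions` is a hypothetical helper name introduced only for this illustration.

```python
import unicodedata

# Bidi classes treated as strong in this sketch; AL (Arabic letter) acts as R.
STRONG = {"L": "L", "R": "R", "AL": "R"}

def resolve_directions(text, paragraph_dir):
    """Return one 'L' or 'R' per character (simplified, see assumptions above).

    A run of non-strong characters takes the direction of its neighbours
    when both neighbours agree, otherwise the paragraph direction.
    """
    resolved = [STRONG.get(unicodedata.bidirectional(ch)) for ch in text]
    n = len(text)
    i = 0
    while i < n:
        if resolved[i] is not None:
            i += 1
            continue
        j = i
        while j < n and resolved[j] is None:
            j += 1
        before = resolved[i - 1] if i > 0 else paragraph_dir
        after = resolved[j] if j < n else paragraph_dir
        fill = before if before == after else paragraph_dir
        for k in range(i, j):
            resolved[k] = fill
        i = j
    return "".join(resolved)

text = "abc (\u05d0\u05d1\u05d2) def"   # Latin text with a parenthesized Hebrew word
print(unicodedata.bidirectional("("))   # ON (a neutral class, not a strong one)
print(resolve_directions(text, "L"))    # LLLLLRRRLLLLL
print(resolve_directions(text, "R"))    # LLLRRRRRRRLLL
```

In the LTR paragraph the brackets end up in the surrounding LTR run, so the Hebrew word forms a directional run without its brackets. In the RTL paragraph the brackets join the Hebrew run, which is consistent with the report that switching the paragraph direction makes the problem disappear.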
Comment 5 safa alfulaij 2014-01-23 16:47:05 UTC Comment hidden (obsolete)
Comment 6 QA Administrators 2015-09-04 02:49:10 UTC Comment hidden (obsolete)
Comment 7 Shimi Chen 2015-09-04 04:38:49 UTC Comment hidden (obsolete)
Comment 8 Robinson Tryon (qubit) 2015-12-10 03:35:02 UTC Comment hidden (obsolete)
Comment 9 Nusaiba Al Kindi 2016-01-26 08:37:34 UTC
*** Bug 65840 has been marked as a duplicate of this bug. ***
Comment 10 Nusaiba Al Kindi 2016-02-22 19:07:24 UTC
Created attachment 122889 [details]
doc before the modification
Comment 11 Nusaiba Al Kindi 2016-02-22 19:08:01 UTC
Created attachment 122890 [details]
doc after the modification
Comment 12 Nusaiba Al Kindi 2016-02-22 19:15:53 UTC
Hi all

I made some changes to the code to address this issue. Line breaking now cuts the word (character by character) in the different script, depending on the available space in the line, instead of separating the bracket from the word.

I submitted a patch to Gerrit and attached two images of what I did. Please review it and give me your comments and ideas for solving the issue.

patch link: https://gerrit.libreoffice.org/#/c/22620/

Comment 13 Omer Zak 2017-11-02 08:25:19 UTC Comment hidden (obsolete)
Comment 14 Omer Zak 2017-11-02 08:27:14 UTC
Created attachment 137441 [details]
Bug 56408 still happens in version 5.4.2

paragraph 2 (Deleted the last English word before the brackets)
paragraph 5 (Hebrew text with an English word)
paragraph 6 (Deleted the last Hebrew word before the brackets)
Comment 15 Eyal Rozenberg 2018-03-02 14:22:50 UTC Comment hidden (noise)
Comment 16 Eyal Rozenberg 2018-03-02 14:25:23 UTC
(Sorry for the messed-up comment before)
I am not 100% sure this is actually a bug, because parentheses are not strong-direction-indicating glyphs (I forget the exact Unicode term). So it may be the case that the paragraph is broken up into directional runs differently when it's LTR and when it's RTL, and that may account for the difference in behavior.

Lior, can you argue that the Unicode standard dictates behavior different than what LO does right now?
Comment 17 خالد حسني 2018-03-02 22:29:52 UTC
This is a bug: a change in direction does not create a line-breaking opportunity.

There is even an easy way to check this with https://unicode.org/cldr/utility/breaks.jsp; copy the text there and choose Line in the drop down and it should show all possible line breaking opportunities.
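For readers without access to that tool, here is a rough sketch of the same check. It assumes a tiny subset of the Unicode line breaking rules (UAX #14) that is enough for text made only of letters, spaces, and parentheses: a break is allowed after a run of spaces, never before a space, never right after an opening parenthesis, and never right before a closing one. `break_opportunities` is a hypothetical helper, not an API of any library, and text direction is deliberately not an input.

```python
def break_opportunities(text):
    """Indices i where a line may start, i.e. a break between text[i-1] and text[i].

    Simplified: for letters, spaces and parentheses only, the subset of rules
    assumed here reduces to "break after spaces, before a non-space".
    """
    return [
        i
        for i in range(1, len(text))
        if text[i - 1] == " " and text[i] != " "
    ]

text = "abc (\u05d0\u05d1\u05d2) def"
points = break_opportunities(text)
print(points)  # [4, 10]

# Split the text at the opportunities to see the unbreakable units.
starts = [0] + points
units = [text[a:b] for a, b in zip(starts, starts[1:] + [len(text)])]
print(units)   # three units; the middle one holds "(", the Hebrew word and ")" together
```

Under these assumptions the only opportunities are before the opening parenthesis and after the space that follows the closing one. None falls between a bracket and the word it encloses, whatever the direction of that word.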
Comment 18 Lior Kaplan 2018-09-30 15:00:43 UTC Comment hidden (obsolete)
Comment 19 Alex Thurgood 2018-10-19 07:23:18 UTC
*** Bug 120669 has been marked as a duplicate of this bug. ***
Comment 20 chinyuhsuan 2018-11-29 08:21:04 UTC Comment hidden (obsolete)
Comment 21 QA Administrators 2019-11-30 03:39:17 UTC Comment hidden (obsolete)
Comment 22 Eyal Rozenberg 2020-02-29 10:36:04 UTC Comment hidden (obsolete)
Comment 23 QA Administrators 2022-03-01 03:42:18 UTC Comment hidden (obsolete)
Comment 24 خالد حسني 2022-09-25 15:00:32 UTC
*** Bug 146710 has been marked as a duplicate of this bug. ***
Comment 25 Eyal Rozenberg 2022-09-25 21:38:30 UTC
(In reply to خالد حسني from comment #17)
> This is a bug: a change in direction does not create a line-breaking opportunity.

The first part of this statement does not follow from the second. That is, the second part is true, but there already is a breaking opportunity before and after the parentheses regardless of the change in direction.

> There is even an easy way to check this

... and checking shows the line breaking opportunity.

To me it seems this is a bug because the parenthesized word would still fit on the first line of the two in the sample document. So, it's about why LO thinks it _must_ break as opposed to it mistakenly thinking it _can_ break.
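To make the "must break" versus "can break" distinction concrete, here is a minimal sketch of greedy line filling. It assumes width is counted in characters (real layout measures glyph widths) and that breaks happen only at spaces. `greedy_wrap` is a hypothetical helper and says nothing about how Writer's layout code is actually structured.

```python
def greedy_wrap(text, width):
    """Fill each line with as many space-separated units as fit in `width`."""
    lines, current = [], ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if len(candidate) <= width or not current:
            # The unit still fits (or the line is empty), so keep it here.
            current = candidate
        else:
            # Break only because the next unit does not fit.
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines

text = "abc (\u05d0\u05d1\u05d2) def"
print(greedy_wrap(text, 9))  # two lines: the parenthesized word stays on the first
print(greedy_wrap(text, 8))  # three lines: the parenthesized word moves down whole
```

With a width of 9 the parenthesized word fits and stays on the first line. With a width of 8 it moves to the next line as a whole. In neither case is a bracket left behind, which is the behavior the sample document suggests Writer should have.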