Bug 56408 - Writer always breaks lines at text direction change, leaving orphan bracket
Summary: Writer always breaks lines at text direction change, leaving orphan bracket
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: high major
Assignee: Jonathan Clark
URL:
Whiteboard: target:25.2.0 target:24.8.0.2
Keywords: text:rtl
Duplicates: 65840 145918 146710 160854
Depends on:
Blocks: Word-Line-Break RTL
Reported: 2012-10-25 18:25 UTC by Lior Kaplan
Modified: 2024-10-03 21:55 UTC
CC List: 13 users

See Also:
Crash report or crash signature:


Attachments
Mixed text test document (15.82 KB, application/vnd.oasis.opendocument.text)
2012-10-25 18:25 UTC, Lior Kaplan
doc before the modification (112.52 KB, image/png)
2016-02-22 19:07 UTC, Nusaiba Al Kindi
doc after the modification (112.38 KB, image/png)
2016-02-22 19:08 UTC, Nusaiba Al Kindi
Bug 56408 still happens in version 5.4.2 (128.51 KB, image/png)
2017-11-02 08:27 UTC, Omer Zak
Screenshot from Unicode utility showing line break opportunities (430.08 KB, image/png)
2023-06-11 09:37 UTC, ⁨خالد حسني⁩

Description Lior Kaplan 2012-10-25 18:25:01 UTC
Created attachment 69087 [details]
Mixed text test document

When mixing English (Latin) text with Hebrew or Arabic (RTL) text, brackets aren't handled correctly: the opening bracket isn't treated as part of the word. So during word wrap, the opening bracket stays on the first line while the word itself and the closing bracket wrap to the next line.

This happens both for a Hebrew word in English text and for an English word in Hebrew text.

The problem is solved if the paragraph directionality is changed to the opposite one.

See the attached document.
Comment 1 Lior Kaplan 2012-10-25 19:54:37 UTC Comment hidden (obsolete)
Comment 2 Roman Eisele 2012-10-27 18:47:59 UTC
Confirmed: REPRODUCIBLE with Lior’s sample document and LibreOffice 3.6.3.1 or current master build (2012-10-27) on Mac OS X; so really a cross-platform issue.
Comment 3 Lior Kaplan 2013-03-24 23:29:57 UTC Comment hidden (obsolete)
Comment 4 Urmas 2013-03-27 23:43:08 UTC
This is caused by brackets taking the paragraph direction and staying with the text of the same direction when line-breaking.
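Urmas's explanation can be illustrated with a toy version of the neutral-resolution step of the Unicode Bidirectional Algorithm (UAX #9, rules N1 and N2): a run of neutral characters takes the direction of its neighbors when both sides agree, and the paragraph direction otherwise. This Python sketch is a deliberate simplification: characters in the U+0590 to U+08FF range count as RTL, other letters as LTR, and everything else as neutral; weak types, explicit embeddings and the paired-bracket rule N0 are ignored (N0 gives the same result for this input). `bidi_class`, `resolve_neutrals` and `directional_runs` are hypothetical helpers, not LibreOffice code.

```python
def bidi_class(ch):
    """Crude classification: 'R' strong RTL, 'L' strong LTR, 'N' neutral."""
    if '\u0590' <= ch <= '\u08ff':  # Hebrew, Arabic and neighboring RTL blocks
        return 'R'
    if ch.isalpha():
        return 'L'
    return 'N'  # spaces, brackets, punctuation, digits (all simplified to neutral)


def resolve_neutrals(text, paragraph_dir):
    """Simplified UAX #9 rules N1/N2 for one paragraph with no embeddings."""
    classes = [bidi_class(ch) for ch in text]
    resolved = list(classes)
    i = 0
    while i < len(classes):
        if classes[i] != 'N':
            i += 1
            continue
        j = i
        while j < len(classes) and classes[j] == 'N':
            j += 1
        # Paragraph edges count as the paragraph direction (sos/eos).
        before = classes[i - 1] if i > 0 else paragraph_dir
        after = classes[j] if j < len(classes) else paragraph_dir
        # N1: agreeing neighbors win; N2: otherwise use the paragraph direction.
        direction = before if before == after else paragraph_dir
        resolved[i:j] = [direction] * (j - i)
        i = j
    return resolved


def directional_runs(text, paragraph_dir):
    """Group characters into maximal runs of the same resolved direction."""
    runs = []
    for ch, d in zip(text, resolve_neutrals(text, paragraph_dir)):
        if runs and runs[-1][0] == d:
            runs[-1] = (d, runs[-1][1] + ch)
        else:
            runs.append((d, ch))
    return runs


print(directional_runs("see (שלום) now", "L"))  # LTR paragraph
print(directional_runs("see (שלום) now", "R"))  # RTL paragraph
```

In the LTR paragraph the brackets end up in the LTR runs on either side of the Hebrew word (`see (` and `) now`); in the RTL paragraph they join the Hebrew run (` (שלום) `). That matches the report that switching the paragraph direction hides the problem.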
Comment 5 safa alfulaij 2014-01-23 16:47:05 UTC Comment hidden (obsolete)
Comment 6 QA Administrators 2015-09-04 02:49:10 UTC Comment hidden (obsolete)
Comment 7 Shimi Chen 2015-09-04 04:38:49 UTC Comment hidden (obsolete)
Comment 8 Robinson Tryon (qubit) 2015-12-10 03:35:02 UTC Comment hidden (obsolete)
Comment 9 Nusaiba Al Kindi 2016-01-26 08:37:34 UTC
*** Bug 65840 has been marked as a duplicate of this bug. ***
Comment 10 Nusaiba Al Kindi 2016-02-22 19:07:24 UTC
Created attachment 122889 [details]
doc before the modification
Comment 11 Nusaiba Al Kindi 2016-02-22 19:08:01 UTC
Created attachment 122890 [details]
doc after the modification
Comment 12 Nusaiba Al Kindi 2016-02-22 19:15:53 UTC
Hi all

I made some changes to the code to solve this issue. Now, instead of separating the bracket from the word, line breaking cuts the word in the other script (character by character), depending on the available space in the line.

I submitted a patch to Gerrit and attached two images showing what I did, so please review it and give me your comments and ideas for solving the issue.

patch link: https://gerrit.libreoffice.org/#/c/22620/

Thanks
Nusaiba
Comment 13 Omer Zak 2017-11-02 08:25:19 UTC Comment hidden (obsolete)
Comment 14 Omer Zak 2017-11-02 08:27:14 UTC
Created attachment 137441 [details]
Bug 56408 still happens in version 5.4.2

See:
paragraph 2 (Deleted the last English word before the brackets)
paragraph 5 (Hebrew text with an English word)
paragraph 6 (Deleted the last Hebrew word before the brackets)
Comment 15 Eyal Rozenberg 2018-03-02 14:22:50 UTC Comment hidden (noise)
Comment 16 Eyal Rozenberg 2018-03-02 14:25:23 UTC
(Sorry for the messed-up comment before)
I am not 100% sure this is actually a bug, because parentheses are not strong-direction-indicating glyphs (I forget the exact Unicode term). So it may be the case that the paragraph is broken up into directional runs differently when it's LTR and when it's RTL, and that may account for the difference in behavior.

Lior, can you argue that the Unicode standard dictates behavior different than what LO does right now?
Comment 17 ⁨خالد حسني⁩ 2018-03-02 22:29:52 UTC
This is a bug, change in direction does not create line breaking opportunity.

There is even an easy way to check this with https://unicode.org/cldr/utility/breaks.jsp; copy the text there and choose Line in the drop down and it should show all possible line breaking opportunities.
Comment 18 Lior Kaplan 2018-09-30 15:00:43 UTC Comment hidden (obsolete)
Comment 19 Alex Thurgood 2018-10-19 07:23:18 UTC
*** Bug 120669 has been marked as a duplicate of this bug. ***
Comment 20 chinyuhsuan 2018-11-29 08:21:04 UTC Comment hidden (obsolete)
Comment 21 QA Administrators 2019-11-30 03:39:17 UTC Comment hidden (obsolete)
Comment 22 Eyal Rozenberg 2020-02-29 10:36:04 UTC Comment hidden (obsolete)
Comment 23 QA Administrators 2022-03-01 03:42:18 UTC Comment hidden (obsolete)
Comment 24 ⁨خالد حسني⁩ 2022-09-25 15:00:32 UTC
*** Bug 146710 has been marked as a duplicate of this bug. ***
Comment 25 Eyal Rozenberg 2022-09-25 21:38:30 UTC
(In reply to خالد حسني from comment #17)
> This is a bug, change in direction does not create line breaking opportunity.

The first part of this statement does not follow from the second. That is, the second part is true, but there already is a breaking opportunity before and after the parentheses regardless of the change in direction.

> There is even an easy way to check this

... and checking shows the line breaking opportunity.

To me it seems this is a bug because the parenthesized word would still fit on the first line of the two in the sample document. So, it's about why LO thinks it _must_ break as opposed to it mistakenly thinking it _can_ break.
Comment 26 ⁨خالد حسني⁩ 2023-06-11 09:37:57 UTC
Created attachment 187838 [details]
Screenshot from Unicode utility showing line break opportunities

(In reply to Eyal Rozenberg from comment #25)
> (In reply to خالد حسني from comment #17)
> > This is a bug, change in direction does not create line breaking opportunity.
> 
> The first part of this statement does not follow from the second. That is,
> the second part is true, but there already is a breaking opportunity before
> and after the parentheses regardless of the change in direction.

There is no line break opportunity after an opening parenthesis (or before a closing parenthesis).
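The rules both commenters are pointing to come from UAX #14 (Unicode Line Breaking Algorithm). This Python sketch models only the few that matter for this report: no break before a space (LB7) or a closing parenthesis (LB13), no break after an opening parenthesis even across spaces (LB14), and a break after spaces (LB18). All other pairs are treated as unbreakable, Hebrew letters are lumped in with Latin ones (UAX #14 actually assigns them the class HL), and `line_break_class` and `break_opportunities` are hypothetical names. Nothing in it looks at text direction.

```python
def line_break_class(ch):
    """Tiny subset of UAX #14 line-break classes."""
    if ch == ' ':
        return 'SP'
    if ch in '([':
        return 'OP'  # opening punctuation
    if ch in ')]':
        return 'CP'  # closing parenthesis/bracket
    return 'AL'      # letters; Hebrew is really 'HL', treated the same here


def break_opportunities(text):
    """Indices i where a line may break before text[i] (simplified rules)."""
    classes = [line_break_class(ch) for ch in text]
    result = []
    for i in range(1, len(text)):
        if classes[i] in ('SP', 'CP'):
            continue  # LB7 / LB13: no break before a space or a closing parenthesis
        if classes[i - 1] != 'SP':
            continue  # all other pairs are treated as unbreakable in this sketch
        k = i - 1
        while k >= 0 and classes[k] == 'SP':
            k -= 1
        if k >= 0 and classes[k] == 'OP':
            continue  # LB14: no break after an opening parenthesis, even across spaces
        result.append(i)  # LB18: break after spaces
    return result


print(break_opportunities("see (שלום) now"))
```

For `see (שלום) now` this finds breaks only before `(` and before `now`; the direction changes at the edges of the Hebrew word add none.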
Comment 27 ⁨خالد حسني⁩ 2024-04-30 20:50:15 UTC
*** Bug 160854 has been marked as a duplicate of this bug. ***
Comment 28 Eyal Rozenberg 2024-05-10 20:24:38 UTC Comment hidden (obsolete)
Comment 29 ⁨خالد حسني⁩ 2024-05-10 20:43:21 UTC Comment hidden (obsolete)
Comment 30 Eyal Rozenberg 2024-05-10 21:06:00 UTC Comment hidden (obsolete)
Comment 31 kavandi@yahoo.com 2024-05-12 13:12:08 UTC
Dear خالد, it has been 12 years. Is there any solution?
Comment 32 Jonathan Clark 2024-07-02 13:43:13 UTC
*** Bug 145918 has been marked as a duplicate of this bug. ***
Comment 33 Jonathan Clark 2024-07-02 14:06:08 UTC
The root cause for this bug is Writer failing to backtrack before bidi portions while doing line breaking.

In order to lay out and render text, Writer must segment the text into portions. This can happen for a variety of reasons. For example, if only a part of a word in English text changes, the track changes feature will split that word across two or more portions.

Having to segment text in this way creates a situation where, potentially, a portion containing a break opportunity fits on a line, followed by an arbitrary number of portions which overflow the line, none of which contain break opportunities. In order to lay out such text correctly, Writer keeps track of the portion and position of the last break opportunity. Then, if unbreakable portions cause the line to overflow, Writer can backtrack (rewind) to the previous break opportunity, insert the break, and then continue layout from that point.
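The backtracking described above can be sketched roughly as follows. This is a simplified Python model, not Writer's code: widths are character counts (a monospace assumption), the only break opportunity is "after a space", and portions are plain strings. The portions are flattened into (portion, offset, char) entries so that the last break opportunity can still be located when later portions are the ones that overflow. If a line has no break opportunity at all, the sketch simply cuts at the line width.

```python
def layout(portions, width):
    """Greedy line filling across portions with backtracking.

    Returns (lines, rewinds), where rewinds lists the (portion, offset) of
    each space the layout rewound to after an overflow.
    """
    # Flatten to (portion index, offset, char) so a break opportunity found in
    # an earlier portion can be located after later portions overflow.
    stream = [(p, o, ch) for p, text in enumerate(portions) for o, ch in enumerate(text)]
    lines, rewinds, start = [], [], 0
    while start < len(stream):
        end, last_break = start, None  # last_break: stream index of the last space
        while end < len(stream) and end - start < width:
            if stream[end][2] == ' ':
                last_break = end
            end += 1
        overflowed = end < len(stream) and stream[end][2] != ' '
        if overflowed and last_break is not None and last_break + 1 < end:
            rewinds.append(stream[last_break][:2])
            end = last_break + 1  # rewind: break just after that space
        lines.append(''.join(ch for _, _, ch in stream[start:end]).rstrip())
        start = end
        while start < len(stream) and stream[start][2] == ' ':
            start += 1  # spaces at a line break are absorbed
    return lines, rewinds


# "brown" split across two portions, as track changes might do.
print(layout(["The quick ", "br", "own", " fox"], 12))
```

The overflow happens in the third portion (`own`), which contains no break opportunity, so the layout rewinds to the space at offset 9 of the first portion and produces the lines `The quick` and `brown fox`.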

This mechanism works correctly for both LTR and RTL text. However, it's currently broken for bidirectional text. Inserting text which implies a direction change produces a bidi portion (a type of multi portion). The relevant backtracking code was written deliberately so that Writer always treats the start of a multi portion as a break opportunity. I'm still unsure why; it's possible this was only intended for the other types of multi portions, and bidi portions were affected inadvertently.
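To illustrate the effect of that rule, the sketch below uses a toy layout of the same kind with a hypothetical `bidi_start_is_break` switch that mimics treating the start of a bidi portion as a break opportunity. The `Portion` class, its `is_bidi` flag and the monospace width model are assumptions for illustration only; Writer's actual portion classes and the committed fix are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Portion:
    text: str
    is_bidi: bool = False  # hypothetical flag: portion created by a direction change


def layout(portions, width, bidi_start_is_break):
    """Toy greedy layout with backtracking; widths are character counts."""
    # Each entry: (starts a bidi portion?, character)
    stream = [(p.is_bidi and o == 0, ch) for p in portions for o, ch in enumerate(p.text)]

    def break_before(i):
        # Simplified rule: a break opportunity follows every space.
        if stream[i - 1][1] == ' ':
            return True
        # Legacy rule under discussion: the start of a bidi portion is breakable too.
        return bidi_start_is_break and stream[i][0]

    lines, start = [], 0
    while start < len(stream):
        end, last_break = start, None
        while end < len(stream) and end - start < width:
            if end > start and break_before(end):
                last_break = end
            end += 1
        overflowed = end < len(stream) and stream[end][1] != ' ' and not break_before(end)
        if overflowed and last_break is not None:
            end = last_break  # backtrack to the last break opportunity
        lines.append(''.join(ch for _, ch in stream[start:end]).strip())
        start = end
        while start < len(stream) and stream[start][1] == ' ':
            start += 1  # spaces at a line break are absorbed
    return lines


text = [Portion("please see ("), Portion("שלום", is_bidi=True), Portion(") now")]
print(layout(text, 14, bidi_start_is_break=True))   # legacy behavior
print(layout(text, 14, bidi_start_is_break=False))  # break only at real opportunities
```

With the legacy rule, the layout rewinds only as far as the start of the Hebrew portion, leaving `please see (` on the first line with an orphaned bracket. Without it, the rewind reaches the space before `(`, and the whole parenthesized word moves to the next line as `(שלום) now`.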
Comment 34 Eyal Rozenberg 2024-07-02 22:40:38 UTC
(In reply to Jonathan Clark from comment #33)
> This mechanism works correctly for both LTR and RTL text. However, it's
> currently broken for bidirectional text. 

I take it you mean a paragraph containing both strong-LTR and strong-RTL characters?

> The
> relevant backtracking code was written deliberately so that Writer always
> treats the start of a multi portion as a break opportunity. I'm still unsure
> why

At the risk of stating the obvious... I take it you've tried git-blame'ing that code and asking the people who have worked on it?
Comment 35 Jonathan Clark 2024-07-05 21:59:17 UTC
(In reply to Eyal Rozenberg from comment #34)
> At the risk of stating the obvious... I take it you've tried git-blame'ing
> that code and asking the people who have worked on it?

Yes, I did. It looks like the bug was inherited from StarOffice. The most recent relevant work in this area was from 2001, by an author who hasn't contributed since 2002.

This far out, I think it's best to rely on what the code says. I'm not sure if the current behavior for other multi portions is correct, but it's definitely wrong for bidi portions. It's easy enough to limit the impact of a fix to bidi portions, and then extend it later if needed.
Comment 36 Commit Notification 2024-07-08 23:32:32 UTC
Jonathan Clark committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/6a54d08e6e52623f9769d17d7ea7390052cb275b

tdf#56408 Writer always breaks lines at text direction change

It will be available in 25.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 37 Commit Notification 2024-07-10 20:14:10 UTC
Jonathan Clark committed a patch related to this issue.
It has been pushed to "libreoffice-24-8":

https://git.libreoffice.org/core/commit/e110c64b8d260435b69fe71e40fc6c6e2b9b4e07

tdf#56408 Writer always breaks lines at text direction change

It will be available in 24.8.0.2.

Comment 38 Commit Notification 2024-07-18 22:28:41 UTC
Jonathan Clark committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/e9a37656b75b4ce82b3e48af727f03f386e64a08

(related tdf#56408) crashtesting: assert on exporting ooo30385-2.doc to odt

It will be available in 25.2.0.

Comment 39 Commit Notification 2024-07-19 09:27:17 UTC
Jonathan Clark committed a patch related to this issue.
It has been pushed to "libreoffice-24-8":

https://git.libreoffice.org/core/commit/ebe618f7b34eaede61cfe8141b2adc2a269d3e7e

(related tdf#56408) crashtesting: assert on exporting ooo30385-2.doc to odt

It will be available in 24.8.0.2.

Comment 40 Eyal Rozenberg 2024-10-03 21:55:49 UTC
Jonathan, what about other modules? Are we certain the problematic behavior doesn't manifest elsewhere?