Bug 146572 - Last space on line ending with RTL text after LTR text is placed between them rather than at end-of-line
Summary: Last space on line ending with RTL text after LTR text is placed between them...
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: RTL
Reported: 2022-01-04 20:28 UTC by jcuenod
Modified: 2024-08-17 10:05 UTC
CC List: 2 users

See Also:
Crash report or crash signature:


Attachments
Incorrectly positioned wrapped Hebrew text (12.67 KB, image/png)
2022-01-04 20:28 UTC, jcuenod
This gif demonstrates the bidi layout bug (28.53 KB, image/gif)
2022-01-27 15:33 UTC, jcuenod

Description jcuenod 2022-01-04 20:28:48 UTC
Created attachment 177315
Incorrectly positioned wrapped Hebrew text

Description:
--
When RTL text wraps across lines within an LTR paragraph, the first line of RTL text is not left aligned as expected. This issue is very noticeable when the RTL text follows an opening bracket but it seems to exist irrespective of the text that precedes the RTL string.


System:
--
Linux 5.15.8-arch1-1
LibreOffice 7.2.4.1
Fonts: Times New Roman and SBL Biblit


To Reproduce:
--
1. Create a new document
2. Start with Lorem Ipsum and add RTL text close to the end of the line so that it must wrap. For me, this was sufficient:

> This is just a test document. This is just a test document. This is just a test document. RTL: (לא־תהיה אחרי־רבים לרעת)

I would expect the leftmost portion of the Hebrew text to be adjacent to the opening bracket.
Comment 1 jcuenod 2022-01-11 15:51:23 UTC
Is more information needed to confirm this bug?
Comment 2 Dieter 2022-01-25 17:15:29 UTC
(In reply to jcuenod from comment #0)
> I would expect the leftmost portion of the Hebrew text to be adjacent to the
> opening bracket.

I wouldn't expect this. Could you please explain? I think, the problem is described in bug 146572. What problem remains, if bug 146572 is fixed?
=> NEEDINFO
Comment 3 jcuenod 2022-01-25 23:01:05 UTC
I believe that this bug is distinct from bug 146710 (which is the reference I think you intended). 146710 is about the attachment of neutral characters in the wrapping algorithm. This bug can be produced without neutral characters.

In this bug, I expect a wrapped RTL string in an LTR paragraph to be left aligned. In the attachment, we see the actual output: white-space between the opening bracket and the leftmost Hebrew character.
Comment 4 Eyal Rozenberg 2022-01-26 23:03:34 UTC
> the first line of RTL text is not left aligned as expected

Not sure what you mean exactly, going by this sentence:

> I would expect the leftmost portion of the Hebrew text to be adjacent to the opening bracket.

Why? Why is it "better" to have the space on the next line, or nowhere, rather than on the first line? It serves to indicate that the parenthesis doesn't come right after the תהיה.

But even ignoring the intuition above - what's the formal basis (here: http://www.unicode.org/reports/tr9/tr9-23.html I would think) for your expectation?
Comment 5 jcuenod 2022-01-26 23:31:14 UTC
I think my last reply may add clarification on what I expect. I think you interpreted me correctly. But just to make another attempt at clarification in case it's needed:

I would expect that, when a string of RTL text wraps across a line of left aligned text (because the rest of the paragraph is LTR), the span of RTL text that remains on the first line would left align with the rest of the paragraph. I think that the same intuition is why I expect to see the span of RTL text that continues on the second line to be on the *left* of that line with LTR text continuing on the right.

If the unicode spec says this is not how it should be, I apologise.

Anecdotally, however, this is how I've seen other editors lay out such text (e.g. Google docs) and this is how I've observed it in journal articles. If you need a list of examples, I will try to dig some up. I don't have a formal basis for this intuition but I'm not convinced that the unicode docs you are linking specify preserving whitespace in two directions at the end of a wrapped line either (which, I assume, is what is producing the [imo] buggy output).
Comment 6 jcuenod 2022-01-27 15:33:16 UTC
Created attachment 177842
This gif demonstrates the bidi layout bug
Comment 7 jcuenod 2022-02-02 19:11:13 UTC
After displaying "Formatting Marks" (under View), I can confirm that the problem is that whitespace at the end of the line seems to be conditionally displayed. In a line of BIDI text, both the LTR and the RTL text display their whitespace which produces an unexpected amount of space between the two spans of text.

This can be reproduced by:
1. Display formatting marks
2. Type half a line of LTR text followed by a space and enough RTL text that it wraps to the next line (the RTL text must contain multiple words).
3. Add LTR characters at the beginning of the first line until another RTL word wraps.

Two "space" formatting marks will appear between the LTR and RTL text.

There should only be one.
Comment 8 Eyal Rozenberg 2022-02-07 21:10:50 UTC
(In reply to jcuenod from comment #5)

Actually, that doesn't make things clearer. Paragraph alignment is basically orthogonal to paragraph direction. And the span of RTL text does not "align" anywhere; the only questions are:

1. Should the space character(s) between the last word fitting on the first line and the first word on the next line be kept on the first line, assuming they fit, or moved to the next one?
2. If they are kept on the first line, should they appear on the left side, between the LTR span and the last letter of the last RTL word, or on the right side, to the right of the first RTL word?

LO Writer answers this with "Kept on the first line" and "on the left side".

I am not sure what the Unicode standard says!

It won't hurt to check using MS-Word and MS Notepad what Microsoft does.

Using the XFCE mousepad editor, I notice the answers are "kept on the first line" and "on the right side" - unlike the LO logic, and perhaps more in line with what you expect. So does GNOME's gedit. AbiWord acts like LO Writer though.

> Anecdotally, however, this is how I've seen other editors lay out such text
> (e.g. Google docs) and this is how I've observed it in journal articles.

Can you attach some screenshots?

> If you need a list of examples, I will try to dig some up. 

Windows examples would be pertinent. If you can get them, please do.
Comment 9 Buovjaga 2022-12-08 12:46:07 UTC
NEEDINFO per last comment
Comment 10 ⁨خالد حسني⁩ 2022-12-08 15:33:59 UTC
This is a bug, the trailing space at line break should be at the end of the line even for embedded RTL text.
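
For reference, the expected placement seems to follow from rule L1 of the Unicode Bidirectional Algorithm (UAX #9), which resets whitespace at the end of each line to the paragraph embedding level before the runs are reordered (rule L2). The sketch below is a deliberately simplified model, not LibreOffice's code: it assumes an LTR paragraph, only two embedding levels, and the UAX #9 example convention of uppercase letters standing in for RTL characters. The function names are hypothetical.

```python
def bidi_class(ch):
    # Assumption: uppercase stands in for strong RTL, lowercase for strong LTR,
    # and everything else (here, spaces) is neutral.
    if ch.isupper():
        return "R"
    if ch.islower():
        return "L"
    return "N"


def resolve_levels(paragraph):
    """Assign level 0 (LTR) or 1 (RTL) to each character of an LTR paragraph.

    A neutral takes level 1 only when the nearest strong characters on both
    sides are RTL; otherwise it takes the paragraph level 0.
    """
    classes = [bidi_class(c) for c in paragraph]
    levels = []
    for i, cls in enumerate(classes):
        if cls == "R":
            levels.append(1)
        elif cls == "L":
            levels.append(0)
        else:
            prev = next((c for c in reversed(classes[:i]) if c != "N"), "L")
            nxt = next((c for c in classes[i + 1:] if c != "N"), "L")
            levels.append(1 if prev == "R" and nxt == "R" else 0)
    return levels


def reorder_line(text, levels, apply_l1):
    """Return the visual order of one line, optionally applying rule L1."""
    levels = list(levels)
    if apply_l1:
        # L1: trailing whitespace on the line reverts to the paragraph level.
        i = len(text)
        while i > 0 and text[i - 1].isspace():
            i -= 1
            levels[i] = 0
    # L2, restricted to levels 0 and 1: reverse each maximal level-1 run.
    out = []
    i = 0
    while i < len(text):
        j = i
        while j < len(text) and levels[j] == levels[i]:
            j += 1
        run = text[i:j]
        out.append(run[::-1] if levels[i] == 1 else run)
        i = j
    return "".join(out)


paragraph = "ab CD EF GH"          # logical order; "GH" wraps to the next line
levels = resolve_levels(paragraph)
first_line, first_levels = paragraph[:9], levels[:9]   # "ab CD EF "

without_l1 = reorder_line(first_line, first_levels, apply_l1=False)
with_l1 = reorder_line(first_line, first_levels, apply_l1=True)
print(repr(without_l1))  # 'ab  FE DC'  (two spaces between the LTR and RTL text)
print(repr(with_l1))     # 'ab FE DC '  (trailing space at the end of the line)
```

Without the L1 reset, the space after the last RTL word on the line stays inside the RTL run and lands next to the LTR text after reversal, which matches the two "space" formatting marks described in comment 7. With the reset, it stays at the end of the line.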
Comment 11 Eyal Rozenberg 2024-08-17 10:05:25 UTC
Rephrasing title as per Khaled's last comment.

We should probably create a test document here.