Bug 141663 - Punctuation issue with nested LTR sentences in an RTL paragraph
Status: RESOLVED NOTABUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 7.1.2.2 release
Hardware: All Windows (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: needsDevAdvice
Depends on:
Blocks: RTL-CTL
Reported: 2021-04-13 00:40 UTC by Eric Bright
Modified: 2023-03-18 18:00 UTC (History)
CC List: 2 users

See Also:
Crash report or crash signature:


Attachments
Test sample 1 (9.48 KB, application/vnd.oasis.opendocument.text)
2021-10-15 16:48 UTC, Eric Bright
Test sample 2 (13.58 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2021-10-15 16:49 UTC, Eric Bright
Test sample 3 (13.71 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2021-10-15 16:49 UTC, Eric Bright
A screenshot of test sample 1 (48.25 KB, image/png)
2021-10-15 17:05 UTC, Eric Bright
A screenshot of test sample 3 (62.52 KB, image/png)
2021-10-15 17:08 UTC, Eric Bright

Description Eric Bright 2021-04-13 00:40:45 UTC
Description:
When writing a sentence that is left-to-right (LTR) inside a right-to-left (RTL) paragraph, any punctuation mark at the end of the LTR sentence jumps to the beginning of that LTR sentence, instead of staying at the end of the LTR sentence.

Steps to Reproduce:
1. Start a document and set it up for a right-to-left language, such as Persian
2. Switch the keyboard to Persian. Start writing a Persian paragraph. It must be set right-to-left and right-aligned
3. In the same paragraph, switch the keyboard to English. Then write an English word with an exclamation mark at the end (or any other punctuation mark)

Actual Results:
As soon as you type the punctuation mark at the end of the English sentence, it jumps to the beginning of the English sentence (which is the end of the Persian paragraph).

Expected Results:
The punctuation mark must stay at the end of the English sentence within the Persian paragraph.


Reproducible: Always


User Profile Reset: No



Additional Info:
When you switch the keyboard from Persian to English, the behaviour of the typesetter must follow the ways of that language, i.e. LTR. Then when the keyboard is switched back from English to Persian, the behaviour of the typesetter must go back to the RTL behaviour. This is the correct behaviour and MS Word correctly does that. LO Writer does not do it correctly.

If you try the above-mentioned steps inside an MS Word document, you will see the correct behaviour.

Here is a paragraph to use as an example (it is not showing as RTL on this page, and the punctuation appears in the right location. Copy and paste it into a new LO Writer document, set the paragraph to RTL and right-aligned, and see what happens):

انگلیسی باستانها، از نظر گویش، بسیار متفاوت از انگلیسی امروزی است. برای نمونه، واژه‌های light و drought، به ترتیب به صورت «لیخت» و «دراخت» تلفظ می‌شد یا جملهٔ Will you give me your jacket, good man?، چنین ادا می‌شد: Wilt ðu sellan me ðin clæð, godman min? در این دوره شماری واژه، از زبان لاتین، وارد انگلیسی شد که واژگانی چون: altar, mass, priest, psalm, temple, kitchen, palm, pear,... از آن جمله‌اند.
Comment 1 Dieter 2021-10-14 13:22:49 UTC
Eric, unfortunately nobody could confirm this bug report during the last few months, so I'd like to ask whether it is still valid. Could you please try to reproduce it with the latest version of LibreOffice from https://www.libreoffice.org/download/libreoffice-fresh/ ? I have set the bug's status to 'NEEDINFO'. Please change it back to 'UNCONFIRMED' if the bug is still present in the latest version.
Comment 2 Eric Bright 2021-10-15 16:48:56 UTC
Created attachment 175760 [details]
Test sample 1
Comment 3 Eric Bright 2021-10-15 16:49:17 UTC
Created attachment 175761 [details]
Test sample 2
Comment 4 Eric Bright 2021-10-15 16:49:42 UTC
Created attachment 175762 [details]
Test sample 3
Comment 5 Eric Bright 2021-10-15 16:50:34 UTC
Hi Dieter. I tested the issue in LO v.7.2.2.2 today and the results were the same. I created a simpler version of my test text that I attach above this comment.

Test sample 1: Made in LO 7.2.2.2 - How it looks as of now.
Test sample 2: Made with MS Word - Copy-pasted from LO 7.2.2.2 - broken even worse
Test sample 3: Made with MS Word - Typed from scratch in Word
Comment 6 Eric Bright 2021-10-15 16:51:53 UTC
Comment on attachment 175760 [details]
Test sample 1

This sample shows what the mixed text looks like if you type it inside LO 7.2.2.2. The question mark at the end of the nested English sentence is in the wrong place.
Comment 7 Eric Bright 2021-10-15 16:53:59 UTC
Comment on attachment 175761 [details]
Test sample 2

This sample shows what the mixed text looks like if you copy the typed text from test sample 1 and paste it into an MS Word document. Now, not only is the question mark at the end of the nested English sentence in the wrong place, but the whole English sentence is also backward.
Comment 8 Eric Bright 2021-10-15 17:01:00 UTC
Comment on attachment 175762 [details]
Test sample 3

This is how the same paragraph should have looked in the first place. In this sample, I typed it from scratch inside MS Word 365 as one normally would.

As you can see, the nested English sentence is correctly punctuated inside the Persian sentence. That is how the nesting of languages within other languages should be treated. Persian is an RTL language. An LTR sentence within a paragraph that is itself RTL must be treated as an LTR section, perhaps the same way that HTML treats a <div> within a <div> or a <span> within a <div>. The whole English sentence must be enclosed within its own <div> or <span> or whatever tag LO uses, so that it can be given the right LTR and 'left-aligned' tags or treatment, regardless of the larger <div> in which it is nested.
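
For plain text, the closest Unicode counterpart to the <span dir="ltr"> idea above is a directional isolate: the sentence is wrapped in LEFT-TO-RIGHT ISOLATE (U+2066) and POP DIRECTIONAL ISOLATE (U+2069). The following is only a minimal sketch. The helper name `isolate_ltr` is hypothetical, and it assumes the application displaying the text honours isolate controls, which this report does not establish for Writer.

```python
import unicodedata

LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate_ltr(sentence: str) -> str:
    """Wrap an LTR sentence so a bidi-aware renderer lays it out as one LTR unit.

    Hypothetical helper for illustration; it only adds the two control characters.
    """
    return f"{LRI}{sentence}{PDI}"


wrapped = isolate_ltr("Will you give me your jacket, good man?")
# The visible text is unchanged; only two invisible characters were added.
print(len(wrapped) - len("Will you give me your jacket, good man?"))  # 2
print(unicodedata.name(wrapped[0]), "/", unicodedata.name(wrapped[-1]))
```

The wrapped string can then be pasted into the RTL paragraph in place of the bare English sentence.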
Comment 9 Eric Bright 2021-10-15 17:05:48 UTC
Created attachment 175763 [details]
A screenshot of test sample 1

This is a screenshot of test sample 1, in case it does not appear as it should on your screen.
Comment 10 Eric Bright 2021-10-15 17:08:43 UTC
Created attachment 175764 [details]
A screenshot of test sample 3

This is a screenshot of test sample 3, made inside MS Word, in case the document does not show properly on your screen. This is how the punctuation of the nested English sentence (LTR) ought to look inside a Persian paragraph (RTL).
Comment 11 Eyal Rozenberg 2021-11-12 19:16:09 UTC
(In reply to Eric Bright from comment #0)
> Description:
> When writing a sentence that is left-to-right (LTR) inside a right-to-left
> (RTL) paragraph, any punctuation mark at the end of the LTR sentence jumps
> to the beginning of that LTR sentence, instead of staying at the end of the
> LTR sentence.

This is intended behavior. If the paragraph is RTL, it is assumed that a punctuation mark is part of the paragraph's flow, with the LTR text so far having come to an end. A punctuation mark is direction-neutral, and there is no way to know for certain whether you wanted it to be in the LTR run or not. Once you write additional characters, LO can infer the direction with more certainty. But if you _don't_ write any more characters, then your punctuation mark (typically a period) should indeed be assumed to be RTL again - since it's reasonable for you to want to finish your RTL sentence with a punctuation mark.

You can "force" the direction of the punctuation mark, by inserting a Unicode control character, such as an RLM:

https://en.wikipedia.org/wiki/Right-to-left_mark

At the link, you'll notice how it is used for doing just this.


Everything I've said applies similarly to RTL text within LTR paragraphs (with RLM marks).
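
The reasoning above (a neutral character takes its direction from its strong neighbours, and otherwise from the paragraph) can be sketched in a few lines. This is a deliberately simplified illustration, loosely modelled on rules N1 and N2 of the Unicode Bidirectional Algorithm, and not LibreOffice's actual implementation. It ignores numbers, embeddings, and brackets, and the function names are hypothetical.

```python
import unicodedata


def strong_direction(ch):
    """Return 'L' or 'R' for strongly directional characters, else None."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class == "L":
        return "L"
    if bidi_class in ("R", "AL"):  # Hebrew-type and Arabic-type letters
        return "R"
    return None  # neutral or weak: direction depends on context


def resolve_neutral(text, index, paragraph_direction="R"):
    """Simplified resolution of the neutral character at `index`.

    A missing strong neighbour (start or end of text) counts as the
    paragraph direction. If both neighbours agree, the neutral follows
    them; otherwise it follows the paragraph.
    """
    before = next(
        (d for d in map(strong_direction, reversed(text[:index])) if d),
        paragraph_direction,
    )
    after = next(
        (d for d in map(strong_direction, text[index + 1:]) if d),
        paragraph_direction,
    )
    return before if before == after else paragraph_direction


# Persian word (written with escapes), then an English word and "!".
typed = "\u0633\u0644\u0627\u0645 Hello!"
bang = typed.index("!")
print(resolve_neutral(typed, bang))             # R: "!" joins the RTL paragraph
print(resolve_neutral(typed + " world", bang))  # L: more LTR text follows
```

With nothing typed after the exclamation mark, it resolves to the paragraph's RTL direction. Once more LTR text follows, it sits between two LTR neighbours and resolves to LTR.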
Comment 12 Eric Bright 2021-11-12 20:10:50 UTC
(In reply to Eyal Rozenberg from comment #11)
> (In reply to Eric Bright from comment #0)
> > Description:
> > When writing a sentence that is left-to-right (LTR) inside a right-to-left
> > (RTL) paragraph, any punctuation mark at the end of the LTR sentence jumps
> > to the beginning of that LTR sentence, instead of staying at the end of the
> > LTR sentence.
> 
> This is intended behavior. If the paragraph is RTL, it is assumed that a
> punctuation mark is part of the paragraph's flow, with the LTR text so far
> having come to an end. A punctuation mark is direction-neutral, and there is
> no way to know for certain whether you wanted it to be in the LTR run or
> not. Once you write additional characters, LO can infer the direction with
> more certainty. But if you _don't_ write any more characters, then your
> punctuation mark (typically a period) should indeed be assumed to be RTL
> again - since it's reasonable for you to want to finish your RTL sentence
> with a punctuation mark.
> 
> You can "force" the direction of the punctuation mark, by inserting a
> Unicode control character, such as an RLM:
> 
> https://en.wikipedia.org/wiki/Right-to-left_mark
> 
> at the link, you'll notice how it is used for doing just this.
> 
> 
> Everything I've said applies similarly to RTL text within LTR paragraphs
> (with RLM marks).

Thank you for the reply. Even if that behaviour is the default, which seems to be the case, that default behaviour is incorrect and must be corrected. As you can clearly see in the attached images/documents, MS Word does it correctly. An English sentence (LTR) within a Persian sentence (RTL) must still retain its correct arrangement. MS Word keeps the correct arrangement. LO Writer does not.

Fixing this incorrect behaviour should not be that difficult since MS Word has already figured it out and it is already known how that would work in HTML with proper tags. LO Writer must, at least, produce a document as correctly as a simple HTML page would do.

As such, I am changing the status of the bug to 'unconfirmed' since you saw and verified that behaviour. This is a bug that is both annoying and unreasonable and must be looked into. The simple question is how MS Word and all browsers can correctly do what I just described but LO Writer believes it to be natural to scramble everything and think of it as normal. As I showed in the attached documents, this behaviour is not a feature; it is a bug.

If one believes this bug is the intended behaviour and no one wants to fix it, then please change the status to "VERIFIED" and "WONTFIX".
Comment 13 Eyal Rozenberg 2021-11-13 16:36:20 UTC
(In reply to Eric Bright from comment #12)
> Thank you for the reply. Even if that behaviour is the default, which seems
> to be the case, that default behaviour is incorrect and must be corrected.
> As you can clearly see in the attached images/documents, MS Word does it
> correctly. An English sentence (LTR) within a Persian sentence (RTL) must
> still retain its correct arrangement. MS Word keeps the correct arrangement.
> LO Writer does not.

Hmm. I'm not an LO dev but, I wonder... perhaps it's the case that Microsoft Word keeps an association of the typed characters to a keyboard layout or a language based on the keyboard layout, which sticks after you've moved on to type something else; while in LO, no such extra direction hinting information is maintained.
Comment 14 Dieter 2023-02-10 06:53:13 UTC
(In reply to Eyal Rozenberg from comment #13)
> Hmm. I'm not an LO dev but, I wonder... perhaps it's the case that Microsoft
> Word keeps an association of the typed characters to a keyboard layout or a
> language based on the keyboard layout, which sticks after you've moved on to
> type something else; while in LO, no such extra direction hinting
> information is maintained.

Eyal, is it correct that you agree with Eric that it is a bug, but that developer advice is needed to figure out why it happens?
In that case, I think we can change the status to NEW. Do you agree?
Comment 15 Dieter 2023-02-10 06:54:15 UTC
(In reply to Dieter from comment #14)
> Eyal, is it correct, that you agree with Eric, that it is a bug, but
> developer advice is needed to figure out, why it happens?
> In this case I think we can change status to NEW. Do you agree?

cc: Eyal Rozenberg
Comment 16 Eyal Rozenberg 2023-02-10 13:14:41 UTC
(In reply to Eric Bright from comment #0)
> Actual Results:
> As soon as you type the punctuation mark at the end of the English sentence,
> it jumps to the beginning of the English sentence (which is the end of the
> Persian paragraph).
> 
> Expected Results:
> The punctuation mark must stay at the end of the English sentence within the
> Persian paragraph.

No, that's not expected. Unless you somehow indicate that the exclamation mark belongs to something LTRish within the RTL paragraph, it belongs at the end of the paragraph.

You can, if you like, insert an LRM mark (https://en.wikipedia.org/wiki/Left-to-right_mark) after the exclamation mark; it will then be laid out as part of an LTR sequence - between an English character and another strongly-LTR character, the LRM.

Now, you _could_ argue that the position of the cursor and the current keyboard layout language (English) suggest that more strong-LTR characters will be added, and that the layout should be made as though a "phantom LRM" were present at that position. But this is certainly not a bug, and such a suggestion has significant drawbacks.


See also: 
https://www.unicode.org/reports/tr9/
and specifically the section about resolving embedding levels. It's a long and complicated document though.
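
To make the LRM workaround above concrete, here is a small sketch using Python's standard `unicodedata` module. It prints the bidi class of a few characters, then appends an LRM after trailing punctuation. The helper name `pin_trailing_punctuation` is hypothetical and only illustrates the idea; it is not something LibreOffice provides.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK: invisible, but strongly LTR

# Letters are strong; "!" and "?" are neutral (ON); the LRM counts as strong L.
for ch in ("a", "\u0633", "!", "?", LRM):
    print(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch))
# U+0061 L
# U+0633 AL
# U+0021 ON
# U+003F ON
# U+200E L


def pin_trailing_punctuation(ltr_sentence):
    """Append an LRM when the sentence ends in a non-L character.

    The trailing punctuation then sits between two strong-LTR characters,
    so it is laid out as part of the LTR run.
    """
    if ltr_sentence and unicodedata.bidirectional(ltr_sentence[-1]) != "L":
        return ltr_sentence + LRM
    return ltr_sentence


pinned = pin_trailing_punctuation("Will you give me your jacket, good man?")
print(pinned.endswith(LRM))  # True
```

In Writer itself, the same effect is obtained by inserting the mark by hand after the punctuation, as described above.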
Comment 17 Eric Bright 2023-03-02 18:33:41 UTC
Khaled, would you mind kindly looking into this issue whenever possible? I changed the assignee to you, Khaled, but feel free to change it back to the default if needed (since I don’t know the proper procedure for such an assignment).
Comment 18 ⁨خالد حسني⁩ 2023-03-18 17:43:29 UTC
There is no bug here; this is the standard Unicode Bidirectional Text algorithm behavior. If you copy the text to any other application (other than MS Word) you will get the same behavior. MS Word does something non-standard here, but I don’t know if the specifics are documented anywhere.
Comment 19 Eyal Rozenberg 2023-03-18 18:00:53 UTC
Like I said, if you wish to argue against the common, intentional behavior - you can make that argument; but probably a separate bug would be in order, discussing only the general question. You will need to provide motivation, refer to the relevant parts of the UBA, and explain why the benefits outweigh the detriments of breaking the custom.