Bug 162305 - When opening plain text files with LTR and RTL, paragraph directions and alignment are wrong
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 24.2.0.0 alpha1+
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on: 157037 162306
Blocks: File-Opening RTL
Reported: 2024-08-01 22:41 UTC by Eyal Rozenberg
Modified: 2024-08-03 09:15 UTC
CC List: 2 users

See Also:
Crash report or crash signature:


Attachments
Text file with two lines English, one line space, two lines Hebrew (117 bytes, text/plain)
2024-08-01 22:42 UTC, Eyal Rozenberg
Rendering of text file in gedit vs LO Writer 25.2 (115.91 KB, image/png)
2024-08-01 22:43 UTC, Eyal Rozenberg

Description Eyal Rozenberg 2024-08-01 22:41:38 UTC
Suppose I open a text file with the following 5 lines of text:

--------------
The quick brown fox
jumped over the lazy dog.

"אל תדאג", אמר הדג,
"יש לי עורך-דין כריש!"
--------------

This file will have either 2 or 5 paragraphs (depending on whether a paragraph is considered to end at each line end or only at an empty line); LO goes for one paragraph per line, so let's go with that.

Now, the paragraphs should have the following directions and alignments:

Index  |  Direction | Alignment
-------------------------------
  0    | LTR        | Left
  1    | LTR        | Left
  2    | LTR/RTL    | N/A  <- empty line, app's choice
  3    | RTL        | Right
  4    | RTL        | Right

And indeed, that's what happens when you open the file in a text editor like Mousepad or gedit.

With LO, however, we get a result that's not even consistent, but depends on the locale (!)

With en_IL: All lines are LTR and left-aligned

With he_IL: All lines are RTL and left-aligned


That's just wrong:

* Paragraph direction should not depend on the locale, except when the paragraph contains only direction-neutral characters (and even then, agreement between the directions of the preceding and succeeding paragraphs should be preferred over the locale; failing that, it's probably still better to take the previous paragraph's direction)

* Alignment should follow direction: RTL -> Right, LTR -> Left

* Direction should be set according to paragraph contents (see bug 162120)
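The rules above can be sketched roughly as follows. This is a minimal Python sketch, not LibreOffice's import code: it assumes one paragraph per line and detects direction from the first strongly directional character (a simplified form of rule P2 of the Unicode Bidirectional Algorithm, UAX #9, which ignores isolate controls). For neutral-only paragraphs it falls back to the previous paragraph's direction, then to the next detected direction (our assumption for a leading neutral paragraph), and only as a last resort to a locale default. The function names and the `locale_default` parameter are hypothetical.

```python
import unicodedata

# Bidi classes treated as strong for paragraph-level detection.
STRONG_LTR = {"L"}
STRONG_RTL = {"R", "AL"}


def first_strong_direction(text):
    """Return "LTR" or "RTL" from the first strongly directional character,
    or None if the text has only neutral/weak characters (digits,
    punctuation, spaces). Simplified UAX #9 rule P2: isolates not skipped."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in STRONG_LTR:
            return "LTR"
        if bidi_class in STRONG_RTL:
            return "RTL"
    return None


def layout_paragraphs(lines, locale_default="LTR"):
    """Assign (direction, alignment) to each line, one paragraph per line.

    Direction comes from content; a neutral-only paragraph takes the
    previous paragraph's direction, else the next detected direction,
    else the (hypothetical) locale default."""
    detected = [first_strong_direction(line) for line in lines]
    result = []
    for i, direction in enumerate(detected):
        if direction is None:
            previous = result[i - 1][0] if i > 0 else None
            following = next((d for d in detected[i + 1:] if d), None)
            direction = previous or following or locale_default
        # Alignment simply follows direction.
        alignment = "right" if direction == "RTL" else "left"
        result.append((direction, alignment))
    return result


# The five-line example from this report.
sample = [
    "The quick brown fox",
    "jumped over the lazy dog.",
    "",
    '"אל תדאג", אמר הדג,',
    '"יש לי עורך-דין כריש!"',
]
for line, (direction, alignment) in zip(sample, layout_paragraphs(sample)):
    print(f"{direction:3} {alignment:5} {line}")
```

When a previous paragraph exists, preferring agreement between the preceding and succeeding paragraphs and taking the previous paragraph's direction yield the same result, so the sketch only needs to look at the following paragraph when there is no previous one. The locale is consulted only when the whole file is direction-neutral.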
Comment 1 Eyal Rozenberg 2024-08-01 22:42:20 UTC
Created attachment 195653
Text file with two lines English, one line space, two lines Hebrew
Comment 2 Eyal Rozenberg 2024-08-01 22:43:50 UTC
Created attachment 195654
Rendering of text file in gedit vs LO Writer 25.2

Opened with LO build id:

Version: 25.2.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: dc9486f2443fa52588b625c0a2a288bff56a7a45
CPU threads: 4; OS: Linux 6.6; UI render: default; VCL: gtk3
Locale: en-IL (en_IL); UI: en-US
Comment 3 Eyal Rozenberg 2024-08-01 22:51:14 UTC Comment hidden (obsolete)
Comment 4 Heiko Tietze 2024-08-02 07:35:50 UTC
(In reply to Eyal Rozenberg from comment #0)
> With LO , however, we get a result that's not even inconsistent, but depends
> on the locale (!)
Confirming, with 

Version: 24.2.5.2 (X86_64) / LibreOffice Community
Build ID: 420(Build:2)
CPU threads: 32; OS: Linux 6.10; UI render: default; VCL: kf6 (cairo+xcb)
Locale: de-DE (en_US.UTF-8); UI: en-US
24.2.5-1
Calc: threaded

The Hebrew part is recognized as CTL, the English part as Western (as German in my case), and both are assigned to the language defined under Tools > Options > Languages. Related to bug 95274.

But CTL has no consequences for the paragraph, which should be the focus here (or rather, in any of the many duplicates).

No topic for UX though.
Comment 5 Eyal Rozenberg 2024-08-02 08:44:54 UTC
(In reply to Heiko Tietze from comment #4)
> No topic for UX though.

The UX questions are:

1. Should the direction, or the alignment, of a text file be determined based only on its content, or also on the app's locale? What is it reasonable for people to expect?

2. Do we agree that directions should be set at the paragraph level, rather than at the file level?
Comment 6 Heiko Tietze 2024-08-02 09:10:56 UTC
(In reply to Eyal Rozenberg from comment #5)
> 1. Should the direction, or the alignment, of a text file be determined
> based only on its content, or also on the app's locale? What is it
> reasonable for people to expect?
However it is detected, you expect RTL for CTL content, and LTR vice versa. It must not follow the page properties, the system locale, or the configuration.

> 2. Do we agree that directions should be set at the paragraph level, rather
> than at the file level?
I see no reason to tinker with the page style (I guess that is what you mean by file, i.e. document). It follows the default, and a typical situation, such as people in Egypt copy/pasting English content into their document (to take an example from the other side), should work out of the box.
Comment 7 Eyal Rozenberg 2024-08-02 14:17:03 UTC
(In reply to Heiko Tietze from comment #6)
> (In reply to Eyal Rozenberg from comment #5)
> > 1. Should the direction, or the alignment, of a text file be determined
> > based only on its content, or also on the app's locale? What is it
> > reasonable for people to expect?
> However it is detected you expect RTL for CTL content, and LTR vice versa.

1. OK, if it's universally obvious then great :-)

2. RTL, not CTL (which is not relevant in this context).


> It must not follow the page properties, the system locale, the configuration.

Ah, but what about a text file which only contains neutral characters?

e.g. a file with numbers, dots and commas?
 
> > 2. Do we agree that directions should be set at the paragraph level, rather
> > than at the file level?
> I see no reason to tinker with the page style (guess that is what you mean
> with file aka document). It follows the default and a typical situation when
> people in Egypt copy/paste English content in their document, to make an
> example from the other side, should work out of the box.

While this is my bottom-line opinion, it is actually not entirely that obvious. Let me play devil's advocate for a second.

Suppose I paste a file with 100 lines in Hebrew, one of which, somewhere in the middle, has mostly German and some Hebrew.

Does that paragraph have "German content"? Or is the whole document Hebrew content that just happens to have a few words, or lines, mostly in German?


Now, what if you have a line with 50% of the words in Hebrew and 50% of the words in German. Is it Hebrew content? German content? Both? None? What would the direction be?

And now what if you have a single line which is all Hebrew, except that the first word is German. What should the direction be now?
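To make the ambiguity concrete, here is a minimal Python sketch (hypothetical helper names, using the standard `unicodedata` module) that compares two plausible heuristics: the first-strong-character rule in the style of UAX #9, and a simple majority vote over strongly directional characters. The majority heuristic is only an illustrative alternative, not something LibreOffice is known to implement, and it counts characters rather than words.

```python
import unicodedata


def strong_classes(text):
    """Yield "LTR"/"RTL" for each strongly directional character in text."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            yield "LTR"
        elif bidi_class in ("R", "AL"):
            yield "RTL"


def first_strong(text):
    """UAX #9-style rule: the first strong character decides (None if none)."""
    return next(strong_classes(text), None)


def majority_strong(text):
    """Alternative heuristic: the direction with more strong characters wins;
    a tie, or neutral-only text, gives None (undecided)."""
    classes = list(strong_classes(text))
    ltr, rtl = classes.count("LTR"), classes.count("RTL")
    if ltr == rtl:
        return None
    return "LTR" if ltr > rtl else "RTL"


# Mostly Hebrew, but the first word is German: the two heuristics disagree.
line = "Hallo שלום עליכם חברים"
print(first_strong(line), majority_strong(line))
```

For the "first word is German" case, the first-strong rule yields LTR while the majority vote yields RTL, and a line split evenly between the two scripts may leave the majority vote undecided.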


----

Now for another kind of question. I open my LibreOffice Writer, and start typing some text in German. Then I press Enter, switch keyboard layouts, and type some text in Hebrew.

And then I do the exact same thing but in a text editor; then I save my file; then open it in LO Writer; then navigate to the end of the file.

By your argument, the layouts in the two cases should be different, because in one case direction detection took place while in the other it did not, even though we typed exactly the same thing. Is that reasonable? Again, not obvious.