Bug 162120 - Auto-detect paragraph directions when they were not set explicitly
Summary: Auto-detect paragraph directions when they were not set explicitly
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version (earliest affected): unspecified
Hardware: All All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: needsDevEval
Duplicates: 155078
Depends on: 157037
Blocks: Language-Detection RTL
Reported: 2024-07-21 02:49 UTC by AvidSeeker
Modified: 2024-08-03 09:15 UTC (History)

See Also:
Crash report or crash signature:


Attachments
Libreoffice does not automatically set paragraph text direction. (108.13 KB, image/png)
2024-07-21 02:50 UTC, AvidSeeker
GTK text editor correctly implement UBA (84.39 KB, image/png)
2024-07-21 02:50 UTC, AvidSeeker
RTL mark and LTR marks usage in a plaintext file (116 bytes, text/plain)
2024-07-21 18:26 UTC, AvidSeeker

Description AvidSeeker 2024-07-21 02:49:32 UTC
Description:
LO allows setting the paragraph text direction (RTL or LTR); however, this has to be done manually for every paragraph.

Unicode specifies the Unicode Bidirectional Algorithm (UBA), which lets programs automatically detect the directionality of each paragraph.

GTK implements the UBA. If the following sample text is pasted into a GTK text editor, it is automatically aligned accordingly.

Attached are screenshots of LibreOffice (aligns everything to the left) and the XFCE text editor (aligns according to the UBA).

SAMPLE TEXT:

قِفَا نَبْكِ مِنْ ذِكْرَى حَبِيبٍ ومَنْزِلِ، بِسِقْطِ اللِّوَى بَيْنَ الدَّخُول فَحَوْمَلِ. فَتُوْضِحَ فَالمِقْراةِ لمْ يَعْفُ
رَسْمُها، لِمَا نَسَجَتْهَا مِنْ جَنُوبٍ وشَمْألِ. تَرَى بَعَرَ الأرْآمِ فِي عَرَصَاتِهَا، وَقِيْعَانِهَا كَأنَّهُ حَبُّ
فُلْفُلِ. كَأنِّي غَدَاةَ البَيْنِ يَوْمَ تَحَمَّلُوا، لَدَى سَمُرَاتِ الحَيِّ نَاقِفُ حَنْظَلِ. وُقُوْفًا بِهَا صَحْبِي عَليَّ
مَطِيَّهُمُ، يَقُوْلُوْنَ: لا تَهْلِكْ أَسًى وَتَجَمَّلِ. وإِنَّ شِفائِيَ عَبْرَةٌ مُهْرَاقَةٌ، فَهَلْ عِنْدَ رَسْمٍ دَارِسٍ مِنْ
مُعَوَّلِ؟. كَدَأْبِكَ مِنْ أُمِّ الحُوَيْرِثِ قَبْلَهَا، وَ­جَارَتِهَا أُمِّ الرَّبَابِ بِمَأْسَلِ. إِذَا قَامَتَا تَضَوَّعَ المِسْكُ
مِنْهُمَا، نَسِيْمَ الصَّبَا جَاءَتْ بِرَيَّا القَرَنْفُلِ. فَفَاضَتْ دُمُوْعُ العَيْنِ مِنِّي صَبَابَةً، عَلَى النَّحْرِ حَتَّى
بَلَّ دَمْعِيَ مِحْمَلِي.

Lorem ipsum dolor sit amet, co­nsetetur sadipscing elitr, sed diam nonumy eirmod
tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At
vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren,
no sea takimata sanctus est Lorem ipsum dolor sit amet.
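
For reference, the paragraph-level part of the UBA (rules P2 and P3 of UAX #9) can be sketched roughly as below. This is only an illustration, not LibreOffice or GTK code: the function names `base_direction` and `paragraph_directions`, the LTR default for paragraphs without strong characters, and the blank-line paragraph splitting are all assumptions made for the example.

```python
import unicodedata

# Bidi classes of the isolate initiators; text inside an isolate is
# skipped when looking for the first strong character (UAX #9, rule P2).
ISOLATE_INITIATORS = {"LRI", "RLI", "FSI"}


def base_direction(paragraph, default="ltr"):
    """Return 'rtl' or 'ltr' based on the first strong character."""
    isolate_depth = 0
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ISOLATE_INITIATORS:
            isolate_depth += 1
        elif bidi_class == "PDI":
            # An unmatched PDI is simply ignored.
            if isolate_depth > 0:
                isolate_depth -= 1
        elif isolate_depth == 0:
            if bidi_class == "L":
                return "ltr"
            if bidi_class in ("R", "AL"):
                return "rtl"
    # No strong character found: fall back to a default (an assumption here).
    return default


def paragraph_directions(text):
    """Detect a direction for each blank-line-separated paragraph."""
    return [base_direction(p) for p in text.split("\n\n") if p.strip()]


sample = "قِفَا نَبْكِ مِنْ ذِكْرَى حَبِيبٍ ومَنْزِلِ\n\nLorem ipsum dolor sit amet"
print(paragraph_directions(sample))  # ['rtl', 'ltr']
```

With the sample text above, the Arabic paragraph would resolve to RTL and the Latin paragraph to LTR, which matches what the GTK editor screenshot shows.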

Reproducible: Always


User Profile Reset: No

Comment 1 AvidSeeker 2024-07-21 02:50:13 UTC
Created attachment 195410
Libreoffice does not automatically set paragraph text direction.
Comment 2 AvidSeeker 2024-07-21 02:50:36 UTC
Created attachment 195411
GTK text editor correctly implement UBA
Comment 3 AvidSeeker 2024-07-21 03:45:25 UTC
Related bug that resets paragraph direction when style is changed:

https://bugs.documentfoundation.org/show_bug.cgi?id=151857
Comment 4 AvidSeeker 2024-07-21 03:49:08 UTC
UBA: https://www.unicode.org/reports/tr9/
Comment 5 V Stuart Foote 2024-07-21 10:07:41 UTC
Unicode BiDi handling, provided by the ICU library, is already implemented but depends on the RTL/LTR direction assigned to a text run or full paragraph. Assignments are currently made in the UI as direct formatting, applied by UNO button actions on the Standard Toolbar, and are automated for text runs of Unicode characters with "Strong" bidi values. Some recent adjustments: [1][2][3]

So the issue is language detection of text runs, words, sentences, and paragraph blocks. Automating PS assignment as RTL/LTR is similar to the language detection/assignment needed for spell checking, i.e. bug 91766. Additionally, language could be assigned based on the IME or keyboard locale detected, but we have issues with those approaches; see bug 113298.

Once language detection improves, our "Complex" 'CTL' classification and RTL/LTR formatting can follow, and we can assign/format direction.

=-ref-=
[1] https://gerrit.libreoffice.org/c/core/+/36704
[2] https://gerrit.libreoffice.org/c/core/+/37050
[3] https://gerrit.libreoffice.org/c/core/+/51118
Comment 6 Eyal Rozenberg 2024-07-21 18:08:03 UTC
(In reply to V Stuart Foote from comment #5)
> Unicode BIDI handling provided by ICU lib are already implemented but depend
> on the RTL/LTR direction assignment for a text run or full paragraph.

But it is true that several apps and GUI toolkits implement some logic for auto-detecting the direction of paragraphs, when it was not set explicitly.

So, unless I misunderstand the OP, I believe the ask here is to apply such logic.

In LO, we tend to assume someone has set the paragraph direction a priori. And, effectively, someone has. But there are exceptions to this, such as:

* When opening a document for which paragraph directions are not explicitly set (e.g. plain text)
* When pasting text content without direction specification (e.g. CSV)
* Making the direction, as well as the alignment, of Calc cells auto-detected/determined by default
* Supporting resetting the paragraph directions of selected text by applying auto-detection logic
* Supporting an "auto-detect" paragraph direction generally (e.g. in Writer), in addition to Left, Right, Inherit, Start and End.
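
To illustrate the last item, here is a rough sketch of how an "auto-detect" setting could sit next to explicit directions. It is purely hypothetical: `Direction`, `first_strong` and `resolve_direction` are made-up names, not LO code, the LTR fallback is an assumption, and the detection is a simplified first-strong check that ignores isolates.

```python
import unicodedata
from enum import Enum


class Direction(Enum):
    LTR = "ltr"
    RTL = "rtl"
    AUTO = "auto"  # hypothetical extra value: detect from the content


def first_strong(text):
    """Return the direction of the first strong character, or None."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return Direction.LTR
        if bidi_class in ("R", "AL"):
            return Direction.RTL
    return None


def resolve_direction(text, explicit=Direction.AUTO, fallback=Direction.LTR):
    """An explicit setting always wins; AUTO falls back to detection."""
    if explicit is not Direction.AUTO:
        return explicit
    detected = first_strong(text)
    return detected if detected is not None else fallback


# Cells pasted without any direction information, e.g. from CSV.
for cell in ["שלום", "hello", "12345"]:
    print(repr(cell), resolve_direction(cell).value)
```

The point of the sketch is only that auto-detection never overrides a direction somebody has set; it applies in the cases above where nothing was specified.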

I would say each of these merits its own bug, but let's hear what the OP says first...
Comment 7 AvidSeeker 2024-07-21 18:24:06 UTC
Yes. The mentioned issues are relevant, but the focus of my bug is as Eyal says: auto-detecting the direction of paragraphs when it was not set explicitly.

Note that plaintext does not necessarily mean that the direction is not explicitly set. The UBA defines the right-to-left mark (U+200F) and the left-to-right mark (U+200E), which can be used in a plaintext document.
Comment 8 AvidSeeker 2024-07-21 18:26:31 UTC
Created attachment 195435
RTL mark and LTR marks usage in a plaintext file

Opening this in a GTK editor shows the directions overridden by the RTL and LTR marks. Vim displays the marks literally, so the file shows as:

<200f>this should be LTR, but it starts with UTF RTL mark, so it is RTL
<200e>هذا عربي لكنه على اليسار
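
A small sketch of why the marks have this effect: U+200F and U+200E are invisible characters with strong bidi classes (R and L), so a first-strong check sees them before any visible letter. The helper name `first_strong_direction` is made up for the example, and the check is simplified (no isolate handling).

```python
import unicodedata

RLM = "\u200f"  # RIGHT-TO-LEFT MARK
LRM = "\u200e"  # LEFT-TO-RIGHT MARK


def first_strong_direction(line):
    """Return 'ltr'/'rtl' from the first strong character, else None."""
    for ch in line:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return None


lines = [
    RLM + "this should be LTR, but it starts with UTF RTL mark, so it is RTL",
    LRM + "هذا عربي لكنه على اليسار",
]
for line in lines:
    # Prints the bidi class of the leading mark and the resulting direction.
    print(unicodedata.bidirectional(line[0]), first_strong_direction(line))
# R rtl
# L ltr
```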
Comment 9 V Stuart Foote 2024-07-21 21:17:47 UTC
From looking through our existing bidi handling, it seems we could benefit from refactoring to implement the UBA of Unicode UAX #9 [1] more completely, and to use the ICU library's 'ubidi' API [2] implementation directly.

Perhaps we could even take this opportunity to refactor and drop our legacy Western/CTL/CJK script-locale model, which currently drives our BiDi support, along with its awful UI, as has been suggested in bug 104318.

=-ref-=
[1] https://www.unicode.org/reports/tr9/
[2] https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/ubidi_8h.html#details
Comment 10 Heiko Tietze 2024-07-22 08:39:17 UTC
From UX POV it's desirable if not necessary to apply the correct text direction and alignment.
Comment 11 Eyal Rozenberg 2024-08-02 22:46:03 UTC
I think this could be marked as a duplicate of bug 155078, which is about the same subject, from last year. We could com. Thoughts?
Comment 12 V Stuart Foote 2024-08-02 22:58:58 UTC
*** Bug 155078 has been marked as a duplicate of this bug. ***