Description:
LibreOffice allows setting the paragraph text direction (RTL or LTR), but this has to be done manually for every paragraph. Unicode specifies the Unicode Bidirectional Algorithm (UBA), which lets programs automatically detect the directionality of each paragraph. GTK implements the UBA: if the following sample text is pasted into a GTK text editor, each paragraph is automatically aligned according to its direction. Attached are screenshots of LibreOffice (aligns everything to the left) and the XFCE text editor (aligns according to the UBA).

SAMPLE TEXT:
قِفَا نَبْكِ مِنْ ذِكْرَى حَبِيبٍ ومَنْزِلِ، بِسِقْطِ اللِّوَى بَيْنَ الدَّخُول فَحَوْمَلِ. فَتُوْضِحَ فَالمِقْراةِ لمْ يَعْفُ رَسْمُها، لِمَا نَسَجَتْهَا مِنْ جَنُوبٍ وشَمْألِ. تَرَى بَعَرَ الأرْآمِ فِي عَرَصَاتِهَا، وَقِيْعَانِهَا كَأنَّهُ حَبُّ فُلْفُلِ. كَأنِّي غَدَاةَ البَيْنِ يَوْمَ تَحَمَّلُوا، لَدَى سَمُرَاتِ الحَيِّ نَاقِفُ حَنْظَلِ. وُقُوْفًا بِهَا صَحْبِي عَليَّ مَطِيَّهُمُ، يَقُوْلُوْنَ: لا تَهْلِكْ أَسًى وَتَجَمَّلِ. وإِنَّ شِفائِيَ عَبْرَةٌ مُهْرَاقَةٌ، فَهَلْ عِنْدَ رَسْمٍ دَارِسٍ مِنْ مُعَوَّلِ؟. كَدَأْبِكَ مِنْ أُمِّ الحُوَيْرِثِ قَبْلَهَا، وَجَارَتِهَا أُمِّ الرَّبَابِ بِمَأْسَلِ. إِذَا قَامَتَا تَضَوَّعَ المِسْكُ مِنْهُمَا، نَسِيْمَ الصَّبَا جَاءَتْ بِرَيَّا القَرَنْفُلِ. فَفَاضَتْ دُمُوْعُ العَيْنِ مِنِّي صَبَابَةً، عَلَى النَّحْرِ حَتَّى بَلَّ دَمْعِيَ مِحْمَلِي.
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.

Reproducible: Always
User Profile Reset: No
Created attachment 195410: LibreOffice does not automatically set paragraph text direction.
Created attachment 195411: GTK text editor correctly implements UBA
Related bug that resets paragraph direction when style is changed: https://bugs.documentfoundation.org/show_bug.cgi?id=151857
UBA: https://www.unicode.org/reports/tr9/
Unicode BiDi handling, provided by the ICU library, is already implemented, but it depends on the RTL/LTR direction assigned to a text run or full paragraph. Assignments are currently made in the UI, as direct formatting applied by UNO button actions on the Standard Toolbar, and automatically for text runs of Unicode characters with "Strong" bidi values. Some recent adjustments: [1][2][3].

So the issue is language detection for text runs, words, sentences, and paragraph blocks. Automating PS assignment as RTL/LTR is similar to the language detection/assignment needed for spell checking, i.e. bug 91766. Additionally, language could be assigned based on the IME or keyboard locale detected, but we have issues with those approaches, see bug 113298.

Once language detection improves, our "Complex" 'CTL' classification and RTL/LTR formatting can follow with it, and we can assign/format direction.

=-ref-=
[1] https://gerrit.libreoffice.org/c/core/+/36704
[2] https://gerrit.libreoffice.org/c/core/+/37050
[3] https://gerrit.libreoffice.org/c/core/+/51118
(In reply to V Stuart Foote from comment #5)
> Unicode BIDI handling provided by ICU lib are already implemented but depend
> on the RTL/LTR direction assignment for a text run or full paragraph.

But it is true that several apps and GUI toolkits implement some logic for auto-detecting the direction of paragraphs when it was not set explicitly. So, unless I misunderstand the OP, I believe the ask here is to apply such logic.

In LO, we tend to assume someone has set the paragraph direction a priori. And, effectively, someone has. But there are possible exceptions to this, such as:

* When opening a document for which paragraph directions are not explicitly set (e.g. plain text)
* When pasting text content without direction specification (e.g. CSV)
* Making the direction, as well as the alignment, in Calc cells be auto-detected/determined by default
* Supporting a reset of the paragraph directions of selected text by applying auto-detection logic
* Supporting an "auto-detect" paragraph direction generally (e.g. in Writer), in addition to Left, Right, Inherit, Start and End

I would say each of these merits its own bug, but let's hear what the OP says first...
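For illustration only, the kind of auto-detection logic discussed here can be sketched with the "first strong character" heuristic from UAX #9 (rules P2 and P3): the paragraph direction follows the first character whose bidi class is L, R or AL, skipping anything inside an isolate. This is a minimal Python sketch and not LibreOffice code; the function name `detect_paragraph_direction` and the `fallback` parameter are hypothetical.

```python
import unicodedata

# Bidi classes that UAX #9 treats as "strong" when choosing the paragraph level.
STRONG_LTR = {"L"}
STRONG_RTL = {"R", "AL"}
# Isolate initiators and their terminator; rule P2 skips isolated content.
ISOLATE_INITIATORS = {"LRI", "RLI", "FSI"}
ISOLATE_TERMINATOR = "PDI"


def detect_paragraph_direction(paragraph, fallback="ltr"):
    """Return "ltr" or "rtl" based on the first strong character.

    Hypothetical helper: `fallback` is used when the paragraph has no
    strong character at all (e.g. only digits and punctuation).
    """
    isolate_depth = 0
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ISOLATE_INITIATORS:
            isolate_depth += 1
        elif bidi_class == ISOLATE_TERMINATOR:
            if isolate_depth > 0:
                isolate_depth -= 1
        elif isolate_depth == 0:
            if bidi_class in STRONG_LTR:
                return "ltr"
            if bidi_class in STRONG_RTL:
                return "rtl"
    return fallback


if __name__ == "__main__":
    for line in ["Lorem ipsum dolor sit amet", "قِفَا نَبْكِ", "123 ..."]:
        print(detect_paragraph_direction(line), repr(line))
```

In the cases listed above (plain text import, paste without direction, Calc cells), such a function would only be consulted when no direction had been set explicitly; how that state would be represented in LO is left open here.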
Yes. The mentioned issues are relevant, but the focus of my bug is as Eyal says: auto-detecting the direction of paragraphs when it was not set explicitly. Note that plaintext does not necessarily mean that direction is not explicitly set. UBA defines right-to-left mark (U+200F) and left-to-right mark (U+200E), which could be set in a plaintext document.
Created attachment 195435: RTL mark and LTR marks usage in a plaintext file

Opening this in GTK editor will show directions overridden using RTL, LTR marks. When opening this in Vim it will show as:

<200f>this should be LTR, but it starts with UTF RTL mark, so it is RTL
<200e>هذا عربي لكنه على اليسار
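As a rough illustration of why the marks work in plain text: U+200F (RLM) has bidi class R and U+200E (LRM) has bidi class L, so a first-strong-character check sees the mark before any letter of the line. The sketch below uses Python's standard `unicodedata` module; the helper name `first_strong_class` and the sample strings are assumptions modelled on the attachment, not its exact contents.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK, bidi class "L"
RLM = "\u200f"  # RIGHT-TO-LEFT MARK, bidi class "R"


def first_strong_class(text):
    """Return the bidi class of the first strong character, or None.

    Hypothetical helper for illustration; it ignores isolates for brevity.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("L", "R", "AL"):
            return bidi_class
    return None


# Latin text forced to RTL by a leading RLM.
latin_forced_rtl = RLM + "this should be LTR, but it starts with an RTL mark"
# Arabic text forced to LTR by a leading LRM.
arabic_forced_ltr = LRM + "هذا عربي لكنه على اليسار"

if __name__ == "__main__":
    print(first_strong_class(latin_forced_rtl))
    print(first_strong_class(arabic_forced_ltr))
    print(first_strong_class("هذا عربي"))
```

Without the leading marks, the same strings would be classified by their first letter instead, which is the behaviour requested for paragraphs with no explicit direction.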
From looking through our existing bidi handling, it seems we could benefit from refactoring to implement the UBA of Unicode UAX #9 [1] more completely, using the ICU library's 'ubidi' API [2] directly.

We could perhaps even take this opportunity to drop our legacy Western/CTL/CJK script-locale model, which currently drives our BiDi support, along with its awful UI, as has been suggested in bug 104318.

=-ref-=
[1] https://www.unicode.org/reports/tr9/
[2] https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/ubidi_8h.html#details
From UX POV it's desirable if not necessary to apply the correct text direction and alignment.
I think this might be marked a duplicate of 155078 - about the same subject, from last year. We could com. Thoughts?
*** Bug 155078 has been marked as a duplicate of this bug. ***