Bug 121182 - Western acronyms not correctly formatted in RTL context
Summary: Western acronyms not correctly formatted in RTL context
Status: RESOLVED WORKSFORME
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 6.0.6.2 release (earliest affected)
Hardware: All All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: RTL-CTL Formatting-Mark
Reported: 2018-11-05 20:02 UTC by ajlittoz
Modified: 2018-12-18 23:03 UTC (History)

See Also:
Crash report or crash signature:


Attachments
Sample file showing the issue (2.89 MB, application/vnd.oasis.opendocument.text)
2018-11-05 20:05 UTC, ajlittoz
The sample file with bidi control characters (2.89 MB, application/vnd.oasis.opendocument.text)
2018-11-06 04:40 UTC, ⁨خالد حسني⁩

Description ajlittoz 2018-11-05 20:02:24 UTC
Description:
An LTR acronym such as C++ or C# (i.e. a combination of Latin letters with symbols at one end) is inserted in a paragraph written in an RTL language (Persian, Arabic, Hebrew). To make sure the acronym is treated as a whole, it is marked with a character style whose "language" is set to "None" or "English".

Despite this markup, the symbol keeps its "context-sensitive" directionality property. Since the document language is Persian, the trailing symbol (+ or #), being adjacent to Persian text (spaces do not matter), reverts to RTL directionality and the acronym is laid out as ++C or #C.

It looks like CTL styles, which contain two parts (Western Font and CTL Font), select the part to apply from some criterion intrinsic to the character, not from a user-defined attribute. There is no way to force the "language" for a sequence of characters. This matters for punctuation and symbols, which seem to adapt their directionality automatically.

This may be the intended behaviour (in which case it is not a bug), but I found no indication of it. In common cases, this is the correct behaviour because symbols like + or / may occur in text and should be laid out as a uniform sequence (no direction change).

However, in technical papers, there is a need to quote unusual mixed (Latin + symbols) sequences and to force directionality on them.

This report originates in an AskLO question at https://ask.libreoffice.org/en/question/171132/ltr-words-inside-an-rtl-sentence/

The only workaround I found, and an ugly one, is to enclose the problematic sequence in Latin letters, as in aC++a (in fact C++a also works here because C is already a Latin letter), and to format the a's as hidden.

Steps to Reproduce:
1. Type some RTL text (e.g. Arabic)
2. Insert somewhere " C++ "
3. Optionally, force language to None (ineffective since no spellchecker is involved)

Actual Results:
Acronym C++ in the middle of an RTL sentence is formatted as ++C

Expected Results:
Acronym C++ in the middle of an RTL text should be displayed as C++, provided there is a way to inform LO Writer that this is an LTR sequence as a whole


Reproducible: Always


User Profile Reset: No



Additional Info:
Comment 1 ajlittoz 2018-11-05 20:05:39 UTC
Created attachment 146325 [details]
Sample file showing the issue

One-line document with a Persian sentence interspersed with C++ and C# acronyms

No matter how the Default Style paragraph style and the Technical character style are defined, the acronyms are laid out as ++C and #C
Comment 2 ⁨خالد حسني⁩ 2018-11-06 04:40:31 UTC
Created attachment 146332 [details]
The sample file with bidi control characters

This is the intended bidirectional text layout: characters with neutral directionality follow the direction of the enclosing text (RTL here, since the overall text is RTL). You can change this by using Unicode direction control characters. Enclosing the abbreviation between U+202A LEFT-TO-RIGHT EMBEDDING and U+202C POP DIRECTIONAL FORMATTING fixes the rendering. I don’t know if there is a way to automatically insert these control characters around some words.
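
The enclosing described in this comment can be sketched in a few lines of Python. This only illustrates the character sequence involved and is not something LibreOffice runs; the helper name `embed_ltr` is hypothetical.

```python
# Hypothetical helper: wrap a term in the two control characters named above.
LRE = "\u202A"  # U+202A LEFT-TO-RIGHT EMBEDDING
PDF = "\u202C"  # U+202C POP DIRECTIONAL FORMATTING


def embed_ltr(term: str) -> str:
    """Return the term enclosed between LRE and PDF."""
    return f"{LRE}{term}{PDF}"


wrapped = embed_ltr("C++")
# Print the code points so the invisible characters can be checked.
print([f"U+{ord(ch):04X}" for ch in wrapped])
# ['U+202A', 'U+0043', 'U+002B', 'U+002B', 'U+202C']
```

The control characters are invisible, so printing code points is the easiest way to confirm they are present in the string.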
Comment 3 Ali Baghernejad 2018-11-06 08:26:43 UTC
@ajlittoz, Khaled Hosny

Thanks, guys.
I don't know whether this is a bug, because the status of this report is still UNCONFIRMED.
If it is a bug, then it is a bug like any other.

But if it is not, and LibreOffice does not support this requirement, then it is a required feature. We expect LibreOffice to do this automatically. Why? Because end users do not care about the underlying technical reasons; they want to work without thinking about them.
I'm a technical person and understand this, but for a non-technical user it is not acceptable.

As an author, I am currently working on my new book. I chose the LibreOffice word processor and have faced many challenges these past days.
I don't understand this one.
Comment 4 ⁨خالد حسني⁩ 2018-11-06 13:40:31 UTC
(In reply to Ali Baghernejad from comment #3)
> @ajlittoz, Khaled Hosny
> 
> Thanks, guys.
> I don't know whether this is a bug, because the status of this report is
> still UNCONFIRMED.
> If it is a bug, then it is a bug like any other.

It isn’t a bug. You get the same behavior in any other application. The direction of the characters is not determined by the language, but by fixed Unicode character properties and the Unicode Bidirectional Algorithm (http://unicode.org/reports/tr9), with which LibreOffice is compliant.
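
The fixed character properties mentioned here can be inspected with Python's standard `unicodedata` module. A minimal sketch follows; the helper name `bidi_classes` and the sample characters are chosen for illustration only. It prints the Bidi_Class of each character, showing that the letters have a strong direction while +, # and ! do not.

```python
import unicodedata


def bidi_classes(text: str) -> list:
    """Pair each character with its Unicode Bidi_Class abbreviation."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


# "C" is a strong left-to-right letter and U+0628 (ARABIC LETTER BEH) is a
# strong right-to-left letter; "+", "#" and "!" have no strong direction.
for ch, cls in bidi_classes("C+#!\u0628"):
    print(f"U+{ord(ch):04X} {cls}")
# U+0043 L
# U+002B ES
# U+0023 ET
# U+0021 ON
# U+0628 AL
```

Because the classes of + and # are not strong, their final direction is resolved from the surrounding text, regardless of any language attribute set on them.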

> But if it is not, and LibreOffice does not support this requirement, then
> it is a required feature. We expect LibreOffice to do this automatically.
> Why? Because end users do not care about the underlying technical reasons;
> they want to work without thinking about them.
> I'm a technical person and understand this, but for a non-technical user
> it is not acceptable.

That is unfortunate, but I don’t see how LibreOffice would know on which side the user wants the punctuation when both options are possibly valid and the one in use now is the most common. Replace the + with an exclamation mark or a closing quote: would you want it to be on the right of the C?
Comment 5 ajlittoz 2018-11-06 14:11:26 UTC
(In reply to Khaled Hosny from comment #4)
> (In reply to Ali Baghernejad from comment #3)
 
> It isn’t a bug. You get the same behavior in any other application. The
> direction of the characters is not determined by the language, but by fixed
> Unicode character properties and the Unicode Bidirectional Algorithm
> (http://unicode.org/reports/tr9), with which LibreOffice is compliant.

I take the point. But there is something on which LO Writer might improve.

I played a bit with the sample file containing the directionality markers. There is no visual clue they are present. Worse, even when positioning the cursor where they're supposed to be, you can't delete them with Backspace or Delete.

To get rid of them (to edit/tune the sequence), you must select more than needed, erase, and retype the extra selection.

Definitely, some indication when "display formatting marks" is enabled would be more user-friendly: something like the way NO-BREAK SPACE or soft hyphens are shown against a grey background, the same as for field content or index anchors.

Should this be described in a separate feature request? Or could this NOTABUG be transformed into such a feature request?
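
Outside Writer, the presence of these markers can be checked with a short script. The sketch below is an illustration only: the names `BIDI_CONTROLS`, `reveal_bidi_controls` and `strip_bidi_controls` are hypothetical, and the set lists the bidirectional control characters I am aware of. ZERO WIDTH NON-JOINER is deliberately left out because Persian text relies on it.

```python
import unicodedata

# Assumed list of bidirectional control characters (marks, embeddings,
# overrides, isolates). U+200C ZWNJ is intentionally not included.
BIDI_CONTROLS = {
    "\u061C", "\u200E", "\u200F",
    "\u202A", "\u202B", "\u202C", "\u202D", "\u202E",
    "\u2066", "\u2067", "\u2068", "\u2069",
}


def reveal_bidi_controls(text: str) -> str:
    """Replace each bidi control character with a visible [U+XXXX NAME] tag."""
    out = []
    for ch in text:
        if ch in BIDI_CONTROLS:
            out.append(f"[U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}]")
        else:
            out.append(ch)
    return "".join(out)


def strip_bidi_controls(text: str) -> str:
    """Remove the bidi control characters and keep everything else."""
    return "".join(ch for ch in text if ch not in BIDI_CONTROLS)


sample = "\u202AC++\u202C"
print(reveal_bidi_controls(sample))
# [U+202A LEFT-TO-RIGHT EMBEDDING]C++[U+202C POP DIRECTIONAL FORMATTING]
print(strip_bidi_controls(sample))
# C++
```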
Comment 6 ⁨خالد حسني⁩ 2018-11-06 14:37:31 UTC
(In reply to ajlittoz from comment #5)
> (In reply to Khaled Hosny from comment #4)
> > (In reply to Ali Baghernejad from comment #3)
>  
> > It isn’t a bug. You get the same behavior in any other application. The
> > direction of the characters is not determined by the language, but by fixed
> > Unicode character properties and the Unicode Bidirectional Algorithm
> > (http://unicode.org/reports/tr9), with which LibreOffice is compliant.
> 
> I take the point. But there is something on which LO Writer might improve.
> 
> I played a bit with the sample file containing the directionality markers.
> There is no visual clue they are present. Worse, even when positioning the
> cursor where they're supposed to be, you can't delete them with Backspace or
> Delete.
> 
> To get rid of them (to edit/tune the sequence), you must select more than
> needed, erase, and retype the extra selection.
> 
> Definitely, some indication when "display formatting marks" is enabled would
> be more user-friendly: something like the way NO-BREAK SPACE or soft hyphens
> are shown against a grey background, the same as for field content or index
> anchors.
> 
> Should this be described in a separate feature request? Or could this
> NOTABUG be transformed into such a feature request?

That is a different issue, and I agree the current behavior is sub-optimal. We support making several other invisible characters visible, and we should extend that list. Please open a new issue for this (I don’t know how to make the invisible characters visible, though; hopefully someone will know and fix it).
Comment 7 ⁨خالد حسني⁩ 2018-11-06 14:41:39 UTC
An even simpler (and a bit easier) solution is to insert a left-to-right mark after the C++ (Insert → Formatting Mark → Left-to-right mark). It works as an invisible left-to-right character and fixes the issue here, since there is already a left-to-right character (the C) on the other side.
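
The reasoning in this comment can be sketched in Python: a mark is only needed on a side of the term that does not already have a strong left-to-right character. The helper name `mark_ltr` is hypothetical, and handling the leading side (as for a term like ".NET") is an extrapolation of the comment, not something stated in this report.

```python
import unicodedata

LRM = "\u200E"  # U+200E LEFT-TO-RIGHT MARK


def mark_ltr(term: str) -> str:
    """Add an LRM on each side of the term that lacks a strong LTR character."""
    if not term:
        return term
    # Bidi_Class "L" means the character is strongly left-to-right.
    prefix = "" if unicodedata.bidirectional(term[0]) == "L" else LRM
    suffix = "" if unicodedata.bidirectional(term[-1]) == "L" else LRM
    return f"{prefix}{term}{suffix}"


for term in ("C++", "C#", "Java"):
    marked = mark_ltr(term)
    print(term, [f"U+{ord(ch):04X}" for ch in marked])
# C++ ['U+0043', 'U+002B', 'U+002B', 'U+200E']
# C# ['U+0043', 'U+0023', 'U+200E']
# Java ['U+004A', 'U+0061', 'U+0076', 'U+0061']
```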
Comment 8 V Stuart Foote 2018-11-06 15:37:30 UTC
IMHO, WFM is the more appropriate resolution.

Input is handled correctly: applying U+200E LEFT-TO-RIGHT MARK [LRM], either with Insert -> Formatting Mark or with the Alt+X toggle, provides the expected typographical behavior.

The same Left-to-right mark/Right-to-left mark UNO commands are available to assign to a menu, toolbar button, context menu, or keyboard shortcut.

Establishing a default keyboard shortcut _might_ be appealing for some locales, though.

Otherwise, the enhancement in bug 58434 (see also) for showing formatting marks when displaying non-printing characters remains to be implemented, still mostly as in the table of attachment 112798 [details]. It would improve the UX by exposing use of the Unicode controls.
Comment 9 Heiko Tietze 2018-11-07 07:57:44 UTC
It would be nice if autocorrection could enter the Unicode characters, so that typing C++ automatically adds the needed formatting.
Comment 10 V Stuart Foote 2018-11-07 12:59:35 UTC
(In reply to Heiko Tietze from comment #9)
> It would be nice if autocorrection could enter the Unicode characters, so
> that typing C++ automatically adds the needed formatting.

Actually, autocorrect can be used to do this now. As is done with localized emoji, the project could provide some subset of international abbreviations correctly flagged with U+200E and/or U+200F.
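
The idea of a replacement table can be sketched in Python as follows. This is not LibreOffice's autocorrect format or API; the table contents and the names `REPLACEMENTS` and `autocorrect` are hypothetical, and only the U+200E case is shown.

```python
LRM = "\u200E"  # U+200E LEFT-TO-RIGHT MARK

# Hypothetical replacement table: each abbreviation maps to itself plus LRM.
REPLACEMENTS = {abbr: abbr + LRM for abbr in ("C++", "C#")}


def autocorrect(text: str) -> str:
    """Replace whole space-separated words that appear in the table."""
    return " ".join(REPLACEMENTS.get(word, word) for word in text.split(" "))


result = autocorrect("written in C++ and C#")
# Escape non-ASCII characters so the inserted marks are visible.
print(result.encode("unicode_escape").decode("ascii"))
# written in C++\u200e and C#\u200e
```

Matching whole words keeps the sketch simple; an abbreviation followed directly by punctuation would not be matched here.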