Bug 152125 - Ease use of Unicode control characters for bidirectionality, e.g. RLI and PDI
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 7.5.0.0 alpha0+
Hardware: All All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: RTL-UI
Reported: 2022-11-19 14:41 UTC by Regina Henschel
Modified: 2023-01-19 08:27 UTC

See Also:
Crash report or crash signature:


Attachments
Examples with RTL in LTR (36.29 KB, application/vnd.oasis.opendocument.text)
2022-11-19 14:41 UTC, Regina Henschel
Description Regina Henschel 2022-11-19 14:41:36 UTC
Created attachment 183681
Examples with RTL in LTR

If you have a text run with strong RTL characters in a paragraph with base direction left-to-right, the implementation of the BiDi-Algorithm in LibreOffice works well.
But there are situations where bidirectional text is not rendered as needed when relying on the BiDi algorithm alone. Since ODF has no attribute to set the base direction on a portion of text, only on the paragraph, such situations need to be solved by inserting Unicode control characters. These are:
(1) LRI, RLI, FSI, PDI (U+2066..U+2069)
(2) LRE, RLE, LRO, RLO, PDF (U+202A..U+202E)
(3) LRM, RLM (U+200E, U+200F)
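
For reference, the three groups above can be listed with their official Unicode names. This is only a minimal sketch using Python's standard unicodedata module; the group labels and the names BIDI_CONTROLS and describe are illustrative and not taken from LibreOffice.

```python
import unicodedata

# The three groups of bidi control characters named in the description.
# Group labels are informal and chosen only for this sketch.
BIDI_CONTROLS = {
    "(1) isolates": "\u2066\u2067\u2068\u2069",                     # LRI, RLI, FSI, PDI
    "(2) embeddings/overrides": "\u202A\u202B\u202C\u202D\u202E",  # LRE, RLE, PDF, LRO, RLO
    "(3) marks": "\u200E\u200F",                                    # LRM, RLM
}

def describe(ch):
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

for group, chars in BIDI_CONTROLS.items():
    print(group)
    for ch in chars:
        print("  " + describe(ch))
```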

Problem A
You cannot see which one is used and where it is inserted, because all these characters are hidden and have zero width. That is a problem when you need to edit the text. Using control characters (2) is problematic, but if you cannot see them, it is hard to remove them, for example.

Problem B
LibreOffice supports entering the control characters (3), via Insert > Formatting Mark > Left-to-Right Mark and Right-to-Left Mark. But there is no tool for the control characters (1) and (2). Entering them directly into the text via U+NNNN and Alt+x is very problematic, because the character order might change while entering them.

A solution for A could be to show the illustration glyphs (those with the dotted border in the Unicode chart files) when "Show Formatting Marks" is ON.
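
To illustrate what "making the controls visible" could mean outside of LibreOffice, here is a rough Python sketch that replaces each bidi control in a string with a bracketed tag. The bracket notation, the LABELS table and the function name reveal_bidi_controls are assumptions made for this example; they say nothing about how Writer would actually draw the marks.

```python
# Short labels for the bidi controls. The '[XXX]' notation is an assumption
# for this sketch, not what LibreOffice draws.
LABELS = {
    "\u200E": "LRM", "\u200F": "RLM",
    "\u202A": "LRE", "\u202B": "RLE", "\u202C": "PDF",
    "\u202D": "LRO", "\u202E": "RLO",
    "\u2066": "LRI", "\u2067": "RLI", "\u2068": "FSI", "\u2069": "PDI",
}

def reveal_bidi_controls(text):
    """Replace each bidi control with a visible tag such as '[RLI]'."""
    return "".join(f"[{LABELS[ch]}]" if ch in LABELS else ch for ch in text)

# Hebrew letters alef and bet, isolated inside Latin text.
sample = "abc \u2067\u05D0\u05D1\u2069 def"
print(reveal_bidi_controls(sample))
```

Running something like this over text copied from a document is one way to find out which of the methods (1) to (3) was used.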

I have no idea for a good tool to enter these controls.

You should use the attachment and try to bring the sentence which has no control character into the correct word order. Then you will notice the problems for the user.
Comment 1 Heiko Tietze 2022-11-21 12:37:51 UTC
Eyal, Khaled, what do you think?
Comment 2 ⁨خالد حسني⁩ 2022-11-21 14:15:41 UTC
HTML has <bdi> and <bdo> elements, which work the same as the control characters, in case ODF wanted to grow a similar functionality.

https://developer.mozilla.org/en-US/docs/Web/HTML/Element/bdi
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/bdo

For showing the control characters, I think showing the illustration glyphs might not be easy, as we can’t get them from the fonts (most fonts will have blank glyphs for them, if they have glyphs at all). But if there is a way to draw something like this on the screen, that would be good. Maybe give it the same color as other visible formatting characters.

For insertion, we can add them to the menu next to the others.
Comment 3 ⁨خالد حسني⁩ 2022-11-21 14:17:17 UTC
If we make these visible, we should probably do the same for other bidi control chars; I don’t think they currently become visible with “Show Formatting Marks”.
Comment 4 Heiko Tietze 2022-11-22 08:41:55 UTC
I wonder what you actually want to do. Guess it's an interaction like a context menu command to insert special characters before and after a selection. Right?

As for the indication, there is no space for highlighting. I could imagine something like what we have for bookmarks, though.
Comment 5 Regina Henschel 2022-11-22 09:41:54 UTC
(In reply to Heiko Tietze from comment #4)
> I wonder what you actually want to do.

The attached file has at the end an example with the wrong word order. Try to change it so that the order is the same as shown in the image at the top. If you do it yourself, you will notice the problems.

Try in the examples to figure out which method is actually used. Imagine you need to edit and correct a document which is not your own.
Comment 6 Heiko Tietze 2022-11-22 10:07:06 UTC
My question was how the RTL people would like to insert the control characters. Sounds pretty simple to have a command "Make selection run from left-to-right" and another one for the opposite (not sure we need this) and insert the Unicode characters before/after.
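
What such a command would do to the underlying text can be sketched in a few lines of Python. This assumes the isolate characters are used (RLI or LRI to open, and PDI to close, which is how the Unicode Bidirectional Algorithm pairs them) and that the selection is given as logical, storage-order indices. The function name isolate_selection is hypothetical and this is not an existing LibreOffice command.

```python
LRI, RLI, PDI = "\u2066", "\u2067", "\u2069"

def isolate_selection(text, start, end, rtl):
    """Wrap text[start:end] in RLI...PDI (rtl=True) or LRI...PDI (rtl=False).

    start and end are logical (storage order) indices, not visual positions.
    """
    if not 0 <= start <= end <= len(text):
        raise ValueError("selection out of range")
    opener = RLI if rtl else LRI
    return text[:start] + opener + text[start:end] + PDI + text[end:]

# Mark the two Hebrew letters as a right-to-left isolate.
result = isolate_selection("abc \u05D0\u05D1 def", 4, 6, rtl=True)
print([f"U+{ord(c):04X}" for c in result[4:8]])
```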

Second part is the feedback. Once you have inserted those zero-width characters you probably want to know that. My take here was to use a kind of vertical I-beam symbol (using the pilcrow-blue makes sense here).
Comment 7 ⁨خالد حسني⁩ 2022-11-22 13:47:44 UTC
I’m thinking: maybe make the behavior of the existing LTR/RTL toolbar buttons selection-dependent, i.e. if there is no selection they change the direction of the paragraph where the cursor is in, if there is selection they insert RLI/LRI and PDF. This is a change in behavior and might be a bit surprising to some, but we have also seen reports before where people were expecting this kind of behavior.

Either way, this should be orthogonal to providing, say, menu entries for inserting individual bidi control chars, as this provides maximum flexibility (though it might be overwhelming since there are so many of them, so if we implement the common case above this can be buried somewhere and not exposed by default).
Comment 8 Eyal Rozenberg 2022-11-22 20:58:24 UTC Comment hidden (obsolete)
Comment 9 Eyal Rozenberg 2022-11-22 20:58:44 UTC
(In reply to Heiko Tietze from comment #1)
> Eyal, Khaled, what do you think?

A few notes:

1. Most Hebrew-speaking users, in my experience, are not aware of Unicode control chars, and never use them. That is not to say we shouldn't decently support their use though... Personally, I occasionally use RLM's and LRM's, and use the keyboard to insert them. I figure out they are present by noticing the cursor doesn't move to the next position when I press an arrow key. And I never use any of the other control chars, because if I did, I would have to maintain stacks of RTL state in my head, and that is simply not something humans want to do, AFAICT. And I'm an "advanced" user. So I believe almost nobody will use the other control chars of their own volition. The problem is when a document you've opened, or text you've copy-pasted, has these control chars - and then LO would start acting funny.
2. This issue is relevant not only for Writer. It can definitely come up in Calc or Impress, even if less often.
3. Any visual indication should preferably be such, that a "newbie" user who is not familiar with these chars, and has gotten them by accidental insertion, copy-pasting, or opening an existing document, can be quickly directed to an explanation of what those control chars do. Hopefully something with illustrations.
4. If Unicode control chars are made visible, we must have an easy-to-toggle option to hide them, since their visibility will probably encumber reading.
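
The "stacks of RTL state" from note 1 can be made concrete with a small checker that reports control chars left open or closed without an opener in one paragraph. This is a simplified Python sketch, not the full Unicode Bidirectional Algorithm: it assumes that PDI closes the nearest isolate together with any embeddings opened inside it, and that PDF closes only an embedding or override on top of the stack. The function name unbalanced_controls is made up for this example.

```python
ISOLATE_OPENERS = {"\u2066": "LRI", "\u2067": "RLI", "\u2068": "FSI"}
EMBED_OPENERS = {"\u202A": "LRE", "\u202B": "RLE", "\u202D": "LRO", "\u202E": "RLO"}
PDF, PDI = "\u202C", "\u2069"

def unbalanced_controls(paragraph):
    """Return sorted (index, label) pairs for openers never closed
    and for closers that have no matching opener."""
    stack = []      # (index, label, is_isolate) of currently open controls
    problems = []
    for i, ch in enumerate(paragraph):
        if ch in ISOLATE_OPENERS:
            stack.append((i, ISOLATE_OPENERS[ch], True))
        elif ch in EMBED_OPENERS:
            stack.append((i, EMBED_OPENERS[ch], False))
        elif ch == PDF:
            # PDF closes only an embedding/override on top of the stack.
            if stack and not stack[-1][2]:
                stack.pop()
            else:
                problems.append((i, "PDF"))
        elif ch == PDI:
            # PDI closes the nearest isolate and any embeddings opened inside it.
            if any(is_isolate for _, _, is_isolate in stack):
                while not stack.pop()[2]:
                    pass
            else:
                problems.append((i, "PDI"))
    # Whatever is still open at the end of the paragraph was never closed.
    problems.extend((i, label) for i, label, _ in stack)
    return sorted(problems)

print(unbalanced_controls("a\u2067b"))
```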

(In reply to Regina from comment #0)
> "the illustration glyphs (that with the dotted border in the Unicode chart files)" 

Not sure what that is. Do you mean what we would see in the Special Character dialog? That's often an empty white area for LRM and RLM.

(...)
Comment 10 Eyal Rozenberg 2022-11-22 21:13:21 UTC
(In reply to Heiko Tietze from comment #6)
> My question was how the RTL people would like to insert the control
> characters. 

Personally, like I said - keyboard shortcut. That lets you use these marks as part of the flow of typing, or when modifying existing text - while looking at that text and navigating with the keyboard, which is easier than navigating to different positions with the mouse: The former navigates logically, the latter visually.

But I may not be a "representative sample"... it's actually not easy to answer your question, especially considering how lay users simply don't know about these.

> Sounds pretty simple to have a command "Make selection run from
> left-to-right" and another one for the opposite (not sure we need this) and
> insert the unicode characters before/after.

That is not simple IMHO, and not a good idea for two reasons:

1. It involves state for a selection rather than for a character. Without control chars, you just have characters with strong, weak or no directionality - and an RLM/LRM is the same thing, except that its width is zero. That is a simpler mental model than selections with direction state.
2. People who know the control characters, or who know what RTL and LTR is, would not be able to guess exactly what this command does just by reading it - and personally I would worry it does something complex and "scary", in the sense that it would clash with what I can do manually. It's true that a help document alleviates that somewhat, but still.
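
The per-character model from point 1 can be checked with Python's standard unicodedata module, which reports the bidi class of each character. The grouping of classes into strong, weak and neutral below follows the Unicode Bidirectional Algorithm's class table; the function name directionality is only illustrative.

```python
import unicodedata

# Bidi classes grouped as in the Unicode Bidirectional Algorithm.
STRONG = {"L", "R", "AL"}
WEAK = {"EN", "ES", "ET", "AN", "CS", "NSM", "BN"}
NEUTRAL = {"B", "S", "WS", "ON"}

def directionality(ch):
    """Classify one character as strong, weak, neutral or explicit formatting."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class in STRONG:
        return "strong"
    if bidi_class in WEAK:
        return "weak"
    if bidi_class in NEUTRAL:
        return "neutral"
    # LRE, RLE, LRO, RLO, PDF, LRI, RLI, FSI, PDI (or an unassigned code point)
    return "explicit formatting" if bidi_class else "unassigned"

for ch in ["a", "\u05D0", "1", " ", "\u200F", "\u2067"]:
    print(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch), directionality(ch))
```

RLM and LRM come out as ordinary strong characters here, while RLI and the other controls from groups (1) and (2) fall into a separate class that carries state across a run of text.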

I was faced with a similar challenge for my RTL support extension for Thunderbird, and opted for the context section to have an "Insert Control Character" submenu, with the items being: "Right-to-Left Mark" and "Left-to-Right Mark" - no option to insert something else. I might have added the acronym in parentheses, not sure why I decided against it.

If we want to make the other control chars easily-insertable - and I'm not at all sure that we do - then I would keep those two entries at the top of the submenu, add a separator, then offer the other marks.

> Second part is the feedback. Once you have inserted those zero-width
> characters you probably want to know that. My take here was to use a 
> kind of vertical I-beam symbol (using the pilcrow-blue makes sense here).

So, it's important this doesn't clash with other I-beam-like indications (like start or end of bookmarks?); and we should consider what happens when you also have a character border on the side, especially if it's a similar color.

I'll make another suggestion - not rejecting yours, just a thought: Thinking about the visible/invisible toggle, I might let myself be more expansive and let the indicator take up space: An icon indicating the direction, e.g. a triangle or arrow, perhaps similar to what we have on the toolbar - but not with the pilcrow symbol itself, since it's not about the paragraph. The upside is that the meaning is better indicated, the downside is that it may significantly affect the text layout, causing words to overflow the line and move to another line, with that in itself having potentially undesirable effects directionality-wise.
Comment 11 Eyal Rozenberg 2022-11-22 21:31:05 UTC
(In reply to خالد حسني from comment #7)
> I’m thinking. may be make the behavior of the existing LTR/RTL toolbar
> buttons selection-dependent i.e. if there is no selection they change the
> direction of the paragraph where the cursor is in, if there is selection
> they insert RLI/LRI and PDF. This a change in behavior and might be a bit
> surprising to some, but we have also seen reports before where people were
> expecting this kind of behavior.

I would be totally against this, because:

1. The toolbar buttons are explicitly marked as regarding the paragraph direction
2. You have to select multiple paragraphs in order to set their paragraph direction...
3. Would be incompatible with every editor app or web control which has paragraph-direction-setting buttons. (Including LO <= 7.5, MS Office, GDocs etc.)

> Problem B
> LibreOffice supports to enter the control characters (3). That is in Insert
> > Formatting Mark > Left-to-Right Mark and Right-to-Left Mark. But there
> exists no tool for the control characters (1) and (2).

They can be inserted using Insert > Special Character...  ; now, that's quite inconvenient, but I believe nobody, more-or-less, ever wants to insert anything other than RLM or LRM. I might be wrong, but - do we know that there is user interest in a more convenient method of inserting these?
Comment 12 Heiko Tietze 2023-01-19 08:27:11 UTC
The topic was on the agenda of the design meeting. Since no RTL expert joined the call, we recommend letting the experts decide.

To me it sounds very much straightforward to add a control character before and after a selection on click or shortcut, meaning via UNO command.