Bug 61795 - Weak Characters (like brackets) are mispositioned with mixed RTL and LTR
Status: RESOLVED NOTABUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Duplicates: 68092
Depends on:
Blocks: RTL-CTL
Reported: 2013-03-04 15:23 UTC by Shlomi Israely
Modified: 2017-10-17 14:14 UTC

See Also:
Crash report or crash signature:


Attachments
test case (12.69 KB, application/vnd.oasis.opendocument.text)
2013-03-04 15:23 UTC, Shlomi Israely

Description Shlomi Israely 2013-03-04 15:23:12 UTC
Created attachment 75900
test case

When mixing text in an RTL language (Hebrew) with text in an LTR language (English), brackets are often misplaced.

This makes it impossible to mix English text with text in other languages, which makes Writer unusable for day-to-day use.
Text to reproduce:
Hello (עול)ם
Expected text:
Hello ‏(עול)ם

As you can see, the parenthesis is misplaced in the first example.
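For illustration, a minimal Python sketch that prints the Unicode Bidi_Class of each character in the two sample strings. It assumes the expected text differs from the typed text only by an RLM (U+200F) before the opening parenthesis; the names `TYPED`, `EXPECTED` and `bidi_classes` are made up for this example. Note that in Unicode terms parentheses have class ON (Other Neutral) rather than one of the weak classes.

```python
import unicodedata

# Sample strings from the report, written with escapes so that the
# invisible character shows up in the source.
# Assumption: the expected text only adds an RLM before "(".
TYPED = "Hello (\u05e2\u05d5\u05dc)\u05dd"
EXPECTED = "Hello \u200f(\u05e2\u05d5\u05dc)\u05dd"


def bidi_classes(text):
    """Return a (code point label, Bidi_Class) pair for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch)) for ch in text]


# Print the classes of the expected string, one character per line.
for label, cls in bidi_classes(EXPECTED):
    print(label, cls)
```

The Latin letters come out as L, the Hebrew letters and the RLM as R, the space as WS and both parentheses as ON, so the only strong character next to the opening parenthesis in the typed string is the Hebrew letter that follows it.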

Today, LibreOffice's solution is to add LRM and RLM chars in the correct place; this is very unintuitive for 90% of the users.

I think the BiDi algorithm should be enhanced to place LRM/RLM automatically according to the current window's keyboard layout. That is, if the paragraph is RTL but the layout is English, then put an LRM char before the weak char. This is what the user usually expects to happen.
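The proposed rule could be sketched roughly as follows. This is only an illustration of the idea, not LibreOffice code: the function name `mark_before_typed_char`, the `"ltr"`/`"rtl"` strings and the choice of which Bidi classes count as "markable" are all assumptions made for the example.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK

# Assumption: only these non-strong classes get a mark
# (brackets and punctuation, but not spaces or digits).
MARKABLE = {"ON", "CS", "ES", "ET"}


def mark_before_typed_char(ch, paragraph_dir, layout_dir):
    """Return the mark to insert before a typed character, or "".

    paragraph_dir and layout_dir are "ltr" or "rtl" (hypothetical inputs
    that a real editor would take from the paragraph and the keyboard).
    """
    if unicodedata.bidirectional(ch) not in MARKABLE:
        return ""  # strong characters, spaces and digits are left alone
    if layout_dir == paragraph_dir:
        return ""  # no conflict between layout and paragraph direction
    return LRM if layout_dir == "ltr" else RLM


# RTL paragraph, English layout: an LRM goes before the bracket.
print(repr(mark_before_typed_char("(", "rtl", "ltr")))
```

A real implementation would also have to decide what happens on paste, undo and cursor movement across the inserted marks, which this sketch ignores.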

I've attached a test case .odt file.

A similar bug with brackets is:
https://bugs.freedesktop.org/show_bug.cgi?id=56408
It might have a similar cause, but it's a different bug.
Comment 1 Amir E. Aharoni 2013-03-07 11:36:10 UTC
The really good way to resolve this is not to change the Unicode bidi algorithm, but to add support for inline direction marking to the OpenDocument standard and then implement it in LibreOffice. Put simply, HTML has <div dir="rtl"> and <span dir="rtl">, and OpenDocument only has something like <div dir="rtl">, but not <span dir="rtl">.

Using directionality marks like RLM, RLE and PDF is not a robust way to resolve this, although if they are used internally and the user doesn't have to use them directly, it's probably OK.
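As a rough illustration of what using RLE and PDF internally could look like, the sketch below wraps a run of text in an embedding, which approximates the effect of an inline direction attribute. The helper names `embed` and `strip_embedding` are hypothetical, and nothing here reflects how OpenDocument or LibreOffice actually store direction.

```python
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def embed(text, direction):
    """Wrap text in an explicit embedding; direction is "ltr" or "rtl"."""
    opener = RLE if direction == "rtl" else LRE
    return f"{opener}{text}{PDF}"


def strip_embedding(text):
    """Remove the embedding controls again, e.g. before showing raw text."""
    return "".join(ch for ch in text if ch not in (LRE, RLE, PDF))


run = embed("(\u05e2\u05d5\u05dc)\u05dd", "rtl")
print([f"U+{ord(ch):04X}" for ch in run])
```

The controls are ordinary characters inside the string, so they survive copy and paste but are easy to break by editing in the middle of the run, which is one reason this approach is less robust than markup.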

I tried to bring this issue up several times on the OpenDocument mailing list, but didn't get any useful replies. See here:
https://lists.oasis-open.org/archives/office-comment/201110/msg00000.html

There may be some challenges with implementing this, but before discussing the implementation challenges, it must be agreed to make the change in the standard that LibreOffice is implementing.
Comment 2 Maxim Monastirsky 2013-08-14 14:28:27 UTC
*** Bug 68092 has been marked as a duplicate of this bug. ***
Comment 3 QA Administrators 2015-04-19 03:23:11 UTC Comment hidden (obsolete)
Comment 4 Hanan Sela 2015-04-21 18:55:01 UTC
The bug is still a problem in LO 4.4.2.2 on Ubuntu 14.04. It no longer affects brackets in the middle of a line, but at the end of a line the brackets do not stay with the LTR word when the rest of the line is RTL. If you insert a "no-width no break" formatting mark, the brackets stick to the first letter, but then the rest of the LTR word breaks at the end of the line.
Comment 5 QA Administrators 2016-09-20 10:29:34 UTC Comment hidden (obsolete)
Comment 6 Ofir 2016-09-20 11:00:18 UTC
Still reproducible with:

Version: 5.2.1.2
Build ID: 1:5.2.1~rc2-0ubuntu1~xenial0
CPU Threads: 1; OS Version: Linux 4.4; UI Render: default; 
Locale: en-US (en_US.UTF-8); Calc: group
Comment 7 Hanan Sela 2016-09-21 03:51:05 UTC
The bug is still reproducible in LO Version: 5.2.1.2
Build ID: 31dd62db80d4e60af04904455ec9c9219178d620
CPU Threads: 4; OS Version: Linux 4.2; UI Render: default; 
Locale: en-US (en_US.UTF-8); Calc: group

OS Ubuntu 15.10
Comment 8 Khaled Hosny (inactive) 2016-12-26 07:06:43 UTC
This is how the Unicode bidirectional text algorithm works, and the latest versions of LibreOffice support the latest version of the algorithm, which handles bracket pairing. Using formatting characters is a perfectly fine way to solve ambiguities that the algorithm can’t handle automatically.
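To give an idea of what bracket pairing involves, here is a simplified Python sketch of the stack-based pair identification step. It is a sketch under assumptions, not the full rule: the real algorithm uses the Unicode Bidi_Paired_Bracket property, only considers brackets that are still neutral at that stage, and handles canonical equivalents, whereas this example hard-codes a small `PAIRS` table. The function name `bracket_pairs` is made up.

```python
# Illustrative subset of paired brackets (assumption, not the full property).
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())


def bracket_pairs(text):
    """Return sorted (open_index, close_index) pairs found in text."""
    stack = []  # entries are (expected closing bracket, index of opener)
    pairs = []
    for i, ch in enumerate(text):
        if ch in PAIRS:
            stack.append((PAIRS[ch], i))
        elif ch in CLOSERS:
            # Search from the top of the stack for a matching opener.
            for depth in range(len(stack) - 1, -1, -1):
                if stack[depth][0] == ch:
                    pairs.append((stack[depth][1], i))
                    # Discard the match and every opener above it.
                    del stack[depth:]
                    break
    return sorted(pairs)


print(bracket_pairs("Hello (\u05e2\u05d5\u05dc)\u05dd"))
```

Once a pair is identified, the algorithm resolves both brackets to the same direction based on the strong characters inside and around them, which keeps a pair together but does not settle every ambiguous case.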
Comment 9 amirimobile 2016-12-30 12:55:25 UTC
I seriously don't see how this is resolved. There is a serious problem here: Users get the _wrong_ results out of what they type. That means that this is in fact a bug.

Regarding "Using formatting characters is a perfectly fine way..." - if what you mean by this is that users enter these formatting characters manually, then it is definitely _not_ fine. This is a terrible nuisance to the few who even know what these formatting characters are and how to input them, but the vast majority of users are simply dumbfounded by the current behavior, and this is something that does actually work in the popular (sorry) office suite.

All of this has of course already been mentioned by the OP of this bug, and what has already been said (in subsequent comments) is that the Unicode BiDi algorithm does not have to change, but libreoffice _does_.
Comment 10 Khaled Hosny (inactive) 2016-12-30 13:03:48 UTC
(In reply to amirimobile from comment #9)
> I seriously don't see how this is resolved. There is a serious problem here:
> Users get the _wrong_ results out of what they type. That means that this is
> in fact a bug.

No, it is not; that is how the Unicode Bidirectional Text Algorithm works. It is not ideal, but that is why control characters exist: to assist the algorithm when there is ambiguity (which is the case here).

> Regarding "Using formatting characters is a perfectly fine way..." - if what
> you mean by this is that users enter these formatting characters manually,
> then it is definitely _not_ fine. This is a terrible nuisance to the few who
> even know what these formatting characters are and how to input them, but
> the vast majority of users are simply dumbfounded by the current behavior,
> and this is something that does actually work in the popular (sorry) office
> suite.

Please cite examples of this working differently elsewhere.

> All of this has of course already been mentioned by the OP of this bug, and
> what has already been said (in subsequent comments) is that the Unicode BiDi
> algorithm does not have to change, but libreoffice _does_.

So what are the requested changes, other than changing the algorithm?
Comment 11 amirimobile 2016-12-30 19:21:29 UTC
(In reply to Khaled Hosny from comment #10)
> (In reply to amirimobile from comment #9)
> > I seriously don't see how this is resolved. There is a serious problem here:
> > Users get the _wrong_ results out of what they type. That means that this is
> > in fact a bug.
> 
> No it is not, that is how the Unicode Bidirectional Text Algorithm works,
> not ideal but that is why control characters exist: to assist the algorithm
> when there is ambiguity (which is the case here).

That's just it: The algorithm is not at all the point here. LibreOffice Writer _is_. The algorithm specifies how to display bidirectional text. Writer is the instrument to be used to write text. What we're saying here is that writing is broken, not how the already-written text is displayed.

> 
> > Regarding "Using formatting characters is a perfectly fine way..." - if what
> > you mean by this is that users enter these formatting characters manually,
> > then it is definitely _not_ fine. This is a terrible nuisance to the few who
> > even know what these formatting characters are and how to input them, but
> > the vast majority of users are simply dumbfounded by the current behavior,
> > and this is something that does actually work in the popular (sorry) office
> > suite.
> 
> Please cite examples of this working differently elsewhere.

Microsoft Word:
- Base direction: LTR
- Write text using strongly typed LTR characters
- switch to RTL language
- write text using strongly typed RTL characters, mixed with various bracket types
-> brackets keep their intended positions

> 
> > All of this has of course already been mentioned by the OP of this bug, and
> > what has already been said (in subsequent comments) is that the Unicode BiDi
> > algorithm does not have to change, but libreoffice _does_.
> 
> So what are the requested changes, other than changing the algorithm?


Two possible solutions that I would consider have already been suggested by Amir E. Aharoni 2013-03-07 11:36:10 UTC:

--- start quote ---
The really good way to resolve this is not to change the Unicode bidi algorithm, but to add support for inline direction marking to the OpenDocument standard and then implement it in LibreOffice. Put simply, HTML has <div dir="rtl"> and <span dir="rtl">, and OpenDocument only has something like <div dir="rtl">, but not <span dir="rtl">.

Using directionality marks like RLM, RLE and PDF is not a robust way to resolve this, although if they are used internally and the user doesn't have to use them directly, it's probably OK.
--- end quote ---
Comment 12 Khaled Hosny (inactive) 2016-12-30 19:36:32 UTC
(In reply to amirimobile from comment #11) 
> Microsoft Word:
> - Base direction: LTR
> - Write text using strongly typed LTR characters
> - switch to RTL language
> - write text using strongly typed RTL characters, mixed with various bracket
> types
> -> brackets keep their intended positions

Please attach a sample document.
Comment 13 Khaled Hosny (inactive) 2016-12-30 21:09:27 UTC
Screenshot of the MS Office rendering would be appreciated as well.
Comment 14 QA Administrators 2017-07-27 12:06:31 UTC Comment hidden (obsolete)
Comment 15 QA Administrators 2017-08-30 19:31:50 UTC Comment hidden (obsolete)
Comment 16 Lior Kaplan 2017-10-11 08:21:15 UTC
Still happens in 5.4.1.
Comment 17 Khaled Hosny (inactive) 2017-10-11 10:45:58 UTC
This is not really a bug but the expected Unicode bidirectional text rendering. The claim that MS Office handles this differently has not been supported with evidence (it would not even matter; MS Office is known not to be in full compliance with the Unicode Bidirectional Text Algorithm).
Comment 18 Mike Kaganski 2017-10-16 06:40:14 UTC
(In reply to Khaled Hosny from comment #17)

Khaled:

I agree with OP that there *is* a problem here. Again: this is not a rendering problem! It has nothing to do with the Unicode bidirectional text algorithm. The user can manually create good-looking text by performing some special actions; e.g., citing comment #0,

> Today, LibreOffice's solution is to add LRM and RLM chars in the correct place

Of course, text *rendering* should not place formatting characters anywhere they aren't present in the source. But the problem is not rendering, as already said multiple times; the problem is *input*, which should analyse the situation (current IME mode?) and insert those characters at the input stage, when the string is created from the keyboard. So I suggest you revert your decision to dismiss this; UX-wise, this is actually a horrible bug.
Comment 19 Khaled Hosny (inactive) 2017-10-17 14:14:58 UTC
(In reply to Mike Kaganski from comment #18)
> (In reply to Khaled Hosny from comment #17)
> 
> Khaled:
> 
> I agree with OP that there *is* a problem here. Again: this is not a
> rendering problem! It has nothing to do with the Unicode bidirectional
> text algorithm. The user can manually create good-looking text by
> performing some special actions; e.g., citing comment #0,
> 
> > Today, LibreOffice's solution is to add LRM and RLM chars in the correct place
> 
> Of course, text *rendering* should not place formatting characters
> anywhere they aren't present in the source. But the problem is not
> rendering, as already said multiple times; the problem is *input*, which
> should analyse the situation (current IME mode?) and insert those
> characters at the input stage, when the string is created from the
> keyboard. So I suggest you revert your decision to dismiss this; UX-wise,
> this is actually a horrible bug.

There is no concrete proposal for what should be done here. I don’t personally think we should start inserting characters the user didn’t type, not invisible ones at least. I don’t know of any program that does that (regarding weak bidi characters) apart from what was said here about MS Word (and I don’t think this is publicly specified anywhere, so we would effectively be reverse-engineering it). I also feel that whatever we do here is likely to have undesired side effects; if there were a robust way to handle this, it would have made it into UBA by now.

But anyway, that is my 2 qirsh, feel free to re-open the issue if you think otherwise.