Bug 69109 - Editing: RTL character string causes (left to right) numerics to flip order to also be RTL
Status: RESOLVED NOTABUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: RTL-CTL
Reported: 2013-09-08 16:39 UTC by braunmax
Modified: 2018-04-29 12:37 UTC

See Also:
Crash report or crash signature:


Attachments
Odt file which demonstrates the problem described. (1.50 MB, application/vnd.oasis.opendocument.text)
2013-09-08 16:39 UTC, braunmax

Description braunmax 2013-09-08 16:39:44 UTC
Created attachment 85442
Odt file which demonstrates the problem described.

יוד-הה-וו-ההI  1 2 3 4 5 6 Roman text                 
      
In this file (copied and pasted from Writer and also attached), attempting to delete the Roman “I” next to the Hebrew text causes the numerals to change style and flip to the left of the Hebrew in right-to-left order, following the Hebrew convention. The font of the numerals changes from Times New Roman to “Mangal”. The Roman text itself is preserved. If there is no text, only spaces, the spaces change to the new format until the end of the line or until the next alphabetic character.

Note that the language of the Hebrew text and the adjacent spaces (excluding the “I”) is set to “Hindi” and refuses to change.

I have tried multiple tricks to enter numerals after the Hebrew, but this flipping and reformatting always happens. Here I typed the numerals after the Roman letter “I”, which I now cannot delete without triggering the effects above.

A similar style change occurs with Hebrew characters in Table of contents, with an altered page number after the series of dotted fill characters.

I use the Windows version of LibreOffice Writer, 4.1.1.2.
Comment 1 Maxim Monastirsky 2013-09-08 18:36:38 UTC
(In reply to comment #0)
> In this file (copied and pasted from writer and attached also), attempting
> to delete the roman “I” next to the Hebrew text causes the numerics to
> change style, and flip to the left of the Hebrew in right-to-left-order, in
> the same way as the Hebrew convention. The font name of the numerics changes
> to “Mangal” from Times-New-Roman. The roman text itself is preserved, If
> there is no text, but spaces only, the spaces change to the new format until
> the end of the line, or until the next alpha-character.

Hi,
I can reproduce this behavior with LO 4.1.1.2 under Fedora 19 (64-bit). The "Mangal" font is the default CTL font in your document, therefore LO uses it when it thinks your text is a CTL one. You can change it through Tools->Options->LibreOffice Writer->Basic Fonts (CTL)

> To note is that the language of the Hebrew and adjacent spaces, (excluding
> the “I”) is “Hindi”, and refuses to change.

Strange, as I'm able to change the language through Format->Character...->CTL Font. Make sure that BiDi is activated at Tools->Options...->Language Settings->Languages->Show UI elements for Bi-Directional writing. There you can also set the default CTL language for your documents, so you don't need to change it manually every time.

> I have tried multiple tricks to try to enter numerical characters after
> Hebrew-but this flipping and reformatting happens. Here I typed the numerics
> after the Roman alpha “I”, which I cannot now delete without the effects
> above.

The trick is to place the cursor between the Roman "I" and the Hebrew text, and insert an LTR mark using Insert->Formatting Mark->Left-to-right mark. So I'm not sure it's a bug.
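
For readers who want to see what the mark does at the character level, here is a minimal Python sketch. It assumes the menu entry inserts U+200E (LEFT-TO-RIGHT MARK); the helper name `bidi_classes` and the sample strings are made up for illustration and are not part of LibreOffice. It only prints the Unicode bidirectional class of each character, it does not reproduce Writer's layout.

```python
import unicodedata

# Assumption: the "Left-to-right mark" menu entry inserts U+200E.
LRM = "\u200e"

def bidi_classes(text):
    """Return the Unicode bidirectional class of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]

hebrew = "\u05d9\u05d5\u05d3"  # yod, vav, dalet (written as escapes to keep the source LTR)
numerals = " 1 2 3"

# Without the mark, the last strong character before the digits is Hebrew.
print(bidi_classes(hebrew + numerals))

# With the mark, an invisible strong left-to-right character precedes the digits.
print(bidi_classes(hebrew + LRM + numerals))
```

The mark has no visible glyph, but it is a strong left-to-right character, so it can play the same role for the numerals that the visible "I" played in the attached document.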

I'm closing this bug for now as "NOTABUG". Feel free to change the status back to "UNCONFIRMED" if you disagree.
Comment 2 braunmax 2013-10-05 16:14:28 UTC
@Maxim:

Thank you for the help... your workaround works, and I've fixed the "Mangal" issue as suggested and as you explained.

I still think it is a bug, because adding Roman letters does not cause the flip, only numerals do...

Perhaps that is because in the default settings the numeric characters are selected as Roman for the CTL text, and hence LO/Writer does not recognise the fact that Roman is being used.

>The roman text itself is preserved, If
> there is no text, but spaces only, the spaces change to the new format until
> the end of the line, or until the next alpha-character

(After carrying out the settings changes to CTL, by first activating the "display" as you suggested, italics kerning/spacing everywhere else in the document has gone wild. [LO 4.1.1.2] But that may be something else. If it persists, and if I can find a way to revert fully to where I was and then re-induce it so I can actually describe the behaviour, I'll report a separate bug.)

But for now, thanks again for the workaround!

Max
Comment 3 Maxim Monastirsky 2013-10-13 14:43:31 UTC
(In reply to comment #2)
> I still think it is a bug, for to add roman alpha's does not cause the flip,
> only numerics...
OK, so basically what you are saying is that numerals should still be detected as non-CTL text, as they were before the removal of the 'I' letter. I think it's a reasonable expectation, since you didn't change anything regarding the numerals, just deleted some other text. So it shouldn't have any effect on the numerals.
Comment 4 Lior Kaplan 2013-12-14 13:01:04 UTC
Also happens with Arabic, not limited to Hebrew.
Comment 5 QA Administrators 2015-04-19 03:20:44 UTC Comment hidden (obsolete)
Comment 6 braunmax 2015-04-22 08:22:17 UTC
The bug still behaves in the same fashion (thus unchanged):

Operating system: Windows 7 Enterprise: v6.1 (Build 7601 Service pack #1)

LibreOffice writer: 
Version: 4.4.1.2
Build ID: 45e2de17089c24a1fa810c8f975a7171ba4cd432
Locale: en_GB

Thank you.
Comment 7 QA Administrators 2016-09-20 09:33:09 UTC Comment hidden (obsolete)
Comment 8 Yousuf Philips (jay) (retired) 2017-10-12 20:21:53 UTC
So I checked MS Word 2013, and if I change my keyboard language from Arabic to English before typing the numbers, then it doesn't include the numbers as part of the Arabic sequence. So I think the correct solution is for LO to automatically insert an RTL mark when the user switches the keyboard to an RTL language and insert an LTR mark when the user switches the keyboard back to an LTR language.

@Maxim: Is the changing of keyboard layouts something that LO can detect?
Comment 9 Omer Zak 2017-11-03 19:52:42 UTC
Still happens in:

Version: 5.4.2.2.0+
Build ID: 1:5.4.2-3~bpo9+1
CPU threads: 8; OS: Linux 4.9; UI render: default; VCL: gtk2; 
Locale: en-US (en_US.utf8); Calc: group

OS: Debian 64bit Stretch (Debian 9.2, with some backported packages)
Comment 10 Khaled Hosny 2018-01-25 13:15:35 UTC
Closing, not a bug as explained earlier.

Numbers have weak LTR direction, and the space has neutral direction. When the “I” (a strong LTR character) is present, the numbers also become strong LTR and the space becomes LTR as well, but when the “I” is removed the numbers are preceded by Hebrew characters (strong RTL), and thus the numbers remain weak LTR, the space becomes RTL, and the order changes. This is how bidirectional text works in Unicode.
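
The effect of the “I” on the numbers can be sketched in a few lines of Python. This is a deliberately simplified illustration of one step of the Unicode bidirectional algorithm (rule W7 of UAX #9, where a European number takes the type L if the nearest preceding strong character is L). It is not a full implementation: the function name and the `sos` default are assumptions for the example, and the earlier weak-type rules (for instance the special handling of digits after Arabic letters) are skipped.

```python
import unicodedata

STRONG = {"L", "R", "AL"}

def resolve_european_numbers(text, sos="L"):
    """Simplified sketch of rule W7: a European number (EN) becomes L
    when the nearest preceding strong character is L.

    sos stands in for the start-of-sequence type; "L" is an assumption.
    """
    resolved = []
    last_strong = sos
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls in STRONG:
            last_strong = cls
        elif cls == "EN" and last_strong == "L":
            cls = "L"
        resolved.append(cls)
    return resolved

ALEF = "\u05d0"  # one Hebrew letter, written as an escape to keep the source LTR

# "I" present: the digits are promoted to strong L.
print(resolve_european_numbers(ALEF + "I 1 2"))

# "I" removed: the digits stay weak (EN) because the nearest strong character is Hebrew.
print(resolve_european_numbers(ALEF + " 1 2"))
```

In the second case the digits keep their weak type, so the neutral spaces around them are free to resolve to the surrounding RTL direction, which is the reordering described above.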

MS Word's bidirectional behavior is known to be non-standard and we shouldn’t follow it. Mangling user input and inserting an LTR mark is a really bad idea and is going to open a can of worms and non-standard behaviors. Standards are good because when everyone follows them, the results are predictable regardless of the application.
Comment 11 braunmax 2018-04-28 07:18:23 UTC
Dear Khaled Hosny,
Thank you for giving attention to this bug, which I highly appreciate.

I unfortunately need to add the following, which may allow the status to be changed back to an active bug:
(i) All editing is done with an LTR keyboard (English (US) keyboard, with English (UK) as the primary language). The only secondary keyboard is the English (UK) keyboard. I have set no secondary keyboard, and enabling CTL changes little of this behaviour, in fact. Some value is obtained from Insert->RTL (or LTR) code.
(ii) Even worse than the simple form of the bug I have registered: if the spaces between the numerals are removed, except for a space placed as a thousands marker between 123 and 456, the behaviour on deleting the "I" is even more inconsistent, and the number changes to 456 123. The meaning thus changes due to the flip. This cannot be anything but a bug.
(iii) The call to limit the correction/fix to the Unicode "standard" is a diversion; by this argument Notepad would be the only editor, and ASCII or EBCDIC our only character sets.
(iv) The conflation of what is expected of a WYSIWYG editor with the properties of a character set is unhelpful. With a pointing device like a mouse, one expects an insertion point to be transparently placeable, and with the LTR keyboard one expects to be able to insert or delete a character as needed, without changing the visual rendition of other characters which are already present. WYSIWYG editors DO insert formatting codes and overrides transparently, and must do so. A hex-level editor is of course something different, as one had in the "code view" of WordPerfect. The only final limitation is the OpenDocument Text format, version 1.2.
(v) One cannot rely on keyboard choice alone to affect behaviour either. What about Japanese, which functions both LTR horizontally and vertically downwards with left to right wrapping, and often uses Arabic numerals, etc.?
(vi) The left and right arrow keys, and the backspace and forward delete keys, should not switch around within a single keyboard [alternate keyboards specific to a language may be different]. The danger is that a single character in a complex string or paragraph may then not be reachable and deletable... I can also produce even that faulty behaviour.

Thus - not only does this bug (which affects all RTL/LTR/CTL algorithms) open a can of worms (as you have warned) - it has opened a veritable plague of parasites. And a solution may be found as we progress further in language and character flexibility... as perhaps LO 6.1 may induce.

For now this bug should remain marked as needing attention, and potentially be elevated to a higher priority. It is either an implementation flaw or a violation of a design assumption; in both cases it is a bug.

I hope this reply does gain your support.    

Regards,

Max
Comment 12 Eyal Rozenberg 2018-04-28 09:20:34 UTC
Max, are you claiming that the behavior contradicts the Unicode bidirectional algorithm: https://unicode.org/reports/tr9/ ?

If you are, please explain more clearly (with visual vs logical orders) what's expected vs what's happening.

If you aren't, and you're suggesting divergence, you really need to rewrite (iii) and onwards in your last comment, since I really don't get it. And - that would be a _huge_ decision, which you should talk about with the RTL Telegram group at least, for a start:

https://t.me/joinchat/AhdT9BG-AEm4iTzUZgi78g
Comment 13 braunmax 2018-04-29 12:37:05 UTC
Dear Eyal Rozenberg and Khaled Hosny,

Having worked through the referenced algorithm as current, I agree fully that there is no bug, as such.

Thank you, Eyal, for your comprehensive comment; I truly appreciate it. The cursor movement is what I need after changing the setting to "visual". I cannot remember when I set my defaults to "logical", but I know it was during one of my attempts to understand the behaviour of what appeared to me to be a bug.

Thank you once again, Khaled Hosny, for triggering my detailed retesting of the alleged bug and the algorithm. I apologise for doubting that my description had been correctly understood; the error was mine, and your response and closure decision were fully justified.

I ironically fell into the self-stated trap of mixing editor behaviour (for which settings are available) with the interpretation of a misbehaving routine.

The full facility to create the string as I wish it to be is obtained by using the (algorithm-approved) Insert->Formatting Mark->Left-to-right mark at the first numeral (or before). Numerals do actually swap their order (at an ISO thousands delimiter of a space) as mentioned, which is also corrected by inserting the mark.
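
The swap at the thousands space can be illustrated with a small Python sketch of the reordering step of the Unicode bidirectional algorithm (rule L2 of UAX #9: from the highest level down to the lowest odd level, reverse every contiguous run of characters at that level or higher). The embedding levels below are assigned by hand for the example and assume a left-to-right paragraph; they are not computed by the code, and the function name `reorder_line` is hypothetical.

```python
def reorder_line(chars, levels):
    """Sketch of rule L2: reverse contiguous runs, highest level first."""
    chars = list(chars)
    lowest_odd = min((lv for lv in levels if lv % 2 == 1), default=None)
    if lowest_odd is None:
        return "".join(chars)  # purely left-to-right, nothing to reverse
    for level in range(max(levels), lowest_odd - 1, -1):
        i = 0
        while i < len(chars):
            if levels[i] >= level:
                j = i
                while j < len(chars) and levels[j] >= level:
                    j += 1
                chars[i:j] = reversed(chars[i:j])
                i = j
            else:
                i += 1
    return "".join(chars)

ALEF = "\u05d0"   # one Hebrew letter
LRM = "\u200e"    # LEFT-TO-RIGHT MARK

# Assumed levels without the mark: Hebrew and spaces at 1, each digit group at 2.
flipped = reorder_line(ALEF + " 123 456", [1, 1, 2, 2, 2, 1, 2, 2, 2])

# Assumed levels with the mark after the Hebrew: everything after it stays at 0.
kept = reorder_line(ALEF + LRM + "123 456", [1, 0, 0, 0, 0, 0, 0, 0, 0])
```

Under these assumed levels, `flipped` holds the two digit groups in swapped visual order ("456 123" followed by the Hebrew letter), while `kept` leaves the number in its typed order, which matches the behaviour described above.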

To allow the automatic table of contents to have correct format (i.e. have the page numbers and dots appear correctly with RTL - they were in a different format to the other RTL headings) I also insert the LTR mark at the end of the heading, solving my original problem in 2013.

Whether the mark should be entered when a deletion causes previous LTR text to change to or join the RTL string is of course an editor behaviour point to be debated outside a bug report, however good consistency has definitely been achieved.

Again, thank you both, and the others who commented, for your assistance in this. And again I find LibreOffice superior to Microsoft's offerings.

Kind regards,

Max