Bug 49320 - FILEOPEN: DOCX - Numbers that aren't preceded by English text are treated as CTL/RTL text in an RTL document
Status: RESOLVED NOTABUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
Blocks: RTL-CTL DOCX
Reported: 2012-04-30 19:49 UTC by Lionel Elie Mamane
Modified: 2018-03-24 20:52 UTC (History)

Attachments
problematic document (11.94 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2012-04-30 19:49 UTC, Lionel Elie Mamane
MS Word print as PDF (69.92 KB, application/pdf)
2012-04-30 19:53 UTC, Lionel Elie Mamane
PDF export showing the remaining bug in version 5.4.2 (20.95 KB, application/pdf)
2017-11-01 23:26 UTC, Omer Zak

Description Lionel Elie Mamane 2012-04-30 19:49:30 UTC
Created attachment 60818
problematic document

The attached document was created in a Hebrew version of MS Word, and anonymized in a French version of MS Word 2003. It does not contain a single non-Latin/European character (and never did), but its import is all wrong:
 - Parentheses are not around the right text, are in the wrong place and direction (opening instead of closing and vice-versa).
 - Text is justified to the right instead of to the left.

It looks like a RTL/LTR (right-to-left/left-to-right) writing direction confusion.

Compare to the result given by MS Word itself.
Comment 1 Lionel Elie Mamane 2012-04-30 19:53:33 UTC
Created attachment 60819
MS Word print as PDF
Comment 2 Rainer Bielefeld Retired 2012-04-30 23:29:56 UTC
[Reproducible] with "LibreOffice 3.5.3.2 (RC2) German UI/Locale [Build-ID: 235ab8a-3802056-4a8fed3-2d66ea8-e241b80]" on German WIN7 Home Premium (64bit)

Neither MS WORD Viewer nor AbiWord 2.9 show that problem.

That never worked with LibO: the problem is already visible in LibO 3.3.3 and was already present in OOo 3.3, so it is inherited from OOo.

@Cédric:
Please set Status to ASSIGNED and add yourself to "Assigned To" if you accept this Bug
Comment 3 Maxim Monastirsky 2013-11-24 14:11:55 UTC
The docx import bug was already fixed in Bug 43093. The attached document opens correctly (RTL left aligned) in recent master.

But still the document doesn't look the same as in MS Word. Further investigation showed that there is a difference in how LO shows RTL left aligned text, and it's not related to docx. Steps to reproduce:
1. Open a new document in Word
2. Set paragraph direction to RTL, and alignment to left.
3. Write '1 Test'
4. Repeat the same in LO
5. Compare results

The solution is to insert an LTR mark (Insert->Formatting Mark->Left-to-right mark) between '1' and 'Test', but it can't be applied automatically to imported files.
Bug 61795 and Bug 69109 are related.
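
The manual workaround above can be sketched in a few lines of Python. This is only an illustration of where the mark goes in the character stream, not something LibreOffice does on import; the helper name add_lrm_after_leading_number is hypothetical, and the mark is placed directly after the digits (before the space) as one placement "between '1' and 'Test'".

```python
import unicodedata

LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK, a strong left-to-right character


def add_lrm_after_leading_number(text: str) -> str:
    """Insert an LRM right after a leading run of digits (hypothetical helper).

    Text that does not start with a European number is returned unchanged.
    """
    i = 0
    while i < len(text) and unicodedata.bidirectional(text[i]) == "EN":
        i += 1
    if i == 0:
        return text
    return text[:i] + LRM + text[i:]


if __name__ == "__main__":
    fixed = add_lrm_after_leading_number("1 Test")
    # Show the code points so the invisible mark is visible in the output.
    print([f"U+{ord(ch):04X}" for ch in fixed])
```

Whether the result then renders as in Word still depends on the application's bidi implementation.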
Comment 4 Cédric Bosdonnat 2014-01-20 08:57:25 UTC
Restricted my LibreOffice hacking area
Comment 5 Joel Madero 2015-05-02 15:42:06 UTC Comment hidden (obsolete)
Comment 6 Buovjaga 2015-06-20 14:55:00 UTC
Confirmed there is still a problem: note in the PDF, 1st page there is "4.5. AA.. 3:10".
In LibO, the same line is "AA.. 3:10 .4.5"

Win 7 Pro 64-bit Version: 5.1.0.0.alpha1+
Build ID: 3ecef8cedb215e49237a11607197edc91639bfcd
TinderBox: Win-x86@62-merge-TDF, Branch:MASTER, Time: 2015-06-19_23:16:58
Locale: fi-FI (fi_FI)
Comment 7 QA Administrators 2016-09-20 10:11:27 UTC Comment hidden (obsolete)
Comment 8 Lior Kaplan 2017-10-12 11:17:01 UTC
Still happens in 5.4.1.
Comment 9 Yousuf Philips (jay) (retired) 2017-10-13 14:46:59 UTC
When RTL is enabled, LO treats numbers, periods, and colons as CTL-encoded text, so they are positioned incorrectly relative to the English text that follows them when no English text precedes them.

Typed          Displayed
===========================
'1 Test'    -> 'Test 1'
'A 1 Test'  -> 'A 1 Test'
'. 1 Test'  -> 'Test 1 .'
'A. 1 Test' -> 'A. 1 Test'

This is similar to the issue being addressed in bug 69109.
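
For reference, the Unicode bidirectional classes of the characters in the table can be looked up with Python's standard unicodedata module. A minimal sketch (the helper name bidi_classes is made up for this example): digits, periods and colons are not strong RTL characters but weak types, so their placement depends on the surrounding text and the paragraph direction.

```python
import unicodedata


def bidi_classes(text: str) -> list[tuple[str, str]]:
    """Return (character, Unicode bidi class) pairs for each character."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


if __name__ == "__main__":
    for ch, cls in bidi_classes("1.: A"):
        print(repr(ch), cls)
    # '1' EN  (European Number, weak)
    # '.' CS  (Common Separator, weak)
    # ':' CS  (Common Separator, weak)
    # ' ' WS  (Whitespace, neutral)
    # 'A' L   (Left-to-Right, strong)
```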
Comment 10 Omer Zak 2017-11-01 23:24:12 UTC
Tested in:

Version: 5.4.2.2.0+
Build ID: 1:5.4.2-3~bpo9+1
CPU threads: 8; OS: Linux 4.9; UI render: default; VCL: gtk2; 
Locale: en-US (en_US.utf8); Calc: group

OS: Debian 64bit Stretch (Debian 9.2, with some backported packages)

1. Parentheses seem to be around the right text, in the right place and direction.
2. Text is now left-justified.

However, the numbers are to the right of the paragraph text instead of to the left.
Comment 11 Omer Zak 2017-11-01 23:26:27 UTC
Created attachment 137432
PDF export showing the remaining bug in version 5.4.2

This PDF file demonstrates both the bug fixes and the remaining bug in version 5.4.2.
Comment 12 Khaled Hosny 2018-03-24 20:52:16 UTC
The remaining “issue” is how the bidirectional text algorithm works: setting the paragraph right-to-left means “this is primarily RTL text”, and this has implications for how characters with weak directionality (e.g. numbers) or neutral directionality (punctuation) are handled. I’m not sure what the real use case is here, but it is really bogus to set LTR text as RTL, left-align it, and then expect it to behave as if it were set LTR.
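
The behaviour described here can be reproduced with a heavily simplified sketch of the Unicode Bidirectional Algorithm (UAX #9). It is not LibreOffice's code, and the function name visual_order_rtl is hypothetical. It assumes an RTL paragraph whose text has no strong RTL characters, no Arabic numbers, and no explicit embeddings, and it implements only rules W4, W6, W7, N1/N2, I2 and L2; bracket pairing and mirroring are ignored.

```python
import unicodedata


def visual_order_rtl(text: str) -> str:
    """Approximate the display order of `text` in an RTL paragraph.

    Simplified subset of UAX #9; see the assumptions stated above.
    """
    types = [unicodedata.bidirectional(ch) for ch in text]
    n = len(types)

    # W4: a single separator between two European numbers becomes a number.
    for i in range(1, n - 1):
        if types[i] in ("ES", "CS") and types[i - 1] == "EN" and types[i + 1] == "EN":
            types[i] = "EN"

    # W6: remaining separators and terminators become neutral.
    types = ["ON" if t in ("ES", "ET", "CS") else t for t in types]

    # W7: a number becomes L if the last strong type before it is L.
    last_strong = "R"  # start of text takes the RTL paragraph direction
    for i, t in enumerate(types):
        if t == "L":
            last_strong = "L"
        elif t == "EN" and last_strong == "L":
            types[i] = "L"

    def direction(t: str) -> str:
        # Numbers count as R when resolving neighbouring neutrals (N1).
        return "L" if t == "L" else "R"

    # N1/N2: a neutral run becomes L only between two L neighbours;
    # otherwise it takes the paragraph (RTL) direction.
    i = 0
    while i < n:
        if types[i] in ("L", "EN"):
            i += 1
            continue
        j = i
        while j < n and types[j] not in ("L", "EN"):
            j += 1
        before = direction(types[i - 1]) if i > 0 else "R"
        after = direction(types[j]) if j < n else "R"
        resolved = "L" if before == after == "L" else "R"
        for k in range(i, j):
            types[k] = resolved
        i = j

    # I2: at paragraph level 1, R stays at level 1; L and EN go to level 2.
    levels = [1 if t == "R" else 2 for t in types]

    # L2: reverse each level-2 run, then reverse the whole line (level 1).
    chars = list(text)
    i = 0
    while i < n:
        if levels[i] == 2:
            j = i
            while j < n and levels[j] == 2:
                j += 1
            chars[i:j] = chars[i:j][::-1]
            i = j
        else:
            i += 1
    return "".join(reversed(chars))


if __name__ == "__main__":
    for typed in ("1 Test", "A 1 Test", ". 1 Test", "A. 1 Test", "4.5. AA.. 3:10"):
        print(f"{typed!r:18} -> {visual_order_rtl(typed)!r}")
```

Under these assumptions the sketch gives the same results as the table in comment 9 and the line quoted in comment 6: a leading number has no strong LTR character before it, so it keeps its weak type and the RTL paragraph direction moves it to the other side of the Latin text.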