156819 – FILEOPEN DOCX Sub-direction is lost loading the file

Status:	UNCONFIRMED

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	Writer
Version: (earliest affected)	Inherited From OOo
Hardware:	All All

Importance:	medium normal
Assignee:	Not Assigned

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	DOCX-RTL

Reported:	2023-08-19 14:08 UTC by Hossein
Modified:	2024-08-25 11:27 UTC
CC List:	1 user

See Also:	156582
Crash report or crash signature:

Attachments
DOCX: Numbers 123 / ۱۲۳, and the text "C++" in Persian (Farsi) paragraph (12.06 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document) 2023-08-19 14:08 UTC, Hossein
PDF output from LibreOffice (63.55 KB, application/pdf) 2023-08-19 14:13 UTC, Hossein
PDF output from MS Word (63.55 KB, application/pdf) 2023-08-19 14:14 UTC, Hossein
PNG: side by side comparison of the output from LibreOffice and MS Word (36.05 KB, image/png) 2023-08-19 14:15 UTC, Hossein
TXT: The output of saving the DOCX to a text file (223 bytes, text/plain) 2023-08-19 14:34 UTC, Hossein

Description Hossein 2023-08-19 14:08:02 UTC

Created attachment 189039
DOCX: Numbers 123 / ۱۲۳, and the text "C++" in Persian (Farsi) paragraph

Description:
When a DOCX file is loaded, the sub-direction of text inside an RTL paragraph is lost: an LTR part of an RTL paragraph is displayed as RTL. This also affects the display of numerals.

Steps to Reproduce:
1. Open LibreOffice and set the numerals display to "Context".
2. Open the attachment in LibreOffice.
3. Open MS Word and set the numerals to context in "File > Options > Advanced > Numerals: Context".
4. Open the attachment in MS Word.
5. Compare the display and output.

Actual Results:
The display in LibreOffice is completely different from that in MS Word. In LibreOffice, the sub-direction is lost when the file is loaded: C++ is rendered as ++C, and 123 is displayed as ۱۲۳, which is incorrect.
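
As a rough illustration of why the lost LTR run matters (a sketch using only Python's standard unicodedata module, not LibreOffice code; the helper name bidi_classes is made up for this example), the Unicode bidirectional classes of the affected characters can be inspected directly:

```python
import unicodedata


def bidi_classes(text):
    """Return the Unicode bidirectional class of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]


# "C" is a strong left-to-right letter, while "+" is only a weak
# European separator with no direction of its own.
print(bidi_classes("C++"))

# ASCII digits and Extended Arabic-Indic (Persian) digits share the
# same class, so the bidi class alone does not tell them apart.
print(bidi_classes("123"), bidi_classes("۱۲۳"))
```

Because "+" has no strong direction, a trailing "++" that is not kept inside an explicit LTR run takes the direction of the surrounding RTL paragraph, which is consistent with the ++C rendering reported here.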

Expected Results:
The document should be displayed as in MS Word. The original document contains two numbers, one ۱۲۳ and one 123, and the text C++; each should keep its original form and direction.

Reproducible: Always


User Profile Reset: No



Additional Info:
Version: 24.2.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 20f57e14362674d321ef184e1987f41a6418adc2
CPU threads: 20; OS: Windows 10.0 Build 22621; UI render: Skia/Vulkan; VCL: win
Locale: en-US (en_DE); UI: en-US
Calc: CL threaded

Comment 1 Hossein 2023-08-19 14:13:41 UTC

Created attachment 189040
PDF output from LibreOffice

Comment 2 Hossein 2023-08-19 14:14:00 UTC

Created attachment 189041
PDF output from MS Word

Comment 3 Hossein 2023-08-19 14:15:42 UTC

Created attachment 189042
PNG: side by side comparison of the output from LibreOffice and MS Word

Comment 4 Khaled Hosny 2023-08-19 14:21:34 UTC

This can be hacked by surrounding the text with the Unicode control characters RLE/LRE (Right-to-Left/Left-to-Right Embedding) and PDF (Pop Directional Formatting), but when exporting back we wouldn’t have a way to differentiate between these and user-entered control characters (though it probably does not matter in practice). Otherwise we would need internal machinery to handle the explicit direction of a text portion, which we currently don’t have AFAIK.

This might be a duplicate of bug 156582 and possibly other DOCX-RTL issues that have the same root cause.
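
The control-character workaround mentioned above can be sketched roughly as follows. This is only an illustration in Python, not how LibreOffice's import filter works; the helper name embed and the sample Persian paragraph are hypothetical.

```python
import unicodedata

LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202B"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def embed(text, rtl=False):
    """Wrap text in an explicit embedding (LRE or RLE) closed by PDF."""
    return (RLE if rtl else LRE) + text + PDF


# Hypothetical RTL paragraph in which "C++" is forced to stay an LTR run.
paragraph = "زبان " + embed("C++") + " است"

# List the control characters that were added to the paragraph.
for ch in paragraph:
    if unicodedata.category(ch) == "Cf":
        print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

The drawback raised in the comment shows up here too: once the characters are in the text, nothing marks them as import-generated rather than typed by the user.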

Comment 5 Hossein 2023-08-19 14:34:03 UTC

Created attachment 189043
TXT: The output of saving the DOCX to a text file

"Add bi-directional marks" should be selected during the export. In the "File Conversion" dialog, the encoding selected under "Other encoding" is "Unicode (UTF-8)".
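
To see which bi-directional marks such a text export contains, the text can be scanned for Unicode bidi control characters. The sketch below is an assumption-laden illustration: the helper name find_bidi_controls is made up, and the sample string is hypothetical rather than the content of the attachment.

```python
import unicodedata

# Code points of the Unicode bidirectional control characters.
BIDI_CONTROLS = (
    {0x061C, 0x200E, 0x200F}      # ALM, LRM, RLM
    | set(range(0x202A, 0x202F))  # LRE, RLE, PDF, LRO, RLO
    | set(range(0x2066, 0x206A))  # LRI, RLI, FSI, PDI
)


def find_bidi_controls(text):
    """Return (index, name) pairs for every bidi control character in text."""
    return [
        (i, unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ord(ch) in BIDI_CONTROLS
    ]


# Hypothetical sample; a real check would read the exported file as UTF-8.
sample = "\u202AC++\u202C"
for index, name in find_bidi_controls(sample):
    print(index, name)
```

For a real file, the text could be read with open(path, encoding="utf-8") and passed to the same function.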