Created attachment 189039 [details]
DOCX: Numbers 123 / ۱۲۳, and the text "C++" in Persian (Farsi) paragraph

Description:
While loading a DOCX file, the sub-direction is lost in an RTL paragraph. An LTR part of an RTL paragraph is also displayed as RTL. This also affects the display of numerals.

Steps to Reproduce:
1. Open LibreOffice and set the number display to "Context".
2. Open the attachment in LibreOffice.
3. Open MS Word and set the numerals to context in "File > Options > Advanced > Numerals: Context".
4. Open the attachment in MS Word.
5. Compare the display and output.

Actual Results:
The display in LibreOffice is completely different from MS Word. In LibreOffice, the sub-direction is lost when loading the file. C++ is rendered as ++C, and 123 is displayed as ۱۲۳, which is incorrect.

Expected Results:
The original document contains two numbers, ۱۲۳ and 123, and the text C++. All three should be displayed as they are in MS Word.

Reproducible: Always

User Profile Reset: No

Additional Info:
Version: 24.2.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 20f57e14362674d321ef184e1987f41a6418adc2
CPU threads: 20; OS: Windows 10.0 Build 22621; UI render: Skia/Vulkan; VCL: win
Locale: en-US (en_DE); UI: en-US
Calc: CL threaded
Created attachment 189040 [details] PDF output from LibreOffice
Created attachment 189041 [details] PDF output from MS Word
Created attachment 189042 [details] PNG: side by side comparison of the output from LibreOffice and MS Word
This can be hacked by surrounding the text with the Unicode control characters RLE/LRE and PDF (Pop Directional Formatting), but when exporting back we would have no way to distinguish these from user-entered control characters (though it probably does not matter in practice). Otherwise we would need internal machinery to handle the explicit direction of a text portion, which we currently don't have AFAIK.

This might be a duplicate of bug 156582, and possibly of other DOCX-RTL issues that have the same root cause.
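To make the workaround concrete, here is a minimal Python sketch of the idea, not LibreOffice code. The helper names `embed_run` and `strip_embeddings` are hypothetical; only the code points (LRE U+202A, RLE U+202B, PDF U+202C) come from Unicode. It also shows the export problem: once the marks are in the text, nothing distinguishes the ones added on import from the ones the user typed.

```python
# Unicode explicit embedding controls (real code points).
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING (not the file format)


def embed_run(text: str, rtl: bool) -> str:
    """Hypothetical import-side helper: wrap a run in an explicit embedding."""
    opener = RLE if rtl else LRE
    return f"{opener}{text}{PDF}"


def strip_embeddings(text: str) -> str:
    """Hypothetical export-side helper: drop every embedding control.

    This removes user-entered controls as well, because they are
    indistinguishable from the ones inserted by embed_run().
    """
    return text.translate({ord(c): None for c in (LRE, RLE, PDF)})


# An LTR run ("C++") that should keep its direction inside an RTL paragraph.
imported = embed_run("C++", rtl=False)

# A control sequence the user typed by hand in the same document.
user_typed = RLE + "۱۲۳" + PDF

# On export both lose their marks; the user's intent is not preserved.
exported = strip_embeddings(imported + " " + user_typed)
```

A dedicated direction attribute on the text portion would avoid this ambiguity, which is the "internal machinery" alternative mentioned above.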
Created attachment 189043 [details]
TXT: The output of saving the DOCX to a text file

"Add bi-directional marks" should be selected during the export. In the "File Conversion" dialog, the selected encoding is "Unicode (UTF-8)" in "Other encoding".