Created attachment 53987 [details] ZIP file contains a PDF explaining the issue with screenshots from LibreOffice and Word and the sample document used. Problem description: The attached file contains a PDF with the details of the issue and a sample document that can be used to reproduce it. The document shows as RIGHT justified/aligned in LibreOffice when it is LEFT justified/aligned in Word 2007/2010. Steps to reproduce: 1. Open the Test_document.docx included in the ZIP file 2. Notice that the text in the document is RIGHT justified when it should be LEFT justified (as in Word) Current behavior: Word 2007/2010 docx document in LibreOffice has improper justification/alignment (LEFT) when opened Expected behavior: Should appear properly RIGHT justified in LibreOffice Platform (if different from the browser): Ubuntu OS LibreOffice 3.4.4 (release)
The current and expected behavior should actually be: Current behavior: Word 2007/2010 docx document in LibreOffice has improper justification/alignment (RIGHT) when opened Expected behavior: Should appear properly LEFT justified in LibreOffice
Reproducible both using -- LibreOffice 3.4.4 with German langpack -- LibO-dev 3.5.0 Build ID: 549f928-b211287-090bcba-45cf606 build date 2011-12-01 -- both running on MacOS X 10.6.8 I may be totally wrong, but could this be a right-to-left-text issue, i.e. is it possible that LibreOffice thinks that the sample text is Latin (left-to-right) text inside a paragraph of right-to-left text (as used for Hebrew and other languages)? When I click in the sample paragraph, LibreOffice shows a special cursor which may be a hint to this. Could someone who knows about the internals of the DOCX format take a look at the DOCX file? I don’t really understand Microsoft’s over-complicated file format, but some of the paragraph/span tag names look (to me) suspicious. This is to say that I propose to check the document first, maybe it’s the source of the strange behaviour. I may be totally wrong, of course ... @ smaybo@labarchives.com : Which application and version was used to create this DOCX file? MS Office 2007? 2010?
-- Changed platform from Linux to All -- at least, the problem is also present on MacOS X. Could someone check Windows? -- Changed importance from high/blocker to high/critical: there is no release this bug could block, LibreOffice 3.4.4 was already released ;-) Also, there is no crash or loss of data; so IMHO we should lower the importance even more, but someone else with more experience will better judge about the best setting for importance. No offence, just trying to help ...
Oops, I forgot to change the status from 'Unconfirmed' to 'New' ...
Yes! If you activate the LibreOffice option "Options > Language Settings > Languages > Enhanced language support > Enabled for complex text layout (CTL)", and then open the sample document, click somewhere in the first (and only) paragraph and select "Format > Paragraph ... > Alignment", you see that at the bottom of the dialog window the setting selected for "Text direction" is "Right-to-left". This confirms my idea that LibreOffice thinks this document contains left-to-right (English, Latin script) text inside a right-to-left paragraph. What we now need to investigate is: WHY does LibreOffice treat this paragraph as right-to-left text? Is there an issue with the DOCX file itself, or is there a bug in the DOCX importer?
Thanks for your confirmations and debugging. The DOCX document was created by Zoho. Since Microsoft Word (and other apps that support Office document formats like ThinkFree) reads the DOCX in fine, it seems to be something with the DOCX importer in LibreOffice.
Looking at the contents of the DOCX file, in /word/document.xml the first paragraph starts with <w:p><w:pPr><w:bidi w:val="0" /> ... I may be totally wrong, but this <w:bidi ... /> tag could be the source of LibreOffice’s confusion. There is no similar <w:bidi ... /> tag in many other DOCX files I have seen (there are only bidi="" attributes in <w:lang ... /> tags). This does not mean that the <w:bidi ... /> tag is wrong (someone else should tell us about that), it is just not common for western-language DOCX files. Maybe LibreOffice concludes from the mere presence of the <w:bidi ... /> tag that the paragraph contains text in a right-to-left language (like Hebrew), instead of looking at the w:val="0" attribute which (having the value of "0") clearly says that this paragraph does NOT contain bi-directional text?! There have been issues with the <w:bidi ... /> tag in OpenOffice before, see e.g. https://issues.apache.org/ooo/show_bug.cgi?id=111714 Maybe the fix in CWS writerfilter08 (2010-05-20) was a bit too general, i.e. did not take the w:val="0" attribute into account?
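To make the w:val reasoning above concrete, here is a minimal sketch of how an on/off attribute such as the one on <w:bidi ... /> could be interpreted. It assumes the usual OOXML on/off convention (an element without w:val counts as "on"; "0", "false" and "off" switch it off). The helper name isOnOffEnabled is hypothetical and is not taken from the LibreOffice sources.

```cpp
#include <optional>
#include <string>

// Hypothetical helper, for illustration only (not LibreOffice code).
// Interprets the w:val attribute of an OOXML on/off element such as <w:bidi/>.
// `val` is std::nullopt when the attribute is absent from the element.
bool isOnOffEnabled(const std::optional<std::string>& val)
{
    if (!val)
        return true; // element present without w:val: treated as "on"

    const std::string& v = *val;
    // Only the explicit "on" spellings enable the property; "0", "false",
    // "off" and anything unrecognised leave it disabled.
    return v == "1" || v == "true" || v == "on";
}
```

Under these assumptions, <w:bidi w:val="0" /> from the sample document yields false, so the paragraph would stay left-to-right, while a bare <w:bidi /> yields true.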
Looking at the LibreOffice source code is difficult and maybe megalomaniac for me, as I have never been involved in a programming project of this size and complexity (I just have done some little tools for my personal use). But I can’t resist. Well: In writerfilter/source/ooxml/model.xml (line 22020), the paragraph element "bidi" gets the tokenid "sprm:PFBiDi". So we have to look for "PFBiDi". But the function DomainMapper::sprmWithProps() in writerfilter/source/dmapper/DomainMapper.cxx handles the case NS_sprm::LN_PFBiDi very simply (lines 1830-1834): rContext->Insert(PROP_WRITING_MODE, false, uno::makeAny(text::WritingMode2::RL_TB )); rContext->Insert(PROP_PARA_ADJUST, false, uno::makeAny( style::ParagraphAdjust_RIGHT )); without (as it appears to me) looking at the attributes of the bidi element (no attempt to look at nIntValue or sStringValue in lines 1830-1834). So, either the attribute of the <w:bidi .../> tag is handled elsewhere, or it is not handled at all. This would easily explain the error reading this DOCX file. Please forgive me if I’m totally wrong ...
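For illustration, the value check that seems to be missing could look roughly like the sketch below. This is not the actual LibreOffice code and not any patch from this report: WritingMode, ParaAdjust, ParaProps and applyBidi are hypothetical stand-ins for text::WritingMode2, style::ParagraphAdjust, the property context and the LN_PFBiDi case, and nIntValue is assumed to carry the numeric value of w:val.

```cpp
// Hypothetical stand-ins for text::WritingMode2 and style::ParagraphAdjust.
enum class WritingMode { LR_TB, RL_TB };
enum class ParaAdjust { LEFT, RIGHT };

// Hypothetical stand-in for the paragraph property context.
struct ParaProps
{
    WritingMode writingMode = WritingMode::LR_TB;
    ParaAdjust adjust = ParaAdjust::LEFT;
};

// Sketch of a value-aware LN_PFBiDi handler: switch the paragraph to
// right-to-left only when the bidi value is non-zero.
void applyBidi(ParaProps& props, int nIntValue)
{
    if (nIntValue != 0)
    {
        props.writingMode = WritingMode::RL_TB;
        props.adjust = ParaAdjust::RIGHT;
    }
    // nIntValue == 0: leave the properties untouched. Overriding an
    // inherited right-to-left style is deliberately left out of this sketch.
}
```

With this shape, the sample paragraph (w:val="0") would keep its left-to-right defaults, whereas the unconditional version quoted above always ends up right-to-left and right-aligned.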
Created attachment 56464 [details] Proposed patch. Proposed patch attached. Also sent a patch review email to the development list.
verified.
fixed by Muhammad Haggag's patch, marking as fixed
My first bug comment here, please don't kill me. In LibreOffice 3.5.0rc3 Build ID: 7e68ba2-a744ebf-1f241b7-c506db1-7d53735 the same problem occurs: Text appears left-aligned, left-to-right in Office 2007, but in LibreOffice, the text is right-aligned and right-to-left. Following the steps suggested earlier ("open the sample document, click somewhere in the first (and only) paragraph and select 'Format > Paragraph ... > Alignment', you see that at the bottom of the dialog window the setting selected for 'Text direction' is 'Right-to-left'") reveals that LibreOffice thinks the text is Arabic. Changing the text direction to Left-to-right in the dialogue window fixes the issue.
I can confirm that the bug is still present in LibreOffice 3.5.1.2 Build-ID: dc9775d-05ecbee-0851ad3-1586698-727bf66 It seems that the bug has been fixed on Master, but the fix has not (yet) been backported to the 3.5 branch. If this is right, then please consider backporting the fix. (But if it is not right, i.e. if the fix is already present in the 3.5 branch, the fix is not sufficient, obviously ...).
attached patch was pushed as d8cb61f5f32247a8bbaf89fb910c015b6107f051, will push it to libreoffice-3-5 as well; thanks for the patch!
Muhammad Haggag committed a patch related to this issue. It has been pushed to "libreoffice-3-5": http://cgit.freedesktop.org/libreoffice/core/commit/?id=0fe856fdbf7696b00ed3fe32890cdb0f7e46c9a5&g=libreoffice-3-5 fdo#43398: dmapper: Switch paragraphs to RTL based on the value of w:BiDi. It will be available in LibreOffice 3.5.6.