Description:
In general, MS Word apps provide localized styles for non-English languages (e.g. Titre1, Titre2... in French, instead of Heading1, Heading2...), and these are exported verbatim in the document.xml file. However, the styles.xml file should contain a reference to the standard "latent" styles, so that any application receiving those localized styles can recognize what they actually mean and process them correctly. Fixing that aspect would allow non-English versions of OpenOffice to "speak" not only to MS Word, but also to a host of libraries that expect docx documents.

Steps to Reproduce:
1. Create a file containing a heading with a **non-English version of LibreOffice**.
2. Export the file as .docx.
3. Give the file to pandoc: pandoc xxxx.docx. The headings will be missed and converted into plain text.

Actual Results:
In styles.xml:
<w:style w:type="paragraph" w:styleId="Titre1">
  <w:name w:val="Titre 1"/>
  <w:basedOn w:val="Titre"/>
  <w:next w:val="Corpsdetexte"/>
  <w:pPr>
    <w:numPr>
      <w:ilvl w:val="0"/>
      <w:numId w:val="1"/>
    </w:numPr>
    <w:spacing w:before="240" w:after="120"/>
    <w:outlineLvl w:val="0"/>
    <w:outlineLvl w:val="0"/>
  </w:pPr>
  <w:rPr>
    <w:b/>
    <w:bCs/>
    <w:sz w:val="36"/>
    <w:szCs w:val="36"/>
  </w:rPr>
</w:style>

Expected Results:
<w:style w:type="paragraph" w:styleId="Titre1">
  <w:name w:val="heading 1"/>
  <w:basedOn w:val="Titre"/>
  <w:next w:val="Corpsdetexte"/>
  <w:pPr>
    <w:numPr>
      <w:numId w:val="1"/>
    </w:numPr>
    <w:outlineLvl w:val="0"/>
  </w:pPr>
  <w:rPr>
    <w:b/>
    <w:bCs/>
    <w:sz w:val="36"/>
    <w:szCs w:val="36"/>
  </w:rPr>
</w:style>

Reproducible: Always
User Profile Reset: No
Additional Info:
1. Word (as an application) is itself tolerant of this kind of omission and will correct it on its own.
2. In general, feeding a docx file to pandoc is a reliable litmus test.
3. I have checked the release notes to see whether this bug has been fixed in later versions, but I did not find any mention of it.
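For anyone who wants to check the style names without pandoc, the following is a minimal sketch using only the Python standard library. It assumes the exported file is an ordinary docx zip archive with its styles in word/styles.xml; the function name list_paragraph_styles is made up for this illustration and is not part of LibreOffice or pandoc.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, normally bound to the "w:" prefix.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def list_paragraph_styles(docx):
    """Return (styleId, name) pairs for the paragraph styles in word/styles.xml.

    `docx` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("word/styles.xml"))

    pairs = []
    for style in root.iter(f"{{{W_NS}}}style"):
        if style.get(f"{{{W_NS}}}type") != "paragraph":
            continue  # skip character, table and numbering styles
        name = style.find(f"{{{W_NS}}}name")
        pairs.append((
            style.get(f"{{{W_NS}}}styleId"),
            name.get(f"{{{W_NS}}}val") if name is not None else None,
        ))
    return pairs
```

Calling list_paragraph_styles("xxxx.docx") on the exported file and printing the pairs shows at a glance whether the w:name of the Titre1 style is the localized "Titre 1" or the standard heading name.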
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:58.0) Gecko/20100101 Firefox/58.0
@fralau: pandoc is only available via Homebrew, which is not something most OSX users commonly install on their OSX boxes. What other easily installable tool can one use to test your assertion? From your description, it sounds like the problem lies with pandoc, not with LO. If I create a test docx document by following your description and then open it in Word 16.10 (180210), I see the correct heading.
Tested with Version: 6.0.1.1 Build ID: 60bfb1526849283ce2491346ed2aa51c465abfe6 Threads CPU : 4; OS : Mac OS X 10.13.3; UI Render : par défaut; Locale : fr-FR (fr_FR.UTF-8); Calc: group
I used pandoc to illustrate the problem, but the problem lies in the XML files generated by LO. Any other tool might make the point. The fact that Word reacts well to the styles in docx files generated by LO is a convenience of Word (it translates the foreign style names back into standard styles). It can do that, in my understanding, thanks to "latent styles", i.e. an underlying model that is not in the docx/XML file. In other words, MS Word has a "fault-tolerant" feature that is not part of the ISO spec for docx files. Unfortunately, other apps that rely on the standard spec for docx files (as they should) will fail. I found this explanation useful: http://python-docx.readthedocs.io/en/latest/user/styles-understanding.html But the underlying issue is HOW one would define a compliant docx file. There is a difference between impunity and legality: "compliant with ISO/IEC 29500" and "loads satisfactorily into MS Word" are NOT interchangeable definitions. In the spirit of OO, the first is safer: it is a relatively stable definition, it has general agreement, and if it changes, everyone will be notified in time. By contrast, doing something illegal but with impunity may elicit controversy: this feature of correcting stylesheets in Word is largely undocumented, and Microsoft might change or remove it without notice.
I might add another point: while it is of course essential to make sure that MS Word can read the docx files produced, it is *also* important to meet the Office Open XML specification concerning styles, so that other open source projects can benefit from a docx file produced by LO. Ignoring that requirement might exclude other open source software from the LO ecosystem, and indirectly favor closed source software (MS Word).
OK, so I'm not a developer, merely a volunteer QAer, so how do I go about confirming the problem you experience ? Therein lies the immediate issue, irrespective of its merits. I would add from a personal viewpoint that what you suggest sounds like it implies including extra xml information that is currently not stored in ODF documents - this can only therefore make them even larger still and more verbose when that information gets mapped to docx open xml. Surely, that is something we would wish to avoid for performance reasons (it is bad enough already) ?
@Mikos : any thoughts on this ?
(In reply to Alex Thurgood from comment #6) > @Mikos : any thoughts on this ? @Miklos
Hi, Thanks for reporting the issue. I think it's a dupe of bug 44451. *** This bug has been marked as a duplicate of bug 44451 ***
I don't think it's a duplicate. While there is a rough similarity in functionality (and in the general question), the other bugs have to do with references and tables, while this one is very narrow in its technical scope: styles. The place in the XML file where the bug occurs has been identified and a solution has been proposed. Marking it as a duplicate of broader questions would result in the diagnostic information on this bug being lost.
(In reply to Alex Thurgood from comment #5)
> I would add from a personal viewpoint that what you suggest sounds like it
> implies including extra xml information that is currently not stored in ODF
> documents - this can only therefore make them even larger still and more
> verbose when that information gets mapped to docx open xml. Surely, that is
> something we would wish to avoid for performance reasons (it is bad enough
> already) ?
This is a valid objection in principle, but the additional information required in practice is minimal (a mere indirection). Instead of saying:
"Here is 'Titre 1': ..."
you basically have to say:
1. "'Titre 1' => 'Heading 1'"
2. "Here is 'Heading 1': ..."
Another possibility, if you don't want to touch the structure of the XML file produced, might be to simply have a conversion table in LO (I guess there already is one) and convert all standard style names to their English equivalents on export. I surmise LO has a similar issue with style names in native mode, so it might have some translation mechanism already implemented? This shouldn't change anything for ordinary users, since Word would automatically "re-localize" their styles the next time the file is opened. I haven't tested whether LO does so as well, but I guess it would.
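To make the conversion-table idea concrete, here is a small sketch. It is purely illustrative and says nothing about how LO is actually implemented: the table CANONICAL_NAMES and both function names are hypothetical, and the table only contains the French names quoted in this report.

```python
from xml.sax.saxutils import quoteattr

# Hypothetical conversion table: localized UI style name -> standard name.
# Only the names quoted in this report are listed; a real table would need
# every built-in style for every UI language.
CANONICAL_NAMES = {
    "Titre 1": "heading 1",
    "Titre 2": "heading 2",
}


def canonical_style_name(ui_name):
    """Return the standard name for a built-in style, or the name unchanged."""
    return CANONICAL_NAMES.get(ui_name, ui_name)


def name_element(ui_name):
    """Render the <w:name> element that would be written to styles.xml."""
    return f"<w:name w:val={quoteattr(canonical_style_name(ui_name))}/>"
```

With this table, name_element("Titre 1") yields <w:name w:val="heading 1"/>, while a user-defined style name passes through untouched. The localized w:styleId (e.g. Titre1) does not need to change, which is why the cost is only one lookup per style.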
I changed my UI language to German, and the export from 6.0.2.1 seems to be compliant with your expected results? I extracted the styles.xml from the exported docx, opened it in BBEdit and reflowed it. Can you retest in a current version of LibreOffice and paste the info from About LibreOffice?

<w:style w:type="paragraph" w:styleId="Berschrift1">
  <w:name w:val="Heading 1" />
  <w:basedOn w:val="Berschrift" />
  <w:next w:val="Textkrper" />
  <w:qFormat />
  <w:pPr>
    <w:numPr>
      <w:ilvl w:val="0" />
      <w:numId w:val="1" />
    </w:numPr>
    <w:spacing w:before="240" w:after="120" />
    <w:outlineLvl w:val="0" />
  </w:pPr>
  <w:rPr>
    <w:b />
    <w:bCs />
    <w:sz w:val="36" />
    <w:szCs w:val="36" />
  </w:rPr>
</w:style>

Version: 6.0.2.1 Build-ID: f7f06a8f319e4b62f9bc5095aa112a65d2f3ac89 CPU-Threads: 2; BS: Mac OS X 10.12.6; UI-Render: Standard; Gebietsschema: en-US (en_US.UTF-8); Calc: group
Dear Bug Submitter, This bug has been in NEEDINFO status with no change for at least 6 months. Please provide the requested information as soon as possible and mark the bug as UNCONFIRMED. Due to regular bug tracker maintenance, if the bug is still in NEEDINFO status with no change in 30 days the QA team will close the bug as INSUFFICIENTDATA due to lack of needed information. For more information about our NEEDINFO policy please read the wiki located here: https://wiki.documentfoundation.org/QA/Bugzilla/Fields/Status/NEEDINFO If you have already provided the requested information, please mark the bug as UNCONFIRMED so that the QA team knows that the bug is ready to be confirmed. Thank you for helping us make LibreOffice even better for everyone! Warm Regards, QA Team MassPing-NeedInfo-Ping-20181105
I have checked (on 6.1.3.2) and, indeed, the output seems to be the expected one, with the reference to the canonical style Heading1.

This is how Microsoft Word presents it:
<w:style w:type="paragraph" w:styleId="Titre1">
  <w:name w:val="heading 1"/>
  <w:basedOn w:val="Normal"/>
  <w:next w:val="Normal"/>
  <w:link w:val="Titre1Car"/>
  <w:uiPriority w:val="9"/>
  <w:qFormat/>
  <w:rsid w:val="00D363B2"/>
  <w:pPr>
    <w:keepNext/>
    <w:keepLines/>
    <w:spacing w:before="240"/>
    <w:outlineLvl w:val="0"/>
  </w:pPr>
  <w:rPr>
    <w:rFonts w:asciiTheme="majorHAnsi" w:eastAsiaTheme="majorEastAsia" w:hAnsiTheme="majorHAnsi" w:cstheme="majorBidi"/>
    <w:color w:val="365F91" w:themeColor="accent1" w:themeShade="BF"/>
    <w:sz w:val="32"/>
    <w:szCs w:val="32"/>
  </w:rPr>
</w:style>

And this is how LibreOffice presents it:
<w:style w:type="paragraph" w:styleId="Titre1">
  <w:name w:val="Heading 1"/>
  <w:basedOn w:val="Normal"/>
  <w:next w:val="Normal"/>
  <w:link w:val="Titre1Car"/>
  <w:uiPriority w:val="9"/>
  <w:qFormat/>
  <w:rsid w:val="00d363b2"/>
  <w:pPr>
    <w:keepNext w:val="true"/>
    <w:keepLines/>
    <w:spacing w:before="240" w:after="0"/>
    <w:outlineLvl w:val="0"/>
  </w:pPr>
  <w:rPr>
    <w:rFonts w:ascii="Calibri" w:hAnsi="Calibri" w:eastAsia="MS ゴシック" w:cs="" w:asciiTheme="majorHAnsi" w:cstheme="majorBidi" w:eastAsiaTheme="majorEastAsia" w:hAnsiTheme="majorHAnsi"/>
    <w:color w:val="365F91" w:themeColor="accent1" w:themeShade="bf"/>
    <w:sz w:val="32"/>
    <w:szCs w:val="32"/>
  </w:rPr>
</w:style>

It's pretty much identical! This should make it possible for pandoc to process it correctly... And yet pandoc finds a difference between the two. It correctly converts the docx produced by Word:

$ pandoc simple.docx
<h1 id="this-is-the-title-1">This is the title 1</h1>
<h2 id="this-is-the-title-2">This is the title 2</h2>
<p>Hello this is the paragraph hello. …</p>

while for the same docx produced by LibreOffice:

$ pandoc simple.docx
<p>This is the title 1</p>
<p>This is the title 2</p>
<p>Hello this is the paragraph hello.</p>

Admittedly, passing the pandoc conversion might not be in the spec of LibreOffice Writer. But, as Data would say, "This is intriguing."
(In reply to fralau from comment #14) > I have checked (on 6.1.3.2) and, indeed, the output seems the expected one, > with the reference to the canonic style Heading1. Does it mean, that we can close this bug? If not, what problem remains? => NEEDINFO
I have taken another look at this issue by comparing the styles.xml files generated by MS Word and LibreOffice for the same document, and I may have found what is wrong! The problem is not with document.xml. In this extract from the styles.xml generated by MS Word, the predefined styles are named in lower case:

<w:latentStyles w:defLockedState="0" w:defUIPriority="99" w:defSemiHidden="0" w:defUnhideWhenUsed="0" w:defQFormat="0" w:count="375">
  <w:lsdException w:name="Normal" w:uiPriority="0" w:qFormat="1"/>
  <w:lsdException w:name="heading 1" w:uiPriority="9" w:qFormat="1"/>
  <w:lsdException w:name="heading 2" w:semiHidden="1" w:uiPriority="9" w:unhideWhenUsed="1" w:qFormat="1"/>
  ...

Whereas LibreOffice uses capitals:

<w:style w:type="paragraph" w:styleId="Titre1">
  <w:name w:val="Heading 1"/>
  <w:basedOn w:val="Titre"/>
  <w:next w:val="Corpsdetexte"/>
  <w:qFormat/>
  ...

Is that the cause of the issue? To verify this hypothesis, I manually changed 'Heading 1' and 'Heading 2' into 'heading 1' and 'heading 2' in styles.xml, regenerated the docx file (by zipping, etc.), and then ran that document through pandoc. And it worked: pandoc recognized the standard headings!

Conclusion: in order to make the styles.xml file really standard, the w:val attribute of the w:name tag should use lowercase, e.g.: <w:name w:val="heading 1"/> It seems that would fix the issue.
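The manual experiment above (edit styles.xml, re-zip, run pandoc) can be scripted. The following is a rough sketch, not a proposed fix for LO itself. It assumes that styles.xml is UTF-8 encoded and uses the usual "w:" prefix, and it deliberately uses a plain text substitution so that the rest of the XML is left byte-for-byte untouched. The function names are invented for this example.

```python
import re
import zipfile

# Matches <w:name w:val="Heading N"/> (optionally with a space before "/>").
_HEADING_NAME = re.compile(r'(<w:name\s+w:val=")Heading ([1-9])("\s*/>)')


def lowercase_heading_names(styles_xml):
    """Rewrite 'Heading N' style names as 'heading N' in a styles.xml string."""
    return _HEADING_NAME.sub(r"\1heading \2\3", styles_xml)


def patch_docx(src, dst):
    """Copy the docx at `src` to `dst`, patching only word/styles.xml.

    Both arguments may be filesystem paths or binary file-like objects.
    """
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "word/styles.xml":
                text = data.decode("utf-8")  # assumption: UTF-8 encoded part
                data = lowercase_heading_names(text).encode("utf-8")
            zout.writestr(item, data)
```

Running patch_docx("simple.docx", "simple-patched.docx") and then pandoc on the patched file repeats the test described above; the output file name here is just an example.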
Hello Fralau, a new major release of LibreOffice has become available since this bug was reported. Could you please try to reproduce it with the latest version of LibreOffice from https://www.libreoffice.org/download/libreoffice-fresh/ ? I have set the bug's status to 'NEEDINFO'. Please change it back to 'UNCONFIRMED' if the bug is still present in the latest version.
Ok, so based on comment #14 and comment #16 this is fixed. The remaining issue seems to be that pandoc doesn't recognize the "Heading 1" style as equivalent to "heading 1". From briefly googling, the reference spec seems to use "Heading 1" as an example. To me, this seems like a bug in how pandoc parses the style names. Word has picked up my custom formats for Heading 1 correctly and doesn't show a duplicate default. Setting this as resolved works for me, as it was a bug in the LO behavior, but this new issue seems to be a parsing bug in pandoc. Fralau, I would submit the bug to pandoc.
Version: 7.0.0.0.alpha0+ Build ID: 0cb4f304abf6f8dd6b40eb800788d2fe80581813 CPU threads: 4; OS: Mac OS X 10.14.6; UI render: default; VCL: osx; Locale: en-US (en_US.UTF-8); UI-Language: en-US Calc: threaded
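For the consumer side, a tolerant reader could compare style names without regard to case. The snippet below only illustrates that idea; it is not pandoc's actual code (pandoc is written in Haskell), and the function name heading_level is hypothetical.

```python
import re

# Accepts "heading 1" as well as "Heading 1"; levels 1 to 9 only.
_HEADING = re.compile(r"heading ([1-9])", re.IGNORECASE)


def heading_level(style_name):
    """Return the heading level for a w:name value, or None if not a heading."""
    match = _HEADING.fullmatch(style_name.strip())
    return int(match.group(1)) if match else None
```

With such a comparison, both heading_level("Heading 1") and heading_level("heading 1") give 1, while a localized name such as "Titre 1" is still not treated as a heading.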