Created attachment 87653 [details] Original DOCX file created in MS Office

LibreOffice Version: 4.1.0.4 (downloaded from the LO website)

There seems to be a difference between the DOCX generated from the command line (--convert-to docx:"MS Word 2007 XML") and the one generated through the GUI (opening the document in LibreOffice and doing a "Save As" => Microsoft Word 2007/2010 XML (.docx)) when the input document is a DOCX created in Microsoft Office.

The Table of Contents (TOC) is lost when the conversion is done from the command line, whereas it is preserved when done through the GUI. If we extract the two converted documents and compare their document.xml files, a large portion corresponding to the Table of Contents is missing from the command-line output.

The expectation is that the same OOXML import/export filter code path is executed in both cases, resulting in identical files.

The question was initially posted here: http://ask.libreoffice.org/en/question/23932/difference-between-document-converted-using-cli-convert-to-and-gui-save-as/
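The document.xml comparison described above can be roughly automated. The following is a minimal sketch, not part of the original report: it assumes that a Word-style ToC shows up in word/document.xml either as a field instruction beginning with TOC or as a "Table of Contents" document-part gallery entry. The function names are made up for illustration, and the two markers are not an exhaustive test.

```python
import re
import zipfile


def read_document_xml(docx):
    """Return word/document.xml from a DOCX (path or file-like object) as text."""
    with zipfile.ZipFile(docx) as zf:
        return zf.read("word/document.xml").decode("utf-8")


def toc_evidence(xml):
    """Report which common ToC markers appear in the given document.xml text.

    Assumption: only two markers are checked; other ways of encoding a
    ToC (for example w:fldSimple) are not covered by this sketch.
    """
    return {
        # Complex field whose instruction text starts with "TOC"
        "toc_field": bool(re.search(r"<w:instrText[^>]*>\s*TOC\b", xml)),
        # Structured document tag marking a Table of Contents building block
        "toc_gallery": 'w:docPartGallery w:val="Table of Contents"' in xml,
    }
```

Calling toc_evidence(read_document_xml(path)) on the CLI-converted file and on the GUI-converted file (paths of your choosing) and comparing the two dictionaries should show whether the ToC markers survived each conversion.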
Created attachment 87655 [details] Original document converted using the --convert-to docx:"MS Word 2007 XML" filter
Created attachment 87656 [details] Original document converted using "Save As" => "MS Word 2007/2010"
Created attachment 87666 [details] DOCX versions saved via CLI/GUI under v3572, v3672, v4052, and v4122.

There is something strange going on here. Let's ignore for a moment the fact that the original document is a DOCX and is being converted to DOCX. I used this command to do the CLI conversions:

$ soffice --headless --convert-to docx:"MS Word 2007 XML" orig.docx --outdir ..

Attached are examples of conversion of the original DOCX via both CLI and GUI using these versions of LO:

- v3.5.7.2 Build: 3215f89-f603614-ab984f2-7348103-1225a5b / Crunchbang v11 x86_64.
- v3.6.7.2 Build: e183d5b / Ubuntu v10.04 x86_64.
- v4.0.5.2 Build: 5464147a081647a250913f19c0715bca595af2f / Ubuntu v10.04 x86_64.
- v4.1.2.2 Build: 281b75f427729060b6446ddb3777b32f957a8fb / Ubuntu v10.04 x86_64.

While the variance in file size does suggest differences between the CLI and GUI versions, I have not examined the XML in detail to determine what those differences are. I have, however, opened all the resulting files with each of the above versions of LO and viewed the table of contents (ToC). Here are the results:

- v3.5.7.2 opens all the files and displays a two-level ToC (blue level 1 headings and black level 2 headings).
- v3.6.7.2 opens the v3572, v3672, and v4052 files and displays a two-level ToC (blue level 1 headings and black level 2 headings). Attempts to open either of the v4122 files result in an immediate crash.
- v4.0.5.2 opens the v3572 and v3672 files and displays a single-level ToC (blue level 1 headings are missing; black level 2 headings are visible). It also opens the v4052 files but displays no ToC (a single-line placeholder field is visible). Attempts to open either of the v4122 files result in an immediate crash ("soffice.bin: double free or corruption" error).
- v4.1.2.2 opens the v3572 and v3672 files and displays a single-level ToC (blue level 1 headings are missing; black level 2 headings are visible). It also opens the v4052 and v4122 files but displays no ToC (a single-line placeholder field is visible).
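Since the file sizes differ but the XML has not yet been examined, a first step could be to see which parts inside each DOCX package account for the difference. This is only a sketch under the assumption that comparing uncompressed member sizes is a useful first pass; the function names are illustrative and it does not diff the XML itself.

```python
import zipfile


def member_sizes(docx):
    """Map each ZIP member name to its uncompressed size in bytes."""
    with zipfile.ZipFile(docx) as zf:
        return {info.filename: info.file_size for info in zf.infolist()}


def size_differences(cli_docx, gui_docx):
    """Return {member: (cli_size, gui_size)} for members whose sizes differ.

    A member that exists in only one of the two packages is reported
    with None on the side where it is absent.
    """
    cli = member_sizes(cli_docx)
    gui = member_sizes(gui_docx)
    return {
        name: (cli.get(name), gui.get(name))
        for name in sorted(set(cli) | set(gui))
        if cli.get(name) != gui.get(name)
    }
```

Running size_differences on the CLI and GUI outputs of the same LO version would show, for example, whether word/document.xml alone is responsible for the size gap or whether other parts differ as well.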
Based on my tests in comment #3, I am confirming this bug. Status set to NEW and keyword "regression" added.
I had forgotten about this earlier bug, which details the same problem. I have commented in the earlier bug about the testing / files available here.

*** This bug has been marked as a duplicate of bug 67005 ***