Bug 70481 - Difference between DOCX document converted using CLI (--convert-to) and GUI (Save As)
Status: RESOLVED DUPLICATE of bug 67005
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage
Version: 4.1.0.4 release (earliest affected)
Hardware: x86-64 (AMD64) Windows (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: regression
Depends on: 65918
Blocks:
Reported: 2013-10-15 08:46 UTC by RKohad
Modified: 2023-08-06 22:13 UTC

See Also:
Crash report or crash signature:


Attachments
Original DOCX file created in MS Office (362.63 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2013-10-15 08:46 UTC, RKohad
Original document converted using "--convert-to docx:"MS Word 2007 XML" filter (245.85 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2013-10-15 08:49 UTC, RKohad
Original document converted using "Save As => "MS Word 2007/2010" (250.06 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2013-10-15 08:49 UTC, RKohad
DOCX versions saved via CLI/GUI under v3572, v3672, v4052, and v4122. (1.91 MB, application/zip)
2013-10-15 11:29 UTC, Owen Genat (retired)

Description RKohad 2013-10-15 08:46:23 UTC
Created attachment 87653 [details]
Original DOCX file created in MS Office

LibreOffice Version: 4.1.0.4 (download from LO website)

There seems to be a difference between the DOCX document generated using the command line (--convert-to docx:"MS Word 2007 XML") and the one generated through the GUI (opening the document in LibreOffice and doing a "Save As" => Microsoft Word 2007/2010 XML (.docx)) when the input document is a DOCX file created using Microsoft Office.

The Table of Contents (TOC) is lost when the conversion is done via the command line, whereas it is preserved when done through the GUI. Comparing the document.xml files extracted from the two converted documents shows that a large portion corresponding to the Table of Contents is missing from the command-line output.
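
The comparison described above can be approximated with a short script. The following is only a sketch: it assumes the TOC shows up in word/document.xml either as a field instruction beginning with "TOC" or as a structured document tag whose gallery is "Table of Contents", which is how Word commonly writes it but may not hold for every file. The function names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def toc_summary(docx):
    """Count TOC-related markers and paragraphs in a DOCX (path or file object)."""
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    # Assumption: a TOC field carries an instruction text starting with "TOC".
    toc_fields = sum(
        1
        for el in root.iter(f"{{{W_NS}}}instrText")
        if el.text and el.text.strip().startswith("TOC")
    )
    # Assumption: a generated TOC may be wrapped in an SDT with this gallery name.
    toc_sdts = sum(
        1
        for el in root.iter(f"{{{W_NS}}}docPartGallery")
        if el.get(f"{{{W_NS}}}val") == "Table of Contents"
    )
    paragraphs = sum(1 for _ in root.iter(f"{{{W_NS}}}p"))
    return {"toc_fields": toc_fields, "toc_sdts": toc_sdts, "paragraphs": paragraphs}


def compare(cli_docx, gui_docx):
    """Return only the counters that differ, as (cli_value, gui_value) pairs."""
    cli, gui = toc_summary(cli_docx), toc_summary(gui_docx)
    return {key: (cli[key], gui[key]) for key in cli if cli[key] != gui[key]}
```

Calling compare() on the CLI-converted and GUI-converted attachments (saved locally under any names) would list the counters that differ; an empty result would mean the two files agree on these particular markers, not that they are identical.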

The expectation is that the same OOXML import/export filter code path is executed in both cases, resulting in identical files.

The question was initially posted here: http://ask.libreoffice.org/en/question/23932/difference-between-document-converted-using-cli-convert-to-and-gui-save-as/
Comment 1 RKohad 2013-10-15 08:49:11 UTC
Created attachment 87655 [details]
Original document converted using "--convert-to docx:"MS Word 2007 XML" filter
Comment 2 RKohad 2013-10-15 08:49:56 UTC
Created attachment 87656 [details]
Original document converted using "Save As => "MS Word 2007/2010"
Comment 3 Owen Genat (retired) 2013-10-15 11:29:04 UTC
Created attachment 87666 [details]
DOCX versions saved via CLI/GUI under v3572, v3672, v4052, and v4122.

There is something strange going on here. Let's ignore for a moment the fact that the original document is a DOCX and is being converted to DOCX. I used this command to do the CLI conversions:

$ soffice --headless --convert-to docx:"MS Word 2007 XML" orig.docx --outdir ..
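
For repeating this conversion across several installed versions, the same command could be wrapped in a small script. This is a sketch only: the helper names are hypothetical, and the binary argument is assumed to point at whichever soffice executable is under test. Note that the shell quoting in the command above collapses into a single argument, docx:MS Word 2007 XML.

```python
import subprocess


def build_convert_command(source, outdir, filter_name="MS Word 2007 XML", binary="soffice"):
    """Build the argument list equivalent to the shell command shown above."""
    return [
        binary,
        "--headless",
        "--convert-to",
        f"docx:{filter_name}",  # output extension and filter name form one argument
        str(source),
        "--outdir",
        str(outdir),
    ]


def convert(source, outdir, **kwargs):
    """Run the conversion; raises CalledProcessError on a non-zero exit status."""
    cmd = build_convert_command(source, outdir, **kwargs)
    return subprocess.run(cmd, check=True, capture_output=True, text=True)
```

Passing a different binary path per LibreOffice version would allow the CLI outputs to be regenerated in one loop, with each result written to its own output directory.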

Attached are examples of conversion of the original DOCX via both CLI and GUI using these versions of LO:

- v3.5.7.2 Build: 3215f89-f603614-ab984f2-7348103-1225a5b / Crunchbang v11 x86_64.
- v3.6.7.2 Build: e183d5b / Ubuntu v10.04 x86_64.
- v4.0.5.2 Build: 5464147a081647a250913f19c0715bca595af2f / Ubuntu v10.04 x86_64.
- v4.1.2.2 Build: 281b75f427729060b6446ddb3777b32f957a8fb / Ubuntu v10.04 x86_64.

While the file size variance suggests there are differences between the CLI/GUI versions, I have not examined the XML in detail to determine what they are. I have, however, opened all the resulting files with each of the above versions of LO and viewed the table of contents (ToC). Here are the results:

- v3.5.7.2 opens all the files and displays a two-level ToC (blue level 1 headings and black level 2 headings).
- v3.6.7.2 opens the v3572, v3672, and v4052 files and displays a two-level ToC (blue level 1 headings and black level 2 headings). Attempts to open either of the v4122 files result in an immediate crash.
- v4.0.5.2 opens the v3572 and v3672 files and displays a single-level ToC (blue level 1 headings are missing; black level 2 headings are visible). It also opens the v4052 files but displays no ToC (a single line placeholder field is visible). Attempts to open either of the v4122 files result in an immediate crash ("soffice.bin: double free or corruption" error).
- v4.1.2.2 opens the v3572 and v3672 files and displays a single-level ToC (blue level 1 headings are missing; black level 2 headings are visible). It also opens the v4052 and v4122 files but displays no ToC (a single line placeholder field is visible).
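
The visual check above could be complemented by counting ToC paragraphs per level directly in each saved file. The sketch below makes assumptions: it treats paragraph style IDs beginning with "TOC" or "Contents" as ToC entry styles, which is a guess about how the exporting application names them and should be adjusted to whatever IDs the files actually contain. The function names are hypothetical.

```python
import io
import zipfile
from collections import Counter
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraph_style_counts(docx):
    """Count how often each paragraph style ID is referenced in document.xml."""
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    counts = Counter()
    for style in root.iter(f"{{{W_NS}}}pStyle"):
        counts[style.get(f"{{{W_NS}}}val")] += 1
    return counts


def toc_levels(docx, prefixes=("TOC", "Contents")):
    """Keep only style IDs that look like ToC entry styles (assumed prefixes)."""
    counts = paragraph_style_counts(docx)
    return {
        style_id: n
        for style_id, n in sorted(counts.items())
        if style_id and style_id.startswith(prefixes)
    }
```

A file whose ToC lost its level 1 entries would be expected to show no count for the level 1 style while still reporting level 2 entries; an empty result would be consistent with only a placeholder field remaining.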
Comment 4 Owen Genat (retired) 2013-10-15 11:30:31 UTC
Based on my tests in comment #3 I am confirming this bug. Status set to NEW and keyword "regression" added.
Comment 5 Owen Genat (retired) 2013-10-19 04:48:50 UTC
I had forgotten about this earlier bug, which details the same problem. I have commented in the earlier bug about the testing / files available here.

*** This bug has been marked as a duplicate of bug 67005 ***