Bug 106086 - FILESAVE: 'w:ThemeColor' attribute duplicated after roundtrip
Status: RESOLVED INSUFFICIENTDATA
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 5.2.5.1 release (earliest affected)
Hardware: All All
Importance: high major
Assignee: Not Assigned
URL:
Whiteboard: interoperability
Keywords: dataLoss, filter:docx
Depends on:
Blocks: DOCX-Corrupted OOXML-Doc-Themes
Reported: 2017-02-19 12:46 UTC by CommodusTheTyrant
Modified: 2018-10-09 09:33 UTC
CC List: 4 users

See Also:
Crash report or crash signature:


Attachments
Contains the original document (DOCX), screenshot of SAX parse error report (JPG), screenshot of where I found the error (JPG), and the corrected document.xml (XML) that now runs (230.23 KB, application/x-7z-compressed)
2017-02-19 12:46 UTC, CommodusTheTyrant
the corrected XML file (27.37 KB, text/xml)
2017-02-20 04:34 UTC, CommodusTheTyrant
Both the corrupt and the fixed DOCX file included here (239.67 KB, application/x-7z-compressed)
2017-02-21 21:11 UTC, CommodusTheTyrant

Description CommodusTheTyrant 2017-02-19 12:46:55 UTC
Created attachment 131337 [details]
Contains the original document (DOCX), screenshot of SAX parse error report (JPG), screenshot of where I found the error (JPG), and the corrected document.xml (XML) that now runs

Upon save, Writer messed up the XML, writing the 'w:themeColor' attribute twice at around character 3119 of line 2 (sorry, I have no idea how one should denote positions in XML).
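
On denoting positions: XML parsers usually report a line and a column. The following is a minimal sketch, assuming Python's standard library, of how a repeated attribute triggers a parse error and how its position can be read back. The `BROKEN` fragment and the `find_xml_error` helper are hypothetical illustrations, not content taken from the attached document.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment imitating the reported problem: the same attribute
# appears twice in one start tag, so the XML is not well-formed.
BROKEN = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">\n'
    '<w:color w:val="000000" w:themeColor="text1" w:themeColor="text1"/>\n'
    '</w:document>'
)


def find_xml_error(xml_text):
    """Return (line, column, message) for the first parse error, or None."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # line is 1-based, column is 0-based
        return line, column, str(err)
    return None


if __name__ == "__main__":
    print(find_xml_error(BROKEN))
```

Running the same helper on the text of a real word/document.xml should point at the start tag that carries the repeated attribute.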

This caused a SAX parser error, and I had to learn to visually debug and fix an XML file real fast, or my son's teacher would have had to hear the age-old excuse: "sorry my XML converter ate my homework".

I have included all the related files.

Happy hunting, and keep up the good work.
We are very grateful for LibreOffice.

pk
Comment 1 Thomas Woltjer 2017-02-19 16:56:45 UTC
Opening the file creates the same error, but I don't have enough information to reproduce this from the ground up. Was there a specific type of element that you included in the document that caused the exported DOCX file to become corrupted? Or was this a DOCX file written by another office suite that won't open in LO?

LibreOffice 5.3.0.3 on Manjaro Linux, 64-bit.
Comment 2 Thomas Woltjer 2017-02-19 16:57:04 UTC
Updating status to NEEDINFO.
Comment 3 CommodusTheTyrant 2017-02-20 00:07:55 UTC
Hey Thomas, thanks for the speedy reply.

When I wrote that it was Writer that messed up the save, I meant that it was LibreOffice Writer. No external apps were involved, and no special elements were inserted in the document.

If you take the document.xml that I packed and substitute it for the original document.xml in the DOCX file that gives the error, the file will open properly. The only difference between the two document.xml files is that I took out one of the two 'w:themeColor="text1"' attributes that Writer inserted immediately after each other within the same tag. Please see the highlighted area in the 'actualerrorIthink.jpg' file.

That unnecessary repeat of the tag is what generated the SAX parser syntax error, and is the bug I am reporting.
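
The manual fix described above can be sketched in code. This is only an illustration under stated assumptions: a DOCX is a ZIP archive, the main part is stored as UTF-8 at word/document.xml, and the repeated attribute appears back to back with an identical value. The helper names `drop_repeated_attributes` and `repair_docx` are hypothetical, and the regular expression is a blunt tool that does not parse XML, so it could also alter matching text outside of tags.

```python
import re
import zipfile

# Matches an attribute that is directly followed by one or more identical
# copies of itself (same name, same double-quoted value).
_DUPLICATE_ATTR = re.compile(r'(\s+)([\w:.-]+="[^"]*")(?:\s+\2)+')


def drop_repeated_attributes(xml_text):
    """Collapse back-to-back identical attributes into a single one."""
    return _DUPLICATE_ATTR.sub(r"\1\2", xml_text)


def repair_docx(src_path, dst_path, part="word/document.xml"):
    """Copy a DOCX, rewriting only `part` with repeated attributes removed."""
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == part:
                text = data.decode("utf-8")  # assumption: the part is UTF-8
                data = drop_repeated_attributes(text).encode("utf-8")
            dst.writestr(item, data)
```

Writing to a new file rather than over the source keeps the corrupt original available for anyone investigating the export.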
Comment 4 Thomas Woltjer 2017-02-20 01:52:32 UTC
There doesn't seem to be any corrected document.xml file in the attached 7z archive. Would you mind uploading it separately?
Comment 5 CommodusTheTyrant 2017-02-20 04:34:59 UTC
Created attachment 131346 [details]
the corrected XML file

Weird, I could have sworn I added it... here you go, hope it helps.
Comment 6 Xisco Faulí 2017-02-21 15:54:00 UTC Comment hidden (obsolete)
Comment 7 CommodusTheTyrant 2017-02-21 21:11:10 UTC
Created attachment 131401 [details]
Both the corrupt and the fixed DOCX file included here
Comment 8 CommodusTheTyrant 2017-02-21 21:13:05 UTC
Ok, now in the 7z file there are in fact both files, the one generating the SAX parse error, and the 'Fixed' one that is OK.

The difference is that I took out one of the two 'w:themeColor="text1"' tags from the document.xml in the file with the 'Fixed' prefix.

I solved our problem, thank God, but I still wanted to report the bug, because FILESAVE messed up here by writing that tag twice.
Comment 9 Xisco Faulí 2017-02-21 22:42:19 UTC
Hello,
I can't reproduce the problem if I save the 'fixed' document. Could you please attach the original file which gives the SAX parser error after roundtrip?
Comment 10 CommodusTheTyrant 2017-02-21 22:44:32 UTC
Please try to open the original 'economics project.docx' - that one gives the error described, and has the duplicate tag.
Comment 11 Xisco Faulí 2017-02-21 22:53:07 UTC
(In reply to CommodusTheTyrant from comment #10)
> Please try to open the original 'economics project.docx' - that one gives
> the error described, and has the duplicate tag.

That's not the file I meant. We want the original file, the one that gets corrupted after a roundtrip with LibreOffice, so that we can investigate what goes wrong at export time.
Comment 12 CommodusTheTyrant 2017-02-21 23:04:39 UTC
ok, so, help me out here - what is a roundtrip?

also - my son typed all morning, and this was his first save.
(the original 'econom...' file)
then he closed writer only to find that it could not reopen.
so, this file is as original as possible.
Comment 13 Buovjaga 2017-02-22 19:25:08 UTC
(In reply to CommodusTheTyrant from comment #12)
> ok, so, help me out here - what is a roundtrip?

https://en.wikipedia.org/wiki/Round-trip_format_conversion
Comment 14 Aron Budea 2017-02-22 20:49:23 UTC
Since no other applications were involved, there was no roundtrip, either.
This will be quite hard to reproduce... a dev might be able to look into the code to see how the tag 'w:themeColor="text1"' could end up twice in the output.

The issue causes data loss, I'm adjusting keywords and importance.
Comment 15 CommodusTheTyrant 2017-02-22 21:19:39 UTC
thanks for the explanation, and the correction of the keywords

yes, I figured it was gonna be a developer issue, since the dataloss I have been able to fix manually

sorry, I do not have a more original file than the corrupted "economics..." file in the 7z package, since my son saved the new (corrupted) version over the previous version.
Comment 16 Justin L 2018-01-09 16:26:27 UTC
(In reply to CommodusTheTyrant from comment #15)
> sorry, I do not have a more original file ...
Thanks for reporting the problem, but without a sample document that allows us to reproduce the problem, a dev won't have any chance of correcting anything. For all we know, it might have already been fixed, so no dev is going to dig around looking for theoretical possibilities. So I'm going to close this issue.
Comment 17 Aron Budea 2018-01-09 17:02:13 UTC
Might be a duplicate of bug 113790, which would mean it's hopefully fixed.