Bug 113790 - 'SAXParseException: Attribute w:cstheme redefined' error upon certain changes to a DOCX
Summary: 'SAXParseException: Attribute w:cstheme redefined' error upon certain changes...
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage
Version (earliest affected): 4.3.0.4 release
Hardware: All All
Importance: medium critical
Assignee: Mike Kaganski
URL:
Whiteboard: target:6.0.0 target:5.4.4
Keywords: bibisected, regression
Duplicates: 92731 96878 102929 118154
Depends on:
Blocks: DOCX-SAXParse
Reported: 2017-11-12 19:04 UTC by Aron Budea
Modified: 2018-06-25 04:58 UTC
CC List: 5 users

See Also:
Crash report or crash signature:


Attachments
Sample DOCX (5.03 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2017-11-12 19:04 UTC, Aron Budea
Sample DOCX after RT (corrupted) (5.24 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2017-11-12 19:05 UTC, Aron Budea

Description Aron Budea 2017-11-12 19:04:36 UTC
Created attachment 137703
Sample DOCX

I received a file that gave the same error message upon opening as bug 102929 (that error is also mentioned in bug 92731):
'SAXParseException: [word/document.xml line 2]: Attribute w:cstheme redefined.'

The erroneous element looked like this:
<w:rFonts w:cs="Calibri" w:cstheme="minorHAnsi" w:ascii="Calibri" w:hAnsi="Calibri" w:cstheme="minorHAnsi"/>
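XML does not allow the same attribute to appear twice in one start tag, which is why the parser rejects the element above: `w:cstheme` is repeated. As a rough illustration (not LibreOffice code), here is a minimal Python sketch that lists repeated attribute names and drops the repeats while keeping the first occurrence, which mirrors the manual edit that made the file open again. It assumes the input is a single, well-quoted start tag rather than a whole document; the helper names are hypothetical.

```python
import re

# One attribute with its leading whitespace: name="value" or name='value'.
# Assumption: the input is a single well-quoted start tag, such as the
# w:rFonts element above; this is not a general XML parser.
_ATTR = re.compile(r"""\s+([\w:.-]+)\s*=\s*("[^"]*"|'[^']*')""")


def duplicate_attributes(tag: str) -> list[str]:
    """Return each attribute name that occurs more than once in the tag."""
    seen: set[str] = set()
    dupes: list[str] = []
    for match in _ATTR.finditer(tag):
        name = match.group(1)
        if name in seen and name not in dupes:
            dupes.append(name)
        seen.add(name)
    return dupes


def drop_repeated_attributes(tag: str) -> str:
    """Remove later repeats of an attribute, keeping the first occurrence."""
    seen: set[str] = set()

    def keep_first(match: re.Match) -> str:
        name = match.group(1)
        if name in seen:
            return ""  # drop the repeat together with its leading space
        seen.add(name)
        return match.group(0)

    return _ATTR.sub(keep_first, tag)
```

Applied to the element quoted above, `duplicate_attributes` returns `["w:cstheme"]`, and `drop_repeated_attributes` returns the same tag without the second `w:cstheme="minorHAnsi"`. Keeping the first copy is harmless here because both copies carry the same value; if they differed, choosing one would be a guess.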

After removing the duplicate attribute from the file manually, the document could be opened. I copied the relevant part of the text to a new document, simplified it, and managed to get the following repro case:
- Open the attached sample document.
- Copy the bulleted entry "ABCD" and paste it somewhere else, e.g. above "Title3".
- Save as a different DOCX, and reopen it.

=> You get the error/exception mentioned above.
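To check a round-tripped file for this kind of error without reopening it in Writer, one option is to parse every XML part of the package. A DOCX file is a ZIP archive, so the minimal Python sketch below (a hypothetical helper, not part of LibreOffice) feeds each `.xml`/`.rels` member to the standard-library ElementTree parser and reports failures by part name and line, similar in spirit to the `[word/document.xml line 2]` prefix in the message above. The error wording comes from Python's expat-based parser, not from LibreOffice's SAX parser.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def find_xml_errors(docx_bytes: bytes) -> list[str]:
    """Parse every XML part of a DOCX package and report parse failures.

    Returns messages of the form "<part name> line <n>: <parser message>";
    an empty list means every XML part is well-formed.
    """
    problems: list[str] = []
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        for name in package.namelist():
            # Only the XML parts (including relationship files) are parsed.
            if not name.endswith((".xml", ".rels")):
                continue
            try:
                ET.fromstring(package.read(name))
            except ET.ParseError as err:
                line, _column = err.position
                problems.append(f"{name} line {line}: {err}")
    return problems
```

For example, reading the saved file with `open("roundtrip.docx", "rb").read()` (a hypothetical file name) and passing the bytes to `find_xml_errors` would be expected to list `word/document.xml` with a duplicate-attribute message when the repro steps above are followed on an unfixed build.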

Reproduced using LO 6.0 daily build (2017-11-06_23:18:19, a5af0fd9f27af42cf2e8571f659cdad6e606215b), 5.4.3.2 / Windows 7.

Note that both the sample file and the fixed original contain a couple of instances of the following OOXML validation error (no idea if it's related; Word and Writer open the documents fine):
NumberingSymbolRunProperties	/word/numbering.xml	/w:numbering[1]/w:abstractNum[1]/w:lvl[1]/w:rPr[1]	RunFonts	The element has unexpected child element
Comment 1 Aron Budea 2017-11-12 19:05:15 UTC
Created attachment 137704
Sample DOCX after RT (corrupted)
Comment 2 MM 2017-11-12 22:09:17 UTC
Confirmed with Version: 5.1.6.2
Build ID: 07ac168c60a517dba0f0d7bc7540f5afa45f0909
CPU Threads: 2; OS Version: Linux 4.4; UI Render: default; 
Locale: en-US (en_US.UTF-8); Calc: single

Confirmed with Version: 6.0.0.0.alpha1+
Build ID: 4c656c82ccdaa47cf447dfff4147b339b44ea8c1
CPU threads: 2; OS: Linux 4.4; UI render: default; VCL: gtk2; 
TinderBox: Linux-rpm_deb-x86_64@70-TDF, Branch:master, Time: 2017-11-11_22:18:01
Locale: en-US (en_US.UTF-8); Calc: single

Technically speaking, bug 102929 is about a hang when opening a corrupt file. With the new mechanism that lets you continue opening the file, -that- issue is fixed.
Comment 3 MM 2017-11-12 22:26:37 UTC
Not reproduced with Version: 4.3.7.2 under Mint 17.3 x64.
Confirmed with Version: 5.0.6.3 under Mint 17.3 x64.

With Version: 4.4.7.2 it already seems partially broken: on import, [bullet] ABCD and Title3 are already missing.
Comment 4 Aron Budea 2017-11-12 23:57:57 UTC
I haven't tried version 4.3.7.2, but in 4.3.0.4 the saved file is already truncated when reopened. It's basically the same issue; back then the parser silently stopped after such errors, and the error message was added later. With 4.2.0.4, saving and reopening works fine.
Comment 5 Aron Budea 2017-11-13 05:15:33 UTC
This is strange... Hard to believe this commit would cause that bug. I did verify it by checking out both this and the preceding commit, though. Bibisected using repo bibisect-43max.

Maybe someone could verify if this is indeed the first commit where things stop working, i.e. the end of the text is truncated (the commit in the repo is 3019488043c54b0e4fe2c91ad1c56e50e81d29cd).

https://cgit.freedesktop.org/libreoffice/core/commit/?id=c2d5b59fc6a3b3fbe20a19282538d5f95fa53301
author		Tomaž Vajngerl <tomaz.vajngerl@collabora.com>	2014-04-24 16:39:27 (GMT)
committer	Tomaž Vajngerl <tomaz.vajngerl@collabora.com>	2014-04-24 20:51:15 (GMT)

fdo#77089 pass shape dimensions to graphicfilter for WMF
Comment 6 Aron Budea 2017-11-13 23:09:51 UTC
So, apparently this is bibisectable using repo bibisect_win_44, which results in the following range:

https://cgit.freedesktop.org/libreoffice/core/log/?qt=range&q=fe2b8ef18b11b226fddd1cf3fc7f9133426a1b1a..9de3fd2da6d77da6a7abc105712696f183bf6bc3

Maybe time to revisit this using bibisect-44max in Linux?
Comment 7 Aron Budea 2017-11-14 01:17:06 UTC
No luck with bibisect-44max, the result ended up being a commit named "fix tests", only concerning unit tests (from July 1, so relatively close, though). The results are completely unstable...
Comment 8 Mike Kaganski 2017-11-14 10:14:22 UTC
A fix in gerrit: https://gerrit.libreoffice.org/44706
Comment 9 Aron Budea 2017-11-14 11:35:31 UTC
Sounds great, thanks Mike!
Let me change status to assigned, then.
Comment 10 Commit Notification 2017-11-14 15:42:21 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=e128d83b5e7fd2ceb8d5ec9a346a3b7351be79cc

tdf#113790: skip charfmt grabbag items existing in autofmt grabbag

It will be available in 6.0.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
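The commit title states the rule: charfmt grab-bag items are skipped when the autofmt grab-bag already contains them. Below is a toy Python model of that rule, not the actual C++ change. Grab-bags are modeled as ordered lists of (attribute, value) pairs, all function names are hypothetical, and the split of attributes between the two bags is invented so that the naive merge reproduces the `w:rFonts` element from the description.

```python
# Hypothetical model: a grab-bag is an ordered list of (attribute, value)
# pairs. This only illustrates the rule in the commit title; it is not the
# LibreOffice C++ code.
GrabBag = list[tuple[str, str]]


def naive_merge(autofmt: GrabBag, charfmt: GrabBag) -> GrabBag:
    """Concatenate both bags; a key present in both is emitted twice."""
    return autofmt + charfmt


def merge_grab_bags(autofmt: GrabBag, charfmt: GrabBag) -> GrabBag:
    """Keep every autofmt item; skip charfmt items whose key autofmt has."""
    taken = {key for key, _ in autofmt}
    return autofmt + [(key, value) for key, value in charfmt if key not in taken]


def to_rfonts(items: GrabBag) -> str:
    """Serialize the pairs as a w:rFonts tag (values assumed already escaped)."""
    attrs = " ".join(f'{key}="{value}"' for key, value in items)
    return f"<w:rFonts {attrs}/>"


# Invented split that reproduces the element quoted in the description.
autofmt = [("w:cs", "Calibri"), ("w:cstheme", "minorHAnsi")]
charfmt = [("w:ascii", "Calibri"), ("w:hAnsi", "Calibri"),
           ("w:cstheme", "minorHAnsi")]

print(to_rfonts(naive_merge(autofmt, charfmt)))
print(to_rfonts(merge_grab_bags(autofmt, charfmt)))
```

The first line printed is the broken element with `w:cstheme` twice; the second is `<w:rFonts w:cs="Calibri" w:cstheme="minorHAnsi" w:ascii="Calibri" w:hAnsi="Calibri"/>`. Giving the autofmt bag precedence follows the wording of the commit title; the model says nothing about which properties actually end up in each bag.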
Comment 11 Commit Notification 2017-11-21 13:13:28 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "libreoffice-5-4":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=01632a5ee892ebd2218ad8729738672e02d94697&h=libreoffice-5-4

tdf#113790: skip charfmt grabbag items existing in autofmt grabbag

It will be available in 5.4.4.

Comment 12 Aron Budea 2018-01-09 17:06:05 UTC
*** Bug 102929 has been marked as a duplicate of this bug. ***
Comment 13 Aron Budea 2018-01-09 17:08:33 UTC
*** Bug 96878 has been marked as a duplicate of this bug. ***
Comment 14 Aron Budea 2018-01-09 17:10:59 UTC
*** Bug 92731 has been marked as a duplicate of this bug. ***
Comment 15 Aron Budea 2018-06-14 09:13:37 UTC
*** Bug 118154 has been marked as a duplicate of this bug. ***