Bug 99227 - DOCX roundtrip: SAXParseException: Extra content at the end of the document (added w:drawing)
Summary: DOCX roundtrip: SAXParseException: Extra content at the end of the document (...
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version (earliest affected): 5.0.4.1 rc
Hardware: All All
Importance: medium major
Assignee: Mike Kaganski
QA Contact:
URL:
Whiteboard: target:5.4.0
Keywords: bibisected, bisected, filter:docx, regression
Depends on:
Blocks: SAXParse
 
Reported: 2016-04-11 17:19 UTC by ffrere
Modified: 2017-05-22 19:19 UTC
CC List: 7 users

See Also:
Crash report or crash signature:


Attachments
screenshot (22.67 KB, image/jpeg)
2016-04-11 17:19 UTC, ffrere
word file.docx impossible to save in docx (310.42 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2016-04-12 12:48 UTC, ffrere
debug from procdump dump (14.38 KB, text/plain)
2016-12-19 13:51 UTC, Timur

Description ffrere 2016-04-11 17:19:52 UTC
Created attachment 124257
screenshot
Comment 1 raal 2016-04-11 19:37:25 UTC
Hello,

Thank you for filing the bug. Please send us a sample document, as this makes it easier for us to verify the bug. 
I have set the bug's status to 'NEEDINFO', so please do change it back to 'UNCONFIRMED' once you have attached a document.
(Please note that the attachment will be public, remove any sensitive information before attaching it.)
How can I eliminate confidential data from a sample document?
https://wiki.documentfoundation.org/QA/FAQ#How_can_I_eliminate_confidential_data_from_a_sample_document.3F
Thank you
Comment 2 ffrere 2016-04-12 12:48:12 UTC
Created attachment 124282
word file.docx impossible to save in docx
Comment 3 raal 2016-04-12 14:41:01 UTC
I can confirm with Version: 5.2.0.0.alpha0+
Build ID: ef34535ceb60d7d63b8d8671e4c6e9e43ffbd17d
CPU Threads: 4; OS Version: Linux 4.2; UI Render: default; 
TinderBox: Linux-rpm_deb-x86_64@70-TDF, Branch:master, Time: 2016-03-27_09:53:05

After re-saving and reopening, I get the error message: File format error found at
SAXParseException: '[word/document.xml line 2]: Extra content at the end of the document
', Stream 'word/document.xml', Line 2, Column 1092208(row,col).

Regression: works in LibreOffice 3.5.0.
Comment 4 raal 2016-04-13 13:03:16 UTC
Bibisected: in 5.0-latest, this error message appears when opening the original document:
File format error found at unsatisfied query for interface of type com.sun.star.lang.XComponent!
SAXParseException: '[word/footnotes.xml line 2]: unknown error', Stream 'word/footnotes.xml', Line 2, Column 52638
SAXParseException: '[word/document.xml line 2]: unknown error', Stream 'word/document.xml', Line 2, Column 98790(row,col).

I marked it as "bad":
07335c0a11142b308d6d73dd6a7176e52e5a483f is the first bad commit
commit 07335c0a11142b308d6d73dd6a7176e52e5a483f
Author: Norbert Thiebaud <nthiebaud@gmail.com>
Date:   Wed May 20 02:03:48 2015 -0500

    source sha:ebf767eeb2a169ba533e1b2ffccf16f41d95df35

    source sha:ebf767eeb2a169ba533e1b2ffccf16f41d95df35
	
author	Michael Stahl <mstahl@redhat.com>	2015-01-22 11:50:07 (GMT)
committer	Michael Stahl <mstahl@redhat.com>	2015-01-22 12:58:10 (GMT)
commit	ebf767eeb2a169ba533e1b2ffccf16f41d95df35
tree	81946ae1fdee22ebf6f2543d0a965304528205f7
parent	825e4995220209362c13ed5f07c98e43a5f456de
writerfilter: DOCX import: better error handling than "catch (...) {}"


If I mark it as "good", every step shows this error message.

5.1 bibisect, with the error message marked as "good":

a461a6f2b26bc9d8f1ca37c41f8b30fbd9a5f9ca is the first bad commit
commit a461a6f2b26bc9d8f1ca37c41f8b30fbd9a5f9ca
Author: Norbert Thiebaud <nthiebaud@gmail.com>
Date:   Sat Nov 14 09:34:22 2015 -0800

    source sha:f47bd0561cdf4c2b4fbe2c7e396533cf85408cb7

    source sha:f47bd0561cdf4c2b4fbe2c7e396533cf85408cb7

author	Oliver Specht <oliver.specht@cib.de>	2015-11-13 11:36:04 (GMT)
committer	Oliver Specht <oliver.specht@cib.de>	2015-11-13 13:22:07 (GMT)
commit	f47bd0561cdf4c2b4fbe2c7e396533cf85408cb7
tree	d0295ce8dee7ae9ef3f08a8e7fcace909a1a15d9
parent	f73284fb864699716b3a52faf2ad39bc8e48c3cc
tdf#95188: enable import of shapes in footnotes in .docx
substreams require a Model and a DrawPage
Comment 5 Timur 2016-09-24 10:21:54 UTC
LO 5.0 up to 5.0.3 couldn't open this .docx. LO 5.0.4 and later open it, but with this save problem.
I'm not sure this can be treated as a regression. Anyway, I hope Mike will look into this.
Note: LO 5.2 and master open it rather slowly, more slowly than 3.5 does.
Comment 6 Xisco Faulí 2016-09-26 15:21:17 UTC
Adding Cc: to Michael Stahl
Comment 7 Xisco Faulí 2016-11-27 15:24:54 UTC
*** Bug 101287 has been marked as a duplicate of this bug. ***
Comment 8 Xisco Faulí 2016-12-05 15:14:57 UTC
*** Bug 104414 has been marked as a duplicate of this bug. ***
Comment 9 Xisco Faulí 2016-12-06 12:50:09 UTC Comment hidden (obsolete)
Comment 10 Mike Kaganski 2016-12-11 13:41:02 UTC
An output filter issue.

The original document does not contain any w:drawing elements in its document.xml. When the file is saved by LibreOffice, four w:drawing elements are written after the closing w:document element, which indeed produces ill-formed XML.
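The failure described above can be reproduced in miniature. This is a minimal illustration, assuming a toy document body rather than the real document.xml: an XML document may have exactly one root element, so any element serialized after the closing w:document tag makes the whole part ill-formed. The parser used by LibreOffice reports this as "Extra content at the end of the document" (the message quoted in comment 3); Python's expat-based xml.etree.ElementTree reports the same condition as "junk after document element".

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Toy stand-in for word/document.xml: one root element, nothing after it.
WELL_FORMED = f'<w:document xmlns:w="{W_NS}"><w:body/></w:document>'

# The symptom from this bug: a w:drawing serialized after </w:document>.
ILL_FORMED = WELL_FORMED + f'<w:drawing xmlns:w="{W_NS}"/>'


def parse_error(xml_text):
    """Return None if xml_text is well-formed, else the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(parse_error(WELL_FORMED))  # None
    print(parse_error(ILL_FORMED))   # starts with "junk after document element"
```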
Comment 11 Mike Kaganski 2016-12-11 13:52:13 UTC
The original's footnotes.xml does contain 4 w:drawing elements, which are absent from the resulting footnotes.xml after saving.

I suspect that Oliver's patch (commit f47bd0561cdf4c2b4fbe2c7e396533cf85408cb7, identified by raal) enabled import of drawings from footnotes but did not take care of exporting them back. So they go to the wrong stream, and to the wrong place in it.
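One way to check a roundtrip for the symptom described in comments 10 and 11 is to count the w:drawing elements in each XML part of the package before and after saving. The sketch below does that with the Python standard library only; the helper names and the tiny in-memory sample packages are hypothetical, made up for illustration, and are not the files attached to this report. A part that no longer parses is reported as None.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
DRAWING = f"{{{W_NS}}}drawing"
PARTS = ("word/document.xml", "word/footnotes.xml")


def drawing_counts(docx, parts=PARTS):
    """Map each part to its number of w:drawing elements (None if ill-formed)."""
    counts = {}
    with zipfile.ZipFile(docx) as zf:
        present = set(zf.namelist())
        for part in parts:
            if part not in present:
                continue
            try:
                root = ET.fromstring(zf.read(part))
            except ET.ParseError:
                counts[part] = None  # e.g. content after the root element
                continue
            counts[part] = sum(1 for _ in root.iter(DRAWING))
    return counts


def make_docx(document_xml, footnotes_xml):
    """Build a hypothetical, minimal in-memory package with just two parts."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", document_xml)
        zf.writestr("word/footnotes.xml", footnotes_xml)
    buf.seek(0)
    return buf


DOC = f'<w:document xmlns:w="{W_NS}"><w:body/></w:document>'
FN_WITH_DRAWING = (f'<w:footnotes xmlns:w="{W_NS}"><w:footnote><w:p><w:r>'
                   '<w:drawing/></w:r></w:p></w:footnote></w:footnotes>')
FN_WITHOUT_DRAWING = (f'<w:footnotes xmlns:w="{W_NS}"><w:footnote><w:p>'
                      '<w:r/></w:p></w:footnote></w:footnotes>')

# "Before": the drawing lives in footnotes.xml.
before = drawing_counts(make_docx(DOC, FN_WITH_DRAWING))
# "After" a buggy save: the drawing lands after </w:document>.
after = drawing_counts(make_docx(DOC + f'<w:drawing xmlns:w="{W_NS}"/>',
                                 FN_WITHOUT_DRAWING))
print(before)  # {'word/document.xml': 0, 'word/footnotes.xml': 1}
print(after)   # {'word/document.xml': None, 'word/footnotes.xml': 0}
```

Run against a real original and its re-saved copy, a drawing count that drops in footnotes.xml while document.xml becomes unparseable matches the pattern reported here.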
Comment 12 Mike Kaganski 2016-12-11 13:58:03 UTC
Also, I see a number of bugs marked as duplicates of this one merely because their error messages all started to appear after Michael Stahl's commit that introduced better error handling than "catch (...) {}". That's not correct.
Michael's patch made it possible to see quite a number of different problems that were all simply ignored in previous versions. So the diagnostic is not the cause of all those; it just revealed them.

Thus, I recommend un-marking them as duplicates, unless a patch for one of them is shown to fix another as well.
Comment 13 Mike Kaganski 2016-12-11 18:26:32 UTC
A patch submitted for review: https://gerrit.libreoffice.org/31871
Comment 14 Commit Notification 2016-12-12 10:06:47 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=dd747c0669f6e31462c39fe104d2f2c0acc4de0a

tdf#99227: use correct serializer when exporting drawing

It will be available in 5.4.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 15 Commit Notification 2016-12-16 09:07:52 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=644bddc80a6c4d3bec2f32d6dcbbc0f450582511

tdf#99227 follow-up: synchronize setting serializers

It will be available in 5.4.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
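The two commit titles ("use correct serializer when exporting drawing" and "synchronize setting serializers") suggest the general shape of the fix: drawings must be written through the serializer of the part currently being exported, and any helper that caches its own serializer reference must be switched together with the main one. The toy model below illustrates only that idea, under loudly stated assumptions: Serializer, DrawingExporter and Exporter are hypothetical Python classes, not LibreOffice's C++ export classes, and the export order is simplified. With synchronize=False the drawing helper keeps writing to the already-finished document.xml stream, reproducing the "extra content" symptom.

```python
import io


class Serializer:
    """Hypothetical stand-in for a per-part XML output stream."""

    def __init__(self, part_name):
        self.part_name = part_name
        self._buf = io.StringIO()

    def write(self, text):
        self._buf.write(text)

    def getvalue(self):
        return self._buf.getvalue()


class DrawingExporter:
    """Hypothetical helper that caches its own serializer reference."""

    def __init__(self, serializer):
        self.serializer = serializer

    def export_drawing(self):
        self.serializer.write("<w:drawing/>")


class Exporter:
    """Hypothetical main exporter; owns a DrawingExporter."""

    def __init__(self, serializer, synchronize=True):
        self.synchronize = synchronize
        self.drawing = DrawingExporter(serializer)
        self.serializer = serializer

    def set_serializer(self, serializer):
        self.serializer = serializer
        if self.synchronize:
            # Toy analogue of the fix: point the drawing helper at the same part.
            self.drawing.serializer = serializer


def export(synchronize):
    """Write document.xml, then a footnote containing one drawing."""
    doc = Serializer("word/document.xml")
    fn = Serializer("word/footnotes.xml")
    exp = Exporter(doc, synchronize)
    doc.write("<w:document><w:body/></w:document>")
    exp.set_serializer(fn)  # now exporting the footnotes part
    exp.serializer.write("<w:footnotes><w:footnote>")
    exp.drawing.export_drawing()
    exp.serializer.write("</w:footnote></w:footnotes>")
    return doc.getvalue(), fn.getvalue()


print(export(synchronize=False))  # drawing appended after </w:document>
print(export(synchronize=True))   # drawing inside the footnote
```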
Comment 16 Timur 2016-12-19 13:11:22 UTC
I guess saving is fixed. 

Note: 
1. Open XML validation of attachment 124282 gives an error for "w:setings".
2. Attachment 124282 is slow to open in MSO and even slower to open in LO. It also produces a dump here.
Comment 17 Timur 2016-12-19 13:51:23 UTC
Created attachment 129777
debug from procdump dump

The dump on file open should be a separate bug, but I'm not sure whether this debug output is useful or whether it's just a memory issue.
Comment 18 Commit Notification 2017-05-22 19:19:45 UTC
Justin Luth committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=a9118665637dafddd41ca549d0f73948cf1d332c

tdf#99227: remove unneeded specificity in unit test

It will be available in 5.4.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.