Bug 107969 - FILESAVE: 'SAXParseException: '[word/document.xml line 2]: Extra content at the end of the document' error after roundtrip
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 5.2.0.0.alpha1
Hardware: All All
Importance: medium normal
Assignee: Mike Kaganski
URL:
Whiteboard: target:6.2.0 target:6.1.0.1
Keywords:
Duplicates: 115556
Depends on:
Blocks:
 
Reported: 2017-05-20 09:06 UTC by Ooker
Modified: 2019-05-28 13:04 UTC
CC List: 6 users

See Also:
Crash report or crash signature:


Attachments
BEFORE saved - non-bugged file (230.69 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2017-05-20 09:08 UTC, Ooker
AFTER saved - bugged file (164.97 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2017-05-20 09:08 UTC, Ooker
first 9 pages of the file, comments and revisions removed, non-reproducible (51.25 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2018-06-16 19:23 UTC, himajin100000
first 10 pages of the file, comments and revisions removed, reproducible (54.69 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2018-06-16 19:24 UTC, himajin100000
10th page - reproducible (30.59 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2018-06-17 01:08 UTC, himajin100000
even smaller reproducible document (27.95 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2018-06-17 01:24 UTC, himajin100000

Description Ooker 2017-05-20 09:06:25 UTC
Description:
After saving, the file can no longer be opened. LibreOffice reports this error:

File format error found at 
SAXParseException: '[word/document.xml line 2]: Extra content at the end of the document
', Stream 'word/document.xml', Line 2, Column 1426642(row,col).

Build ID: 7074905676c47b82bbcfbea1aeefc84afe1c50e1

Steps to Reproduce:
1. Open the attached file and make an edit.
2. Save it.
3. Reopen the saved file: the error above appears.

Actual Results:  
File format error found at 
SAXParseException: '[word/document.xml line 2]: Extra content at the end of the document', Stream 'word/document.xml', Line 2, Column 1426642(row,col).

Expected Results:
The saved file opens normally.


Reproducible: Always

User Profile Reset: No

Additional Info:
Possibly related to bug 104181, where the problem is caused by a footnote:
https://bugs.documentfoundation.org/show_bug.cgi?id=104181


User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:53.0) Gecko/20100101 Firefox/53.0
Comment 1 Ooker 2017-05-20 09:08:20 UTC
Created attachment 133412
BEFORE saved - non-bugged file
Comment 2 Ooker 2017-05-20 09:08:56 UTC
Created attachment 133413
AFTER saved - bugged file
Comment 3 Xisco Faulí 2017-05-20 10:04:41 UTC
Confirmed in

Version: 5.4.0.0.alpha1+
Build ID: 74d2e606fd3605fe0a585f596eaa215ae4e20d18
CPU Threads: 4; OS Version: Linux 4.8; UI Render: default; VCL: gtk3; 
Locale: en-US (ca_ES.UTF-8); Calc: group

and

Version: 5.2.0.0.alpha1+
Build ID: 5b168b3fa568e48e795234dc5fa454bf24c9805e
CPU Threads: 4; OS Version: Linux 4.8; UI Render: default; 
Locale: ca-ES (ca_ES.UTF-8)

@Mike, one for you?
Comment 4 QA Administrators 2018-05-21 02:35:40 UTC Comment hidden (obsolete)
Comment 5 himajin100000 2018-06-16 19:23:05 UTC
Created attachment 142806
first 9 pages of the file, comments and revisions removed, non-reproducible
Comment 6 himajin100000 2018-06-16 19:24:05 UTC
Created attachment 142807
first 10 pages of the file, comments and revisions removed, reproducible
Comment 7 himajin100000 2018-06-16 19:30:36 UTC
Can someone test this bug with the two files I attached?

I opened the files with my local build of LibreOffice 6.2, and just add one character 'a' before "TWITTER" and saved in docx.

The bug was reproducible with the 10-page file, at least in my environment.

when I extracted the re-saved file for the 10-page case, opened the word/document.xml with a text editor. then I searched for the word "/w:document" in the xml, and found lots of 

<v:shape id="shape_0" ID="ole_rId24" fillcolor="white" stroked="f" style="position:absolute;margin-left:0pt;margin-top:-6.05pt;width:0pt;height:0pt;mso-position-vertical:top"><w10:wrap type="none"/><v:fill o:detectmouseclick="t" type="solid" color2="black"/><v:stroke color="#3465a4" joinstyle="round" endcap="flat"/></v:shape>

after the end tag </w:document>
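
The manual check described above (unpack the re-saved DOCX, then look for anything after the closing </w:document> tag) can be sketched in Python. This is only an illustration and not part of LibreOffice: the helper names build_zip and trailing_content are hypothetical, and the two sample packages are synthetic stand-ins, not the attachments from this bug.

```python
import io
import zipfile


def build_zip(parts):
    """Build an in-memory ZIP package from a {name: bytes} mapping (demo helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in parts.items():
            zf.writestr(name, data)
    return buf.getvalue()


def trailing_content(docx_bytes, part="word/document.xml", end_tag=b"</w:document>"):
    """Return whatever follows the last closing root tag of the given part."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        data = zf.read(part)
    pos = data.rfind(end_tag)
    if pos == -1:
        raise ValueError("closing root tag not found in " + part)
    # Surrounding whitespace after the root element is harmless, so strip it.
    return data[pos + len(end_tag):].strip()


# Synthetic stand-ins for a well-formed and a broken word/document.xml.
good = build_zip({"word/document.xml": b"<w:document><w:body/></w:document>\n"})
bad = build_zip({"word/document.xml":
                 b'<w:document><w:body/></w:document><v:shape id="shape_0"/>'})

print(trailing_content(good))  # nothing after the root element
print(trailing_content(bad))   # the stray v:shape markup
```

To run the same check on a real file, read it in binary mode and pass the bytes to trailing_content; a non-empty result corresponds to the "Extra content at the end of the document" error.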
Comment 8 himajin100000 2018-06-16 19:33:17 UTC
minor typos:
just add => just added
when I extracted => I extracted
opened => and opened
Comment 9 himajin100000 2018-06-17 01:08:19 UTC
Created attachment 142810
10th page - reproducible

It seems to be reproducible with the 10th page alone.
Comment 10 himajin100000 2018-06-17 01:24:20 UTC
Created attachment 142811
even smaller reproducible document
Comment 11 Mike Kaganski 2018-06-18 01:04:36 UTC
The problem seems to be related to shapes that are part of deleted content in tracked changes.

The problem is reproducible (only tested using attachment 142811) with

Version: 6.0.5.1 (x64)
Build ID: 0588a1cb9a40c4a6a029e1d442a2b9767d612751
CPU threads: 4; OS: Windows 10.0; UI render: default; 
Locale: ru-RU (ru_RU); Calc: CL

in both of the following cases:

1. Simply re-saving the file (as DOCX) without any changes;
2. Saving after accepting all tracked changes (!).

In the latter case, the drawing object ole_rId24 disappears from Drawing Objects in the Navigator, but after saving, its shape is still written past the closing </w:document> tag of the XML.

Also reproducible using master Version: 6.2.0.0.alpha0+ (x64)
Build ID: b1740fba0d1e6e3d69c3781734509317f42a0e4f
CPU threads: 4; OS: Windows 10.0; UI render: default; 
Locale: ru-RU (ru_RU); Calc: CL

that has mst's redline refactor.

Michael, Miklos: off the top of your heads, do you have a pointer on where to look?
Comment 12 Mike Kaganski 2018-06-18 01:41:18 UTC
I should also have mentioned that the shape is in a footnote, so it should have been written to footnotes.xml, not document.xml. Adding bug 99227 to "See also", where I fixed a similar issue.
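
A rough way to see which package part the VML shapes end up in is to count opening v:shape tags per part. The sketch below makes several assumptions: the function name count_vml_shapes is hypothetical, the sample package is synthetic, and the count is done on raw bytes (rather than with an XML parser) so that it also works on a document.xml that is not well-formed.

```python
import io
import re
import zipfile

# Matches an opening <v:shape ...> tag but not <v:shapetype ...>.
SHAPE_TAG = re.compile(rb"<v:shape(?=[\s/>])")
PARTS = ("word/document.xml", "word/footnotes.xml", "word/endnotes.xml")


def count_vml_shapes(docx_bytes, parts=PARTS):
    """Count opening v:shape tags in each listed part that exists in the package."""
    counts = {}
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        present = set(zf.namelist())
        for part in parts:
            if part in present:
                counts[part] = len(SHAPE_TAG.findall(zf.read(part)))
    return counts


# Synthetic package: the only real shape lives in the footnotes part.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/document.xml",
                b'<w:document><v:shapetype id="t"/></w:document>')
    zf.writestr("word/footnotes.xml",
                b'<w:footnotes><v:shape id="shape_0"/></w:footnotes>')
sample = buf.getvalue()

print(count_vml_shapes(sample))
```

For a shape that belongs to a footnote, the expectation stated above is that it shows up under word/footnotes.xml and not under word/document.xml.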
Comment 13 Mike Kaganski 2018-06-18 04:17:59 UTC
https://gerrit.libreoffice.org/55978 fixes v:shape being written to word/document.xml when it should go to word/footnotes.xml.

There is another problem: the v:shape (of a deleted object in tracked changes) doesn't contain a v:imagedata reference, so the object is lost after a round trip. This problem must be tracked separately.
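
The missing v:imagedata reference can be spotted with a small check over a well-formed part such as word/footnotes.xml. This is a hedged sketch: the function name shapes_without_imagedata and the sample markup are made up for illustration, and treating every v:shape without a v:imagedata child as suspicious is an assumption that goes beyond what this comment states (it only concerns the shape of the deleted object).

```python
import xml.etree.ElementTree as ET

VML_NS = "urn:schemas-microsoft-com:vml"

# Synthetic footnotes part: one shape lacks v:imagedata, the other has it.
SAMPLE = b"""<w:footnotes
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
    xmlns:v="urn:schemas-microsoft-com:vml">
  <v:shape id="shape_0"><v:fill type="solid"/></v:shape>
  <v:shape id="shape_1"><v:imagedata r:id="rId1"/></v:shape>
</w:footnotes>"""


def shapes_without_imagedata(xml_bytes):
    """Return the ids of v:shape elements that have no direct v:imagedata child."""
    root = ET.fromstring(xml_bytes)
    missing = []
    for shape in root.iter("{%s}shape" % VML_NS):
        if shape.find("{%s}imagedata" % VML_NS) is None:
            missing.append(shape.get("id"))
    return missing


print(shapes_without_imagedata(SAMPLE))
```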
Comment 14 Commit Notification 2018-06-18 07:12:06 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=c67b7d795589aaf8f3396a379ef348bd650cb2dc

tdf#107969: use proper serializer for VML in footnotes/endnotes

It will be available in 6.2.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 15 himajin100000 2018-06-18 17:16:16 UTC
* My local LibreOffice 6.2 build is rebased on master (commit 276a0f01f626193ac572a2ab8e7d5f2610aa372d, "weld SfxDocumentFontsPage"),
running on an i5-5200U CPU at 2.20GHz with 8.00GB of main memory.

testing...

1. Successful, OK: the re-saved smallest document can be loaded without errors, confirmed on Word 2016 and LibreOffice 6.2 

2. Successful, OK: the re-saved 10-page document can be loaded without errors, confirmed on Word 2016 and LibreOffice 6.2.

3. FAILED.
The original non-bugged file was successfully loaded by LibreOffice 6.2 and Word 2016.
The re-saved file was loaded successfully by Word 2016, but LibreOffice 6.2 HUNG AND EXITED SILENTLY WITHOUT ERRORS.
Comment 16 Mike Kaganski 2018-06-18 17:35:16 UTC
(In reply to himajin100000 from comment #15)
> 3. FAILED.
> The original non-bugged file was successfully loaded by LibreOffice 6.2 and
> Word 2016.
> The re-saved file was loaded successfully by Word 2016, but LibreOffice 6.2
> HUNG AND EXITED SILENTLY WITHOUT ERRORS.

One issue per bug. Please file a new bug for this problem, because this issue is about extra content after the XML.
Comment 17 Commit Notification 2018-07-03 16:59:18 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "libreoffice-6-1":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=336d052a8db24ec9d19cec47c54ff76ed35a413c&h=libreoffice-6-1

tdf#107969: use proper serializer for VML in footnotes/endnotes

It will be available in 6.1.0.1.

Comment 18 NISZ LibreOffice Team 2019-05-28 13:04:58 UTC
*** Bug 115556 has been marked as a duplicate of this bug. ***