Bug 136441 - FILESAVE: Saxparse error on file open (word/footnotes.xml line 2) after save to DOCX & file reload
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 4.3.0.4 release
Hardware: All All
Importance: medium normal
Assignee: Justin L
URL:
Whiteboard: target:7.1.0 target:7.0.3
Keywords: bibisected, bisected, regression
Depends on:
Blocks: DOCX-SAXParse
Reported: 2020-09-03 15:40 UTC by Telesto
Modified: 2020-10-01 08:42 UTC

See Also:
Crash report or crash signature:


Attachments
Example file (626.36 KB, application/vnd.oasis.opendocument.text)
2020-09-03 15:40 UTC, Telesto
Example file (22.53 KB, application/vnd.oasis.opendocument.text)
2020-09-04 19:07 UTC, Telesto
tdf136441_commentInFootnote.odt: DOCX round-trip loads with an error (10.47 KB, application/vnd.oasis.opendocument.text)
2020-09-04 19:23 UTC, Justin L

Description Telesto 2020-09-03 15:40:42 UTC
Description:
FILEOPEN: Saxparse error on file open (word/footnotes.xml line 2) after save to DOCX & file reload

Steps to Reproduce:
1. open the attached ODT
2. Save as DOCX
3. File reload



Actual Results:
File format error found at 
SAXParseException: "No input source"
SAXParseException: '[word/footnotes.xml line 2]: unknown error', Stream 'word/footnotes.xml', Line 2, Column 144705
SAXParseException: '[word/document.xml line 2]: unknown error', Stream 'word/document.xml', Line 2, Column 90281(row,col).

Expected Results:
Probably no error when the saved DOCX is reloaded.


Reproducible: Always


User Profile Reset: No



Additional Info:
Version: 7.1.0.0.alpha0+ (x64)
Build ID: 1e0cfd5662d95cea84e80e4fe10d52c3b1101ae6
CPU threads: 4; OS: Windows 6.3 Build 9600; UI render: Skia/Raster; VCL: win
Locale: nl-NL (nl_NL); UI: en-US
Calc: CL
Comment 1 Telesto 2020-09-03 15:40:55 UTC
Created attachment 165108
Example file
Comment 2 Telesto 2020-09-03 15:46:02 UTC
Also in the export of 4.4.7.2
Comment 3 Aron Budea 2020-09-03 23:49:46 UTC
It's a bit unusual that I see 7.1.0.0.alpha0+ and 4.4.7.2 mentioned in the comments, and at the same time the version is set to 6.0.0.3, which isn't mentioned anywhere else. It'd also be great to see the bugs added to the META bugs they belong to from the start.

Confirmed using LO 7.1.0.0.alpha0+ (e2f4e65a7b8024c00b049eebf0d87637efda7f24), 4.3.0.4 / Ubuntu.
No error in LO 4.2.0.4.
=> regression

The good/bad exported files behave accordingly in a fresh master build, so it's a FILESAVE issue.

Bibisected to the following commit using repo bibisect-43max. I don't see any redlining in the document, so I don't know how it might be related, but this is the bibisect result nevertheless.

https://cgit.freedesktop.org/libreoffice/core/commit/?id=e52f14efaa53b496599b51fb064a933183731fca
author		Adam Co <rattles2013@gmail.com>	2013-12-08 17:14:14 +0200
committer	Miklos Vajna <vmiklos@collabora.co.uk>	2013-12-16 11:50:59 +0100

Export redline 'paragraph formatting changes' back to DOCX
Comment 4 Mike Kaganski 2020-09-04 05:58:07 UTC Comment hidden (obsolete)
Comment 5 Mike Kaganski 2020-09-04 06:04:14 UTC Comment hidden (obsolete)
Comment 6 Mike Kaganski 2020-09-04 06:43:56 UTC
Sorry for the noise; of course, if I answer "No" to the question whether I want to proceed, I get the error from the description.

Note again that the file opens fine in Word.

The problem is <w:commentRangeStart w:id="72"/>...<w:commentRangeEnd w:id="72"/> followed by <w:commentReference w:id="72"/> in word/footnotes.xml. The w:commentReference element results in a call to OOXMLDocumentImpl::resolveComment, which tries to resolve a dedicated comments stream referenced from the current stream (word/footnotes.xml). There is none in word/_rels, so the code that follows tries to resolve comment 72 in an absent stream, instead of first trying to get the comment defined inline in the current stream.

So this is in fact not a filesave issue, but a fileopen issue...

Hope this would be helpful to whoever tries to fix this.
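
For anyone who wants to check a DOCX for this pattern without opening it, here is a minimal Python sketch. It only looks at the package structure described above: it collects the ids of w:commentReference elements in word/footnotes.xml and reports them if word/_rels/footnotes.xml.rels has no relationship of the comments type. The function name dangling_comment_refs is made up for illustration, and the check is deliberately simplified (it does not verify that the relationship target actually exists).

```python
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
COMMENTS_REL_TYPE = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/comments"
)


def dangling_comment_refs(docx, part="word/footnotes.xml"):
    """Return comment ids referenced in `part` that no comments relationship covers.

    `docx` is a path or a binary file-like object holding the DOCX package.
    (Hypothetical helper, not part of LibreOffice.)
    """
    with zipfile.ZipFile(docx) as pkg:
        names = set(pkg.namelist())
        if part not in names:
            return []

        # Collect the w:id of every w:commentReference in the part.
        root = ET.fromstring(pkg.read(part))
        ids = [
            el.get(f"{{{W_NS}}}id")
            for el in root.iter(f"{{{W_NS}}}commentReference")
        ]

        # Relationships of word/footnotes.xml live in word/_rels/footnotes.xml.rels.
        folder, _, leaf = part.rpartition("/")
        rels_name = f"{folder}/_rels/{leaf}.rels"
        has_comments_rel = False
        if rels_name in names:
            rels = ET.fromstring(pkg.read(rels_name))
            has_comments_rel = any(
                rel.get("Type") == COMMENTS_REL_TYPE
                for rel in rels.iter(f"{{{REL_NS}}}Relationship")
            )

        return [] if has_comments_rel else ids
```

Calling dangling_comment_refs("exported.docx") on a file saved by an affected build would be expected to return a non-empty list; the file name here is only a placeholder.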
Comment 7 Telesto 2020-09-04 08:03:49 UTC Comment hidden (off-topic)
Comment 8 Telesto 2020-09-04 17:49:14 UTC
@Justin
As you're working through DOCX bugs with impressive speed, here is another suggestion.

(In reply to Mike Kaganski from comment #6)
> Sorry for the noise; of course, if I answer "No" to the question whether I want
> to proceed, I get the error from the description.
> 
> Note again that the file opens fine in Word.
> 
> The problem is <w:commentRangeStart w:id="72"/>...<w:commentRangeEnd
> w:id="72"/> followed by <w:commentReference w:id="72"/> in
> word/footnotes.xml. The latter w:commentReference element results in a call
> to OOXMLDocumentImpl::resolveComment, which tries to resolve a dedicated
> comments stream referenced from current stream (word/footnotes.xml), but
> there's none in word/_rels, so the following tries to resolve the comment 72
> in absent stream, instead of trying to get the comment defined in-line in
> the current stream first.
> 
> So this is in fact not a filesave issue, but fileopen issue...
> 
> Hope this would be helpful to whoever tries to fix this.
Comment 9 Justin L 2020-09-04 18:46:03 UTC
Nobody is going to work on a 353-page document that takes 15 minutes to load with tons of "unknown element xml:id http://www.w3.org/XML/1998/namespace" errors.

What is needed is a single-page minimal document containing a single problematic comment.
Comment 10 Telesto 2020-09-04 19:07:26 UTC
Created attachment 165153
Example file

Variant trimmed down to 5 pages
Comment 11 Justin L 2020-09-04 19:23:09 UTC
Created attachment 165155
tdf136441_commentInFootnote.odt: DOCX round-trip loads with an error

Thanks to Mike for the clear pointers to how to replicate the problem.
Comment 12 Justin L 2020-09-04 19:29:16 UTC
Word 2003 does not allow me to create a comment in a footnote.
Comment 13 Telesto 2020-09-04 19:59:50 UTC
(In reply to Justin L from comment #9)
> Nobody is going to work on a 353 page document that takes 15 minutes to load
> with tons of  unknown element xml:id http://www.w3.org/XML/1998/namespace
> errors.
> 
> Needed is a single page minimal document containing a single, problematic
> comment.

About this: the author of the source document complains about performance issues. Is there a way to scrub a file of rubbish?
Comment 14 Justin L 2020-09-05 11:29:30 UTC
(In reply to Justin L from comment #12)
> Word 2003 does not allow me to create a comment in a footnote.
Neither does Word 2016.
LO doesn't see/export the footnote into DOC or RTF either.

So the simplest solution is to just throw the unusable comment out on export.
https://gerrit.libreoffice.org/c/core/+/102074
Comment 15 Justin L 2020-09-05 11:46:27 UTC
MS Word doesn't allow comments in headers/footers either.
So comments are appropriate in TXT_MAINTEXT, but not in TXT_FTN / TXT_EDN / TXT_HDFT.

LO had the same SAX problem with comments in a header, so I added a comment to my unit test's header as well.
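
The rule above can be illustrated with a small Python sketch. This is not the actual patch (the real change is in LibreOffice's C++ export code); it only models the decision described here, using the TXT_* names from this comment. The enum TextType and the function should_export_comment are hypothetical names introduced for illustration, and only the four text types mentioned above are modelled.

```python
from enum import Enum, auto


class TextType(Enum):
    """Text types named in this comment (illustrative stand-ins only)."""
    TXT_MAINTEXT = auto()  # main document body
    TXT_FTN = auto()       # footnotes
    TXT_EDN = auto()       # endnotes
    TXT_HDFT = auto()      # headers and footers


# Places where MS Word does not allow comments, per the comments above.
NO_COMMENT_TYPES = {TextType.TXT_FTN, TextType.TXT_EDN, TextType.TXT_HDFT}


def should_export_comment(text_type):
    """Return True if a comment found in `text_type` should be written out."""
    return text_type not in NO_COMMENT_TYPES
```

With this model, a comment in the main text is kept, while one in a footnote, endnote, header or footer is dropped on export.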
Comment 16 Commit Notification 2020-09-05 14:40:21 UTC
Justin Luth committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/870bb98b3a1159e31895524ef54457db37d1b9af

tdf#136441 ms export: don't export comments in footnotes

It will be available in 7.1.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 17 Justin L 2020-09-05 16:15:33 UTC
So the filesave won't create a problematic file anymore.

The remaining consideration for this bug report would be importing problematic files that were created earlier.

It seems like a substream can't call another substream? But I can't figure out why, and this sax parser stuff makes no sense to me.

Probably there is no value in actually trying to get this to work to read old files. So I'll just close this bug. If someone cares about the reading, they should create a bad DOCX file and attach it to a new bug report I guess.
Comment 18 Telesto 2020-09-05 17:24:41 UTC
(In reply to Justin L from comment #17)
> So the filesave won't create a problematic file anymore.
> 
> The remaining consideration for this bug report would be to import any older
> problematic files that have been created earlier.
> 
> It seems like a substream can't call another substream? But I can't figure
> out why, and this sax parser stuff makes no sense to me.
> 
> Probably there is no value in actually trying to get this to work to read
> old files. So I'll just close this bug. If someone cares about the reading,
> they should create a bad DOCX file and attach it to a new bug report I guess.

Not seeing the point :-). The error is non-fatal, so it would be a waste of resources.
Comment 19 Mike Kaganski 2020-09-08 09:56:21 UTC
(In reply to Justin L from comment #17)
> It seems like a substream can't call another substream? But I can't figure
> out why, and this sax parser stuff makes no sense to me.

Why? Of course it can. We just didn't *export* the relevant sub-substream and the related _rels data, and we shouldn't have had to do that. (My comment 6 was also wrong, as I assumed that the comment data was contained inline, which was not the case. Indeed, the file we generated prior to your great fix *was* invalid: it had a reference to a comments substream from the footnotes substream, but no such substream existed. Word just happened to ignore that.)

FTR: you may generate a "valid" DOCX with comment in footnotes like this:

1. Convert attachment 165153 into DOCX using a pre-fix LO version.
   => a reference to comment with id=1 will appear in footnotes.xml
2. Open word/comments.xml, and add a `w:comment` element with id=1.
3. Add word/_rels/footnotes.xml.rels with

> <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/comments" Target="comments.xml"/>

That's all; the resulting DOCX will open OK in LO, and the comment contents will appear there (almost correctly). Alternatively, one could add a separate "comments-for-footnotes.xml" alongside the original "comments.xml", and use that in word/_rels/footnotes.xml.rels.

But validating the result with Open XML SDK 2.5 Productivity Tool for Microsoft Office gives this validation error:

> The package/part 'FootnotesPart{/word/footnotes.xml}' cannot have a relationship
> that targets part 'WordprocessingComponentsPart{/word/comments.xml}'.
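
Step 3 of the recipe above could be scripted roughly as follows. This is a minimal Python sketch, assuming the DOCX does not already contain word/_rels/footnotes.xml.rels; the function name add_footnote_comments_rel is made up for illustration. Step 2 (adding the `w:comment` element to word/comments.xml) is still done by hand, and the result remains invalid according to the validator message quoted above.

```python
import zipfile

COMMENTS_REL_TYPE = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/comments"
)
RELS_PART = "word/_rels/footnotes.xml.rels"

# Relationship from the recipe above, wrapped in the standard Relationships root.
RELS_XML = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    f'<Relationship Id="rId1" Type="{COMMENTS_REL_TYPE}" Target="comments.xml"/>'
    "</Relationships>"
)


def add_footnote_comments_rel(src, dst):
    """Copy the DOCX at `src` to `dst`, adding word/_rels/footnotes.xml.rels.

    `src` and `dst` are paths or binary file-like objects.
    (Hypothetical helper for experimenting, not part of LibreOffice.)
    """
    with zipfile.ZipFile(src) as zin:
        if RELS_PART in zin.namelist():
            # Merging into an existing .rels file is out of scope for this sketch.
            raise ValueError(f"{RELS_PART} already exists")
        with zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
            # Copy every existing part unchanged, then append the new .rels part.
            for item in zin.infolist():
                zout.writestr(item, zin.read(item.filename))
            zout.writestr(RELS_PART, RELS_XML)
```

A call such as add_footnote_comments_rel("in.docx", "out.docx") would write the patched copy; both file names are placeholders.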

Thank you Justin for the fix!
Comment 20 Xisco Faulí 2020-09-29 15:55:30 UTC
Verified in

Version: 7.1.0.0.alpha0+
Build ID: cd85546a2fbdade42f80fd3b6bd650791db9f32d
CPU threads: 4; OS: Linux 5.7; UI render: default; VCL: gtk3
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: threaded

@Justin Luth, thanks for fixing this issue!!
Comment 21 Commit Notification 2020-10-01 08:42:50 UTC
Justin Luth committed a patch related to this issue.
It has been pushed to "libreoffice-7-0":

https://git.libreoffice.org/core/commit/270604a11022ab4fb9a3ac299d9a42e1d8464c47

tdf#136441 ms export: don't export comments in footnotes

It will be available in 7.0.3.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.