Bug 101287 - File won't open: "File format error found at SAXParseException:"
Status: RESOLVED DUPLICATE of bug 96878
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
(earliest affected) release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
Keywords: bibisected, bisected, filter:docx
Depends on:
Reported: 2016-08-04 10:43 UTC by Simon
Modified: 2017-11-14 11:02 UTC

See Also:
Crash report or crash signature:

DB Spec for Snee project (31.22 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2016-08-04 10:43 UTC, Simon
File corrected (50.91 KB, application/vnd.openxmlformats)
2016-10-27 19:57 UTC, Julien Nabet

Description Simon 2016-08-04 10:43:20 UTC
Created attachment 126568
DB Spec for Snee project

I get this error message on opening the attached file: 

File format error found at SAXParseException: '[word/document.xml line 2]: Attribute w:eastAsiaTheme redefined', Stream 'word/document.xml', Line 2, Column 89396(row,col).

Seems to be similar to https://bugs.documentfoundation.org/show_bug.cgi?id=92157, which is now closed.

Tried downgrading to 4.4.7 as recommended here https://bugs.documentfoundation.org/show_bug.cgi?id=97063, but the file only partially loads - most of the text is lost.

I also tried looking at the docx's document.xml file, but there is no "Attribute w:eastAsiaTheme" at Line 2, Column 89396.

I also tried the solution here https://bugs.documentfoundation.org/show_bug.cgi?id=97063#c8 - but it didn't work. I get a 'General I/O error' message when trying to open the file.

Please help - there are days of work in that file!
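
For anyone trying to locate the redefined attribute by hand, here is a minimal sketch that scans word/document.xml inside the .docx for start tags carrying the same attribute name twice, and reports the line and column of each repeat. It is a regex-based approximation rather than a real XML parser (comments and CDATA sections are not handled), and the function names are made up for illustration. The column a parser reports need not coincide with where the attribute starts, which might explain not finding it at column 89396; that is only a guess.

```python
import re
import zipfile

# One start tag or empty-element tag, with quoted attribute values.
TAG_RE = re.compile(
    r'<[A-Za-z_][^\s<>/]*(?:\s+[^\s=<>/]+\s*=\s*(?:"[^"]*"|\'[^\']*\'))*\s*/?>'
)
# One name="value" (or name='value') pair inside a tag.
ATTR_RE = re.compile(r'([^\s=<>/]+)\s*=\s*(?:"[^"]*"|\'[^\']*\')')


def find_redefined_attributes(xml_text):
    """Return (line, column, attribute_name) for each repeated attribute.

    Line and column are 1-based and point at the repeated occurrence.
    """
    hits = []
    for tag in TAG_RE.finditer(xml_text):
        seen = set()
        for attr in ATTR_RE.finditer(tag.group(0)):
            name = attr.group(1)
            if name in seen:
                offset = tag.start() + attr.start(1)
                line_start = xml_text.rfind("\n", 0, offset) + 1
                line = xml_text.count("\n", 0, offset) + 1
                hits.append((line, offset - line_start + 1, name))
            seen.add(name)
    return hits


def scan_docx(path, member="word/document.xml"):
    """Read one member of a .docx (a zip archive) and scan it.

    Assumes the member is UTF-8 encoded, as document.xml usually is.
    """
    with zipfile.ZipFile(path) as archive:
        text = archive.read(member).decode("utf-8")
    return find_redefined_attributes(text)
```

Calling scan_docx("spec.docx") (file name hypothetical) returns an empty list for a well-formed file, or one entry per repeated attribute otherwise.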
Comment 1 Simon 2016-08-04 14:16:19 UTC
Well, I've recovered the file. I went through the process at https://bugs.documentfoundation.org/show_bug.cgi?id=97063#c8 again, more carefully this time I guess, and got the file back.
The bug's still there, though.
Comment 2 Buovjaga 2016-08-07 15:50:08 UTC
It opens ok in 3.6.

Arch Linux 64-bit, KDE Plasma 5
Build ID: f3d26af51588af441f62fb69bb7a5432845226ac
CPU Threads: 8; OS Version: Linux 4.6; UI Render: default; 
Locale: fi-FI (fi_FI.UTF-8); Calc: group
Built on August 5th 2016

Arch Linux 64-bit
Version (Build ID: e183d5b)
Comment 3 Caolán McNamara 2016-08-08 16:26:46 UTC
Bisecting will just show...


writerfilter: DOCX import: better error handling than "catch (...) {}"
If there is a SAXParseException, OOXMLDocumentImpl::resolve() should not
ignore it,

which would just get you back to silent truncation

The xml is indeed invalid, so the problem took place at save, not load. The docProps/app.xml claims LibreOffice as the generator, so apparently we have a save bug, at least in that version.

Taking the document, stripping out the invalid tags and resaving it as docx in 5.1 gives a well-formed document, so I can't tell what circumstances reproduce the invalid generation.

Is this a document exported from another source document?
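
To illustrate the difference the bisected commit makes, here is a small Python sketch (not the writerfilter C++ code; the function name and the toy <p> elements are invented for the example). With errors swallowed, a parse failure quietly ends the import and the caller gets whatever was read so far, which is the silent truncation mentioned above. With errors propagated, the caller sees the parse error instead.

```python
import io
import xml.etree.ElementTree as ET


def load_paragraphs(xml_bytes, swallow_errors):
    """Collect the text of <p> elements while streaming through the XML.

    swallow_errors=True mimics a bare "catch (...) {}": a parse error just
    stops the loop and the partial result is returned without any warning.
    swallow_errors=False lets the parse error reach the caller.
    """
    texts = []
    try:
        for _event, elem in ET.iterparse(io.BytesIO(xml_bytes)):
            if elem.tag == "p":
                texts.append(elem.text or "")
    except ET.ParseError:
        if not swallow_errors:
            raise
    return texts
```

Fed a document whose second paragraph repeats an attribute, the lenient mode returns a shortened list with no indication that anything went wrong, while the strict mode raises ET.ParseError.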
Comment 4 Caolán McNamara 2016-08-08 16:28:05 UTC
*** Bug 97063 has been marked as a duplicate of this bug. ***
Comment 5 Simon 2016-08-08 21:23:12 UTC
I can't be 100% sure, but I think the document was created in Word by another user and then passed on to me. I worked on it in LO for a few days before it got corrupted.
Comment 6 Xisco Faulí 2016-09-26 15:21:16 UTC
Adding Cc: to Michael Stahl
Comment 7 Julien Nabet 2016-10-27 19:57:39 UTC
Created attachment 128311
File corrected

After decompressing the file, I ran this on word/document.xml:
tidy -utf8 -xml -w 255 -i -c -q -asxml
Then I recompressed the file and could open it in LO built from master sources updated today.
Could you give it a try?
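
The decompress, repair, recompress steps above can be sketched in Python roughly as follows. This is an illustration only: rewrite_member and tidy_xml are hypothetical helper names, tidy_xml assumes the tidy executable is on PATH and reads from stdin when no file is given, and the exit-code handling assumes tidy's usual convention (0 ok, 1 warnings, 2 errors).

```python
import subprocess
import zipfile


def rewrite_member(src_path, dst_path, member, transform):
    """Copy a zip archive (such as a .docx), passing one member through transform.

    transform takes the member's bytes and returns the replacement bytes.
    All other members are copied unchanged, in their original order.
    """
    with zipfile.ZipFile(src_path) as zin, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == member:
                data = transform(data)
            zout.writestr(item, data)


def tidy_xml(data):
    """Run the tidy command quoted above on XML bytes via stdin/stdout."""
    result = subprocess.run(
        ["tidy", "-utf8", "-xml", "-w", "255", "-i", "-c", "-q", "-asxml"],
        input=data,
        stdout=subprocess.PIPE,
        check=False,  # tidy exits with 1 on mere warnings
    )
    if result.returncode >= 2 or not result.stdout:
        raise RuntimeError("tidy could not repair the input")
    return result.stdout
```

A call such as rewrite_member("broken.docx", "fixed.docx", "word/document.xml", tidy_xml) (file names hypothetical) writes the repaired copy to a new file and leaves the original untouched, so a failed repair loses nothing.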
Comment 8 Xisco Faulí 2016-11-27 15:24:54 UTC
I guess this can be closed as RESOLVED DUPLICATE of 99227 as both were
introduced by the same commit

*** This bug has been marked as a duplicate of bug 99227 ***
Comment 9 Timur 2016-11-28 16:08:02 UTC
Xisco, if you mean the commit 'DOCX import: better error handling than "catch (...) {}"', then it's not grounds for a duplicate.
As explained before, it's just better handling of existing errors.
I'd rather mark this one as Invalid because there is no source document and there are no steps to reproduce (like: save DOCX in LO).
Comment 10 Timur 2016-12-16 15:39:44 UTC

*** This bug has been marked as a duplicate of bug 96878 ***
Comment 11 Xisco Faulí 2017-10-27 11:08:50 UTC
*** Bug 102131 has been marked as a duplicate of this bug. ***