Bug 108272 - FILEOPEN: SAXException: [word/document.xml line 2]: unknown error for .docx with floating table in header
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 5.0 all versions
Hardware: All, OS: All
Importance: medium normal
Assignee: László Németh
URL:
Whiteboard: target:7.0.0 target:6.4.2
Keywords: filter:docx
Depends on:
Blocks: DOCX-Tables DOCX-SAXParse DOCX-Opening
Reported: 2017-05-31 20:15 UTC by Frederic Parrenin
Modified: 2020-02-07 16:59 UTC
CC List: 8 users

See Also:
Crash report or crash signature:


Attachments
.docx file to reproduce the problem (87.76 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2017-05-31 20:15 UTC, Frederic Parrenin
screenshot of assertion (18.88 KB, image/png)
2017-05-31 22:10 UTC, Regina Henschel
bibisect in 50max repository (3.74 KB, text/plain)
2017-06-17 02:21 UTC, Terrence Enger
.docx simplified to 2 pages (50.04 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2019-11-24 09:00 UTC, Timur

Description Frederic Parrenin 2017-05-31 20:15:10 UTC
Created attachment 133759
.docx file to reproduce the problem

Steps to reproduce:
- try to open the attached .docx file
=> an error message appears, and LO does not open the file
File format error found at
SAXParseException: '[word/document.xml line 2]: unknown error', Stream 'word/document.xml', Line 2, Column 140548(row,col).
Comment 1 Xisco Faulí 2017-05-31 20:52:21 UTC
Confirmed in

Version: 5.5.0.0.alpha0+
Build ID: 9956849c2ea6049582e2ccf04c355542c1ef00a1
CPU Threads: 4; OS Version: Linux 4.8; UI Render: default; VCL: gtk3; 
Locale: ca-ES (ca_ES.UTF-8); Calc: group

and

Version: 5.0.0.0.alpha1+
Build ID: 0db96caf0fcce09b87621c11b584a6d81cc7df86
Locale: ca-ES (ca_ES.UTF-8)

@Mike, one for you?
Comment 2 Regina Henschel 2017-05-31 22:10:26 UTC
Created attachment 133761
screenshot of assertion

The document opens fine in Word2010.

The document opens fine in LibreOffice Version: 4.2.2.1
Build ID: 3be8cda0bddd8e430d8cda1ebfd581265cca5a0f

The document opens, but looks faulty (double first heading) in Version: 4.4.0.3
Build ID: de093506bcdc5fafd9023ee680b8c60e3e0645d7
Locale: de_DE
and
Version: 4.3.0.0.alpha1+
Build ID: 145f2e970f46a3a3e5456b122d71f17c3abe878f
TinderBox: Win-x86@42, Branch:master, Time: 2014-04-26_23:32:36

I get the attached assertion in
Version: 5.5.0.0.alpha0+
Build ID: b56d1e294d838d4b3d0f237c81325a0d1a1cff83
CPU threads: 4; OS: Windows 6.1; UI render: default; 
TinderBox: Win-x86@39, Branch:master, Time: 2017-05-26_06:14:42
Locale: de-DE (de_DE); Calc: group

and a similar assertion in
Version: 4.3.0.0.alpha1+
Build ID: 0b03f7ed575838f90e6b1ebec3538a3a214f81fb
TinderBox: Win-x86@39, Branch:master, Time: 2014-04-30_01:30:46
Comment 3 Terrence Enger 2017-06-17 02:21:17 UTC
Created attachment 134075
bibisect in 50max repository

Working on debian-stretch in bibisect-50max, the result I deemed "bad"
was a dialog box (rewrapped)

    file format error found at
    SAXParseException: '[word/document.xml line 2]: unknown error'.
        Stream 'word/document.xml', Line 2, Column 140548(row,col).

and I found

          commit    s-h
          --------  --------
    good  4e454b28  825e4995
    bad   8b40032c  ebf767ee

The short message for commit ebf767ee is

    writerfilter: DOCX import: better error handling than "catch (...) {}"

I am removing keyword bibisectRequest and adding bisected.
Comment 4 Xisco Faulí 2017-06-17 13:05:40 UTC
ebf767ee is not the cause of the error. It only introduced handling of SAXParseException errors, so the existing error is now reported instead of being silently swallowed.
Comment 5 QA Administrators 2018-06-18 02:42:14 UTC Comment hidden (obsolete)
Comment 6 Frederic Parrenin 2018-06-19 07:48:55 UTC
The situation is a bit better in 6.0.4.
There is still an error message, but the file opens in the end.
Comment 7 Timur 2018-11-22 17:40:36 UTC
SAXException: [word/document.xml line 2]: unknown error for .docx with floating table in header
If the floating table is converted to a regular table, there is no error.

Mike, you had a fix in Bug 116989 ("disable conversion of tables in footers to floating for now"). Would something similar apply here?
Comment 8 QA Administrators 2019-11-23 03:43:59 UTC Comment hidden (obsolete)
Comment 9 Timur 2019-11-24 09:00:14 UTC
Created attachment 156071
.docx simplified to 2 pages

The original .docx has 5 pages.
The main issue, and the subject of this bug, is SAXException: [word/document.xml line 2], and it is still present in LO 6.5+.
We can proceed past the warning and the file opens.
I notice the header is there, but with the wrong size. The table also extends off the page. The footer content is different.

To make the differences in the header, table and footer easier to spot, I attach a .docx reduced in MSO to 2 pages; the 2nd page is empty and is not shown in LO.
Comment 10 Commit Notification 2020-02-04 09:08:48 UTC
László Németh committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/213d6390a2cc59d174173f4359c161625a9c4bdc

tdf#108272 DOCX table-only header: fix SAX parser error

It will be available in 7.0.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 11 Xisco Faulí 2020-02-05 15:20:04 UTC
Verified in

Version: 7.0.0.0.alpha0+
Build ID: d41d7ecb60fb38204fafcb1aa4595992721855e6
CPU threads: 4; OS: Linux 4.19; UI render: default; VCL: gtk3; 
Locale: en-US (en_US.UTF-8); UI-Language: en-US
Calc: threaded

@László Németh, thanks for fixing this issue!!
Comment 12 Xisco Faulí 2020-02-05 15:22:32 UTC
I'm not backporting it to previous branches. See https://gerrit.libreoffice.org/#/c/core/+/88037/
Comment 13 Commit Notification 2020-02-07 16:59:18 UTC
László Németh committed a patch related to this issue.
It has been pushed to "libreoffice-6-4":

https://git.libreoffice.org/core/commit/bd704b167a07054335601aa86c636d7db84e982a

tdf#108272 DOCX table-only header: fix SAX parser error

It will be available in 6.4.2.
