Bug 114882 - Error Message: File format error found at SAXParseException: '[word/document.xml line 2]: Opening and ending tag mismatch: txbxContent line 0 and sdtContent ', Stream 'word/document.xml', Line 2, Column 3157(row,col).
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version (earliest affected): 4.4.0.3 release
Hardware: All Windows (All)
Importance: medium normal
Assignee: Mike Kaganski
URL:
Whiteboard: target:6.1.0
Keywords: bibisected, regression
Duplicates: 114409
Depends on:
Blocks: WPSShapeTextImport-Change
Reported: 2018-01-07 12:44 UTC by Jon
Modified: 2020-05-31 01:12 UTC (History)
CC List: 3 users

See Also:
Crash report or crash signature:


Attachments
Document which cannot be opened (87.35 KB, application/zip)
2018-01-07 12:47 UTC, Jon
The fixed document, with which the bug can be reproduced (90.08 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2018-01-07 14:39 UTC, Mike Kaganski

Description Jon 2018-01-07 12:44:49 UTC
Description:
My partner created a document with LibreOffice Writer 5.1.5.2 in a Windows 10 environment and saved it as a docx file. When trying to open the document, the error message above appears. The document cannot be opened despite attempts to re-save it in another format (odt, rtf, doc, pdf), to use other software (WordPad, Adobe), and to use another operating system (Windows 7). The document is an important part of my partner's teacher training course and represents a significant amount of work, so we would welcome any help to recover it. I will attach the document once the report is filed.

Actual Results:  
The document cannot be opened

Expected Results:
The document to open


Reproducible: Didn't try


User Profile Reset: No



Additional Info:
Open the document


User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36
Comment 1 Jon 2018-01-07 12:47:46 UTC
Created attachment 138941
Document which cannot be opened

Please can you help recover this document
Comment 2 Mike Kaganski 2018-01-07 14:39:34 UTC
Created attachment 138942
The fixed document, with which the bug can be reproduced

Reproducible with Version: 6.0.0.1 (x64)
Build ID: d2bec56d7865f05a1003dc88449f2b0fdd85309a
CPU threads: 4; OS: Windows 10.0; UI render: default; 
Locale: ru-RU (ru_RU); Calc: 

This file is the fixed document. Two XML tags had to be removed: </w:sdtContent> and </w:sdt> after <w:txbxContent>. Now the file opens fine. But if you save it again using LibreOffice, the bug shows. Which is good in a sense, because we have a reproducible bug which can be acted upon and fixed.
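
For readers who want to locate this kind of tag mismatch in their own file, here is a minimal sketch. It assumes Python 3 and uses only the standard library; the function names check_part and check_docx are illustrative and are not part of LibreOffice or of the actual fix. It only reports where the XML stops being well-formed. It does not repair the file.

```python
import zipfile
from xml.parsers import expat


def check_part(xml_bytes):
    """Return None if xml_bytes is well-formed XML, else (line, column, message).

    The line is 1-based and the column is 0-based, as reported by expat.
    """
    parser = expat.ParserCreate()
    try:
        # The second argument marks this chunk as the final one.
        parser.Parse(xml_bytes, True)
    except expat.ExpatError as err:
        return err.lineno, err.offset, expat.ErrorString(err.code)
    return None


def check_docx(path, part="word/document.xml"):
    """Read one XML part out of a .docx (a zip archive) and check it."""
    with zipfile.ZipFile(path) as archive:
        return check_part(archive.read(part))
```

Calling check_docx on a file like the original attachment (saved locally under any name) should return a tuple pointing at the first offending closing tag. The tag can then be inspected and removed by hand in a text editor, after which word/document.xml has to be placed back into the archive.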
Comment 3 Mike Kaganski 2018-01-07 14:52:09 UTC
Reproducible with Version: 4.4.0.1
Build ID: 1ba9640ddd424f1f535c75bf2b86703770b8cf6f
Locale: ru_RU

Not reproducible with Version: 4.3.0.4
Build ID: 62ad5818884a2fc2e5780dd45466868d41009ec0
Comment 4 Jon 2018-01-07 16:45:05 UTC
Many thanks Mike and hopefully this will be of help to others
Comment 5 Mike Kaganski 2018-01-07 19:54:03 UTC
This has something to do with one (or both) of the following commits:

https://cgit.freedesktop.org/libreoffice/core/commit/?id=d379d18666aa42031359ca8eb34b0021960347ae

author	Miklos Vajna <vmiklos@collabora.co.uk>	2014-06-18 11:57:31 +0200
committer	Miklos Vajna <vmiklos@collabora.co.uk>	2014-06-18 12:09:15 +0200
commit d379d18666aa42031359ca8eb34b0021960347ae
tree d0324e2297be256e8c291cd6e17676f68f9f072d
parent 8e67a7796f598de2f11b694542bccb48343f0d9a
oox: import WPS shape with text as shape with textbox

https://cgit.freedesktop.org/libreoffice/core/commit/?id=ab52bb712c335e88cf100b3b8336a46b7673eb98

author	Miklos Vajna <vmiklos@collabora.co.uk>	2014-10-08 18:00:34 +0200
committer	Miklos Vajna <vmiklos@collabora.co.uk>	2014-10-08 18:14:55 +0200
commit ab52bb712c335e88cf100b3b8336a46b7673eb98
tree 724b951cef11afe5c5059fba7fb0808069ebcf42
parent d61f8185e660a6820351b8cea3ac51d344f0ab3e
DOCX export: fix handling of shapes containing and also anchored inside tables

Before the first commit, LibreOffice saved valid XML. After the second, it started to save invalid XML. Between the two, the file cannot be saved at all.
Comment 6 Mike Kaganski 2018-01-07 20:56:59 UTC
A patch in gerrit: https://gerrit.libreoffice.org/47546
Comment 7 Commit Notification 2018-01-08 05:58:42 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=4f65853df16a599fe81576bbccbca6ea78488d54

tdf#114882: don't try to close SDT when processing inner objects

It will be available in 6.1.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 8 Gabor Kelemen (allotropia) 2018-01-09 01:01:18 UTC
*** Bug 114409 has been marked as a duplicate of this bug. ***
Comment 9 MM 2018-01-13 15:58:57 UTC
Can you backport it to the v6.0 branch as well? Right now only master has the patch, and not every user uses that one.
Comment 10 Mike Kaganski 2018-01-13 16:05:00 UTC
(In reply to MM from comment #9)

I understand you, and usually I'd backport it right away. But this time I'm somewhat uneasy with the fix, so I hesitate to push it to the soon-to-be-released 6-0.