Bug 96749 - FILEOPEN: Cannot open specific DOCX without header.xml and footer.xml in newer versions
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 4.3 all versions
Hardware: All All
Importance: medium normal
Assignee: Caolán McNamara
URL:
Whiteboard: target:5.3.0
Keywords: bibisected, bisected, filter:docx, regression
Duplicates: 91611
Depends on:
Blocks:
 
Reported: 2015-12-28 16:13 UTC by Dan Sc
Modified: 2016-08-31 14:20 UTC
CC List: 5 users

See Also:
Crash report or crash signature:
Regression By:


Attachments
The document cannot be opened in recent versions of Writer (6.41 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2015-12-28 16:13 UTC, Dan Sc

Description Dan Sc 2015-12-28 16:13:33 UTC
Created attachment 121582
The document cannot be opened in recent versions of Writer

The attached DOCX was generated by 1C:Enterprise, an extremely popular business CRM that holds a monopoly position in Russia, with a huge user base (millions of installations).

Writer versions 3.x, 4.1.x and 4.2.x open, edit and save it correctly, but more recent versions (4.3.x and later, including 5.x) fail with a "general input/output error". The bug appears in both the Windows and Linux versions (x86/x64).
Comment 1 Julien Nabet 2015-12-28 17:40:52 UTC
Unzipping the docx shows:
  inflating: [Content_Types].xml     
  inflating: word/document.xml       
  inflating: word/settings.xml       
  inflating: word/styles.xml         
  inflating: word/_rels/document.xml.rels  
  inflating: _rels/.rels          

but [Content_Types].xml declares header1.xml, header2.xml, footer1.xml and footer2.xml, none of which are in the archive.

I don't have MS Office; can you confirm that this file opens in Word?
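To check a package for the mismatch described above, a small script can compare the parts declared in `[Content_Types].xml` (its `Override` entries) and the internal targets listed in `word/_rels/document.xml.rels` against the files actually present in the ZIP. This is a minimal Python sketch, not part of LibreOffice; the function name `find_missing_parts` is hypothetical, and it assumes relationship targets are either relative to `word/` or absolute package paths, are not percent-encoded, and can be compared case-sensitively.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by OPC packages (ECMA-376 Part 2).
CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
DOC_RELS = "word/_rels/document.xml.rels"


def find_missing_parts(docx):
    """Return package paths that are declared or referenced but absent from the ZIP.

    `docx` may be a filename, a binary file object, or the raw bytes of the file.
    """
    if isinstance(docx, (bytes, bytearray)):
        docx = io.BytesIO(docx)
    with zipfile.ZipFile(docx) as zf:
        present = set(zf.namelist())
        missing = set()

        # 1. <Override PartName="/word/header1.xml" .../> entries in [Content_Types].xml
        types = ET.fromstring(zf.read("[Content_Types].xml"))
        for override in types.iter(CT_NS + "Override"):
            name = override.get("PartName", "").lstrip("/")
            if name and name not in present:
                missing.add(name)

        # 2. Internal relationship targets of the main document part.
        if DOC_RELS in present:
            rels = ET.fromstring(zf.read(DOC_RELS))
            for rel in rels.iter(REL_NS + "Relationship"):
                target = rel.get("Target", "")
                if not target or rel.get("TargetMode") == "External":
                    continue  # URLs and similar are not package parts
                if target.startswith("/"):
                    path = target.lstrip("/")  # absolute package path
                else:
                    path = posixpath.normpath(posixpath.join("word", target))
                if path not in present:
                    missing.add(path)
    return sorted(missing)


if __name__ == "__main__":
    import sys

    # Usage: python find_missing_parts.py file1.docx [file2.docx ...]
    for filename in sys.argv[1:]:
        print(filename, find_missing_parts(filename))
```

An empty list means every declared or referenced part exists; any entry returned is a dangling reference of the kind the import trips over.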
Comment 2 Julien Nabet 2015-12-28 18:29:42 UTC
Anyway, I could reproduce this and got this warning:
warn:writerfilter:20417:1:writerfilter/source/filter/WriterFilter.cxx:214: WriterFilter::filter(): failed with exception Element does not exist and cannot be created: "header1.xml"
Comment 3 Julien Nabet 2015-12-28 18:34:13 UTC
I submitted this patch for review:
https://gerrit.libreoffice.org/#/c/20993/
Comment 4 Julien Nabet 2015-12-28 20:33:22 UTC
*** Bug 91611 has been marked as a duplicate of this bug. ***
Comment 5 Dan Sc 2015-12-28 20:52:28 UTC
1) Yes, MSO 2010 opens it without any warnings.

2) Even WordPad (bundled with Windows 7) opens it. The table formatting is off, but it can edit and save the file.

3) Again, older versions of Writer (3.x, 4.2.x) are somehow able to open it too.
Comment 6 Chris Sherlock 2016-02-14 02:24:15 UTC
I've cherry-picked the Gerrit change and will test it out.
Comment 7 Chris Sherlock 2016-02-14 06:40:11 UTC
The patch works on Linux.

With SAL_WARN output enabled, I get the expected output:

chris@libreoffice-ia64:~/repos/libreoffice$ instdir/program/soffice --writer ~/bug96749.docx 
warn:vcl.opengl:4273:1:vcl/opengl/x11/X11DeviceInfo.cxx:356: unknown vendor => blocked
warn:writerfilter:4273:1:writerfilter/source/ooxml/OOXMLDocumentImpl.cxx:773: resolveEmbeddingsStream: exception while resolving stream 20 : Element does not exist and cannot be created: "header1.xml"
warn:writerfilter:4273:1:writerfilter/source/ooxml/OOXMLDocumentImpl.cxx:773: resolveEmbeddingsStream: exception while resolving stream 19 : Element does not exist and cannot be created: "footer1.xml"
warn:writerfilter:4273:1:writerfilter/source/ooxml/OOXMLDocumentImpl.cxx:773: resolveEmbeddingsStream: exception while resolving stream 20 : Element does not exist and cannot be created: "header2.xml"
warn:writerfilter:4273:1:writerfilter/source/ooxml/OOXMLDocumentImpl.cxx:773: resolveEmbeddingsStream: exception while resolving stream 19 : Element does not exist and cannot be created: "footer2.xml"
warn:writerfilter:4273:1:writerfilter/source/dmapper/DomainMapper_Impl.cxx:556: no context of type 1 available
warn:legacy.osl:4273:1:oox/source/helper/storagebase.cxx:67: StorageBase::StorageBase - missing base input stream
warn:sw.uno:4273:1:sw/source/core/unocore/unotext.cxx:2292: Exception when setting property: CharFontName. Message: 
warn:sw.uno:4273:1:sw/source/core/unocore/unotext.cxx:2292: Exception when setting property: CharHeight. Message: 
warn:sw.uno:4273:1:sw/source/core/unocore/unotext.cxx:2292: Exception when setting property: CharHeightAsian. Message: 
warn:sw.uno:4273:1:sw/source/core/unocore/unotext.cxx:2292: Exception when setting property: ParaBottomMargin. Message: 
warn:sw.uno:4273:1:sw/source/core/unocore/unotext.cxx:2292: Exception when setting property: ParaLineSpacing. Message: 
warn:legacy.osl:4273:1:svx/source/dialog/rulritem.cxx:523: Wrong MemberId!
warn:legacy.osl:4273:1:editeng/source/items/frmitems.cxx:464: unknown MemberId
warn:legacy.osl:4273:1:svx/source/dialog/rulritem.cxx:523: Wrong MemberId!
warn:legacy.osl:4273:1:editeng/source/items/frmitems.cxx:464: unknown MemberId
warn:legacy.osl:4273:1:include/cppuhelper/interfacecontainer.h:479: object is disposed
warn:legacy.osl:4273:1:include/cppuhelper/interfacecontainer.h:479: object is disposed
warn:legacy.osl:4273:1:sw/source/core/attr/format.cxx:227: SwFormat::~SwFormat: Def dependents!
warn:sw.core:4273:1:sw/source/core/attr/format.cxx:236: ~SwFormat: parent format missing from: Paragraph style
warn:legacy.tools:4273:1:basic/source/sbx/sbxobj.cxx:94: Object element with dangling parent
warn:legacy.tools:4273:1:basic/source/sbx/sbxobj.cxx:94: Object element with dangling parent
warn:legacy.tools:4273:1:basic/source/sbx/sbxobj.cxx:94: Object element with dangling parent
warn:legacy.tools:4273:1:basic/source/sbx/sbxobj.cxx:94: Object element with dangling parent
warn:legacy.tools:4273:1:basic/source/sbx/sbxobj.cxx:94: Object element with dangling parent
warn:legacy.tools:4273:1:basic/source/sbx/sbxobj.cxx:94: Object element with dangling parent
warn:legacy.tools:4273:1:basic/source/sbx/sbxobj.cxx:94: Object element with dangling parent
warn:legacy.tools:4273:1:basic/source/sbx/sbxobj.cxx:94: Object element with dangling parent
warn:legacy.tools:4273:1:basic/source/sbx/sbxobj.cxx:94: Object element with dangling parent


However, something is making it fail in the Gerrit builds on Linux and OS X. Investigating.
Comment 8 Chris Sherlock 2016-03-07 01:19:34 UTC
I don't think this is a regression; I think opening such files has never worked. But you were on the right track. The only catch is that resolveEmbeddingsStream is recursive, so we should simply bail out of it when we discover that we are trying to resolve a missing header or footer.

I'll test the change I made to the Gerrit patch and see if it works. Nice bit of troubleshooting there, Julien!
Comment 9 Chris Sherlock 2016-03-07 01:33:58 UTC
Ah, I see. My bad: this is a regression (I suppose). It stems from the fix for bug 76356, in which a chart in the header or footer of a .docx file got corrupted.

Before the 4.3 series, if a header or footer was missing, the import would simply continue. We now handle headers and footers more carefully, which is more robust, but we are a tad too thorough: if one is missing, the user gets an I/O error. That isn't necessary.

Dan, you might want to let 1C know that there is a bug in their .docx export: for some reason, they produce files that refer to non-existent headers and footers.

Anyway, if there are millions of installations, I'll see if I can backport this.
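The approach discussed in comments 8 and 9 is to skip a header or footer part that turns out to be missing, rather than letting the exception abort the whole import. The toy Python model below illustrates only that difference; it is not LibreOffice code. `resolve_parts`, `MissingPartError` and the dict-based package model are hypothetical, whereas the real logic lives in C++ in `resolveEmbeddingsStream` (writerfilter/source/ooxml/OOXMLDocumentImpl.cxx, per the log in comment 7).

```python
import logging

log = logging.getLogger("docx_sketch")


class MissingPartError(Exception):
    """Hypothetical stand-in for the 'Element does not exist' failure."""


def resolve_parts(package, part, strict=False, seen=None):
    """Recursively collect every part reachable from `part`.

    `package` maps each part that exists in the archive to the parts it
    references (a simplified, hypothetical model of OPC relationships).
    With strict=True a dangling reference aborts the walk; with
    strict=False it is logged and skipped.
    """
    if seen is None:
        seen = set()
    if part in seen:  # guard against reference cycles
        return seen
    seen.add(part)
    for target in package[part]:
        if target not in package:
            if strict:
                raise MissingPartError(f"Element does not exist: {target!r}")
            log.warning("skipping missing part %r referenced from %r", target, part)
            continue  # bail out of this branch only; keep resolving the rest
        resolve_parts(package, target, strict, seen)
    return seen


if __name__ == "__main__":
    # Loosely modelled on the attachment: a header is referenced but never written.
    pkg = {
        "word/document.xml": ["word/header1.xml", "word/styles.xml"],
        "word/styles.xml": [],
    }
    print(sorted(resolve_parts(pkg, "word/document.xml")))
```

With `strict=True` the walk stops at the first dangling reference, which corresponds to the user-visible "general input/output error"; with the default it logs a warning and still returns the parts that do exist.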
Comment 10 Chris Sherlock 2016-03-07 08:52:47 UTC
OK, I expected that to work, but it didn't. Back to the drawing board.
Comment 11 Julien Nabet 2016-04-23 19:50:28 UTC
Sorry, I forgot to unassign myself.

Chris: perhaps you'd like to carry on with this one.
Comment 12 raal 2016-05-02 16:16:36 UTC
This seems to have begun at the commit below.
Adding sushil_shinde to Cc; could you possibly take a look at this one?
Thanks
 901d4d3b18ebe50022f95017287ac564fc16410d is the first bad commit
commit 901d4d3b18ebe50022f95017287ac564fc16410d
Author: Matthew Francis <mjay.francis@gmail.com>
Date:   Thu May 28 20:29:30 2015 +0800

    source-hash-23b65a84fd827555dfb84c7e2f78879c479c2f78
    
    commit 23b65a84fd827555dfb84c7e2f78879c479c2f78
    Author:     sushil_shinde <sushil.shinde@synerzip.com>
    AuthorDate: Wed Mar 19 18:34:45 2014 +0530
    Commit:     Miklos Vajna <vmiklos@collabora.co.uk>
    CommitDate: Sun Mar 23 11:02:16 2014 +0100
    
        fdo#76356 : Docx file contianing chart in footer/header gets corrupted.
    
            -  Docx file with chart in footer/header or .bin file referred in chart
               was getting corrupted.
            -  Embedded file for footer.xml was not grabbaged.
            -  .bin embedded files were not grab baged.
            -  Added grab bag support for both case.
            -  Added UT to check .bin files are grab baged properly.
    
        Reviewed on:
        	https://gerrit.libreoffice.org/8674
    
        Change-Id: I221e3867798fc2a3a42f6385d687e80b80a3678f
Comment 13 Sushil Shinde 2016-05-03 07:13:59 UTC
(In reply to raal from comment #12)

Sure.
Comment 14 Commit Notification 2016-08-31 14:19:58 UTC
Julien Nabet committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=9876ffe934a21df1df4a457aa88aa8441243dba9

tdf#96749: deal with missing custom headers/footers in docx

It will be available in 5.3.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.