Bug 49819 - FILEOPEN error when loading a particular .docx file.
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 3.3.4 release
Hardware: x86-64 (AMD64) All
Importance: high critical
Assignee: Not Assigned
QA Contact: Joel Madero
URL:
Whiteboard: target:3.7.0 target:3.6.2
Keywords:
Duplicates: 44853 45207 54968
Depends on:
Blocks:
 
Reported: 2012-05-11 19:04 UTC by xing
Modified: 2016-07-11 21:17 UTC
CC List: 7 users

See Also:
Crash report or crash signature:


Attachments
a user submitted docx file (18.96 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2012-05-11 19:04 UTC, xing
debugging patch (3.72 KB, text/plain)
2012-09-21 17:29 UTC, Michael Meeks

Description xing 2012-05-11 19:04:34 UTC
Created attachment 61470
a user submitted docx file

We have attempted to load this particular .docx file with 3.5.3 and 3.5.1rc1 on Windows 7 64-bit and CentOS 6 64-bit, and on both systems LibreOffice fails to load it. The only error shown is an input/output error popup dialog.

The file appears to be properly zipped and well structured, so it is not clear where the loading process goes wrong.

Expected result: .docx file loads.

Actual result: error popup.
Comment 1 s-joyemusequna 2012-05-12 01:15:04 UTC
Confirmed with LOdev 3.6 (2012-05-10) version 3.6.0alpha0+ (Build ID: 9980e69) and LibO 3.4.5 on Windows Vista 64.
Comment 2 Korrawit Pruegsanusak 2012-05-12 21:38:52 UTC
[REPRODUCIBLE] 3.5.3.2 on Windows XP shows the error popup.

The Version field should be the earliest version with the problem <http://wiki.documentfoundation.org/BugReport_Details#Version>, so changing it to 3.4.5 per comment 1.
Comment 3 s-joyemusequna 2012-05-13 00:10:25 UTC
Same problem with version 3.3.4 (tested under Windows XP).
Comment 4 xing 2012-06-10 19:54:08 UTC
Increasing the priority of this ticket: the bug has been confirmed by several people and comes with a reproducible test file.
Comment 5 Michael Meeks 2012-09-21 09:12:33 UTC
Looks like we detect a problem with the zip file:

(gdb) bt 15
#0  __cxxabiv1::__cxa_throw (obj=0x95443b8, tinfo=0xb06c5438 <typeinfo for com::sun::star::packages::zip::ZipIOException>, dest=
    0xb0689dc6 <com::sun::star::packages::zip::ZipIOException::~ZipIOException()>) at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:63
#1  0xb068abab in ZipFile::readLOC (this=0x9545670, rEntry=...) at /ssd/opt/libreoffice/master/package/source/zipapi/ZipFile.cxx:706

704	    if ( bBroken && !bRecoveryMode )
705	        throw ZipIOException("The stream seems to be broken!",
706	                            uno::Reference< XInterface >() );

#2  0xb068c05e in ZipFile::getDataStream (this=0x9545670, rEntry=..., rData=..., bIsEncrypted=0 '\000', aMutexHolder=...)
    at /ssd/opt/libreoffice/master/package/source/zipapi/ZipFile.cxx:577
#3  0xb06a9c0f in ZipPackageStream::getDataStream (this=0xad6363e0)
    at /ssd/opt/libreoffice/master/package/source/zippackage/ZipPackageStream.cxx:551
#4  0xac4df7b3 in OWriteStream_Impl::GetStream_Impl (this=0x9544278, nStreamMode=1, bHierarchyAccess=1 '\001')
    at /ssd/opt/libreoffice/master/package/source/xstor/owriteablestream.cxx:1357
#5  0xac4e2b0f in OWriteStream_Impl::GetStream (this=0x9544278, nStreamMode=1, bHierarchyAccess=1 '\001')
    at /ssd/opt/libreoffice/master/package/source/xstor/owriteablestream.cxx:1337
#6  0xac4fb209 in OStorage::openStreamElementByHierarchicalName (this=0xad632458, aStreamPath=..., nOpenMode=1)
    at /ssd/opt/libreoffice/master/package/source/xstor/xstorage.cxx:6241
#7  0xac4d3da1 in OHierarchyElement_Impl::GetStreamHierarchically (this=0xaf28ea38, nStorageMode=1, 
    aListPath=std::vector of length 0, capacity 2, nStreamMode=1, aEncryptionData=...)
    at /ssd/opt/libreoffice/master/package/source/xstor/ohierarchyholder.cxx:106
#8  0xac4d404f in OHierarchyElement_Impl::GetStreamHierarchically (this=0xaf28e618, nStorageMode=1, 
    aListPath=std::vector of length 0, capacity 2, nStreamMode=1, aEncryptionData=...)
    at /ssd/opt/libreoffice/master/package/source/xstor/ohierarchyholder.cxx:148
#9  0xac4d432d in OHierarchyHolder_Impl::GetStreamHierarchically (this=0xad63132c, nStorageMode=1, 
    aListPath=std::vector of length 0, capacity 2, nStreamMode=1, aEncryptionData=...)
    at /ssd/opt/libreoffice/master/package/source/xstor/ohierarchyholder.cxx:42
#10 0xac4fb2bb in OStorage::openStreamElementByHierarchicalName (this=0xa2de4e04, aStreamPath=..., nOpenMode=1)
    at /ssd/opt/libreoffice/master/package/source/xstor/xstorage.cxx:6253
#11 0xa2f5cd3b in oox::docprop::(anonymous namespace)::lclGetRelatedStreams (rxStorage=..., rStreamType=...)
    at /ssd/opt/libreoffice/master/oox/source/docprop/ooxmldocpropimport.cxx:89
#12 0xa2f5d184 in oox::docprop::DocumentPropertiesImport::importProperties (this=0xad630368, rxSource=..., rxDocumentProperties=...)
    at /ssd/opt/libreoffice/master/oox/source/docprop/ooxmldocpropimport.cxx:155
#13 0xa096b8bc in writerfilter::dmapper::DomainMapper::DomainMapper (this=0x9540b90, xContext=..., xInputStream=..., xModel=..., eDocumentType=
    writerfilter::dmapper::DOCUMENT_OOXML) at /ssd/opt/libreoffice/master/writerfilter/source/dmapper/DomainMapper.cxx:117
#14 0xa09db041 in WriterFilter::filter (this=0xa2de4d14, aDescriptor=...)
    at /ssd/opt/libreoffice/master/writerfilter/source/filter/ImportFilter.cxx:104

which I imagine is what causes the ultimate General Error dialog:

(gdb) p rEntry
$4 = (ZipEntry &) @0xad636418: {nVersion = 20, nFlag = 6, nMethod = 8, nTime = 1083022683, nCrc = 2030890763, nCompressedSize = 356, nSize = 
    396, nOffset = 12688, nPathLen = 17, nExtraLen = 0, sPath = {pData = 0xa2deef44}}


693	        bBroken = rEntry.nVersion != nVersion
694	                        || (rEntry.nFlag & ~6L) != (nFlag & ~6L)
695	                        || rEntry.nTime != nTime
696	                        || rEntry.nPathLen != nPathLen
697	                        || !rEntry.sPath.equals( sLOCPath );

(gdb) p rEntry.nVersion
$5 = 20
(gdb) p nVersion
$6 = 20
(gdb) p rEntry.nFlag & ~6L
$7 = 0
(gdb) p nFlag & ~6L
$9 = 0
(gdb) p rEntry.nTime
$10 = 1083022683
(gdb) p nTime
$11 = 1083088142
(gdb) p rEntry.nPathLen 
$12 = 17
(gdb) p nPathLen
$13 = 17

So - seems like it has a different time stamp: odd ...
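
To make the comparison above easier to follow, here is a minimal, self-contained sketch of the same consistency check. It is not the actual ZipFile::readLOC code: the HeaderFields struct, the mismatchedFields helper, and the compareTime switch are hypothetical names introduced for illustration, and the field widths are assumptions.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the header fields that the quoted
// check compares between the central directory and the local file header.
struct HeaderFields {
    std::int16_t version;
    std::int16_t flag;
    std::int32_t time;
    std::int16_t pathLen;
    std::string path;
};

// Returns the names of the fields on which the two headers disagree.
// An empty result means the headers are consistent.
std::vector<std::string> mismatchedFields(const HeaderFields& dir,
                                          const HeaderFields& loc,
                                          bool compareTime = true)
{
    std::vector<std::string> diff;
    if (dir.version != loc.version)
        diff.push_back("version");
    // Bits 1 and 2 (mask 6) are ignored, as in the quoted check.
    if ((dir.flag & ~6) != (loc.flag & ~6))
        diff.push_back("flag");
    // compareTime is an illustrative knob, not part of the quoted code.
    if (compareTime && dir.time != loc.time)
        diff.push_back("time");
    if (dir.pathLen != loc.pathLen)
        diff.push_back("pathLen");
    if (dir.path != loc.path)
        diff.push_back("path");
    return diff;
}
```

Fed the values printed in the gdb session (version 20, time 1083022683 versus 1083088142, path length 17, and a placeholder path since the session does not print it), the helper reports "time" as the only mismatch, and no mismatch at all once compareTime is false.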
Comment 6 Michael Meeks 2012-09-21 09:26:02 UTC
So - why would the directory timestamp differ from the stream header:

1083022683 = Mon, 26 Apr 2004 23:38:03 GMT
1083088142 = Tue, 27 Apr 2004 17:49:02 GMT
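
A side note on reading these numbers: the ZIP format stores modification times as packed MS-DOS date and time fields rather than Unix epoch seconds, so the two values can also be decoded that way. The sketch below assumes nTime is the 16-bit time and 16-bit date fields read together as one little-endian 32-bit value (time in the low half, date in the high half); the DosDateTime struct and decodeDosDateTime helper are made-up names for illustration.

```cpp
#include <cstdint>

// Calendar fields decoded from a packed MS-DOS date/time value.
struct DosDateTime {
    int year, month, day, hour, minute, second;
};

// Assumption: high 16 bits = DOS date, low 16 bits = DOS time.
DosDateTime decodeDosDateTime(std::uint32_t packed)
{
    const std::uint32_t date = packed >> 16;
    const std::uint32_t time = packed & 0xFFFFu;
    return DosDateTime{
        static_cast<int>((date >> 9) & 0x7Fu) + 1980, // years since 1980
        static_cast<int>((date >> 5) & 0x0Fu),        // month, 1..12
        static_cast<int>(date & 0x1Fu),               // day, 1..31
        static_cast<int>((time >> 11) & 0x1Fu),       // hour
        static_cast<int>((time >> 5) & 0x3Fu),        // minute
        static_cast<int>(time & 0x1Fu) * 2            // stored in 2-second units
    };
}
```

Under that assumption, 1083022683 decodes to 2012-04-13 19:42:54 and 1083088142 to 2012-04-14 19:40:28, both without any time-zone information.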

As an immediate workaround, unzipping and re-zipping the file works fine :-)

The question would be: how was this .docx produced ? and/or damaged.

Secondly - it looks like, when we hit this sort of error for .docx, we don't show a "this file is damaged" prompt and re-try loading in a more tolerant mode.

I guess that needs fixing too.
Comment 7 Michael Meeks 2012-09-21 15:43:29 UTC
*** Bug 45207 has been marked as a duplicate of this bug. ***
Comment 8 Michael Meeks 2012-09-21 15:47:35 UTC
*** Bug 54968 has been marked as a duplicate of this bug. ***
Comment 9 Michael Meeks 2012-09-21 15:55:29 UTC
bug#54609 is a band-aid for basically the same issue as this - but of course the band-aid only works for some files.
Comment 10 Michael Meeks 2012-09-21 17:29:28 UTC
Created attachment 67516
debugging patch

Attached patch allows the document to load by first detecting the zip exception and returning the right error - so we can get repair mode turned on.

However - I then force repair-mode on - since there seems to be no way to force it down through the domain-mapper & associated logic.

We get the flag set right coming into:

Breakpoint 1, WriterFilter::filter (this=0xaca5fa00, aDescriptor=uno::Sequence of length 13 = {...})
    at /ssd/opt/libreoffice/master/writerfilter/source/filter/ImportFilter.cxx:50
...
{Name = "RepairPackage", Handle = 0, Value = uno::Any 1 '\001', State = 
    com::sun::star::beans::PropertyState_DIRECT_VALUE}

But that needs pushing down.

End goal: throw up a dialog, offering to repair, and import the file anyway. I can at least see the contents now with that hard-coded.
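
That end goal could be sketched roughly as follows: try a strict import first and, only if the package is reported broken, offer a repair and retry in a tolerant mode. This is an illustration under stated assumptions, not the attached patch: BrokenPackageError, LoadResult, loadWithRepairFallback, and both callbacks are hypothetical names standing in for the real filter and dialog code.

```cpp
#include <functional>
#include <stdexcept>

// Hypothetical stand-in for the ZipIOException seen in the backtrace.
struct BrokenPackageError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

enum class LoadResult { Loaded, LoadedRepaired, Refused };

// doImport is a hypothetical callback: it receives a "repair mode" flag and
// throws BrokenPackageError when a strict import hits an inconsistent package.
// askUser models the repair dialog and returns true if repair is accepted.
LoadResult loadWithRepairFallback(const std::function<void(bool)>& doImport,
                                  const std::function<bool()>& askUser)
{
    try {
        doImport(false);               // first attempt: strict checks
        return LoadResult::Loaded;
    } catch (const BrokenPackageError&) {
        if (!askUser())
            return LoadResult::Refused;
    }
    doImport(true);                    // second attempt: repair mode on
    return LoadResult::LoadedRepaired;
}
```

Passing the dialog in as a callback keeps the retry logic independent of how (or whether) the question is actually put to a user.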
Comment 11 xing 2012-09-21 17:45:39 UTC
It is great news that this bug has been traced and squashed.

Mike, your proposed end goal of "throw(ing) up a dialog" might be a problem for those who use the UNO or CLI components for file conversion, where interacting with a GUI popup dialog is not feasible in a --headless environment.

Perhaps the default should be forced repair, as your patch currently does, or the repair dialog should pop up only when "--headless" is not enabled, with forced repair otherwise.
Comment 12 Michael Meeks 2012-09-21 20:06:09 UTC
The bug is not yet fixed; this is a prototype patch. I still really want to know *why* these documents have inconsistent file / time-stamps in them, that's really unclear to me.

Xing - where did this document come from ? and/or how was it made ? - can you find that out ?
Comment 13 xing 2012-09-21 20:24:23 UTC
I just emailed the original user of this test file for more information, but the chance of a response is very low. With some luck, I will try to find another test case over the next few days.
Comment 14 Michael Meeks 2012-09-21 20:37:55 UTC
pushed a fix to master, I'd appreciate widespread testing - it should complain the file is broken then allow it to be 'repaired' (ie. a sloppier more accepting import).

Unlikely to make 3.6.2 - perhaps (with some feedback) into 3.6.3 :-)
Comment 15 Not Assigned 2012-09-21 20:38:56 UTC
Michael Meeks committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=ff300e59e74ee88aa6a4981b57a51af416c9e991

fdo#49819 - allow slightly inconsistent docx files to be repaired



The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 16 Not Assigned 2012-09-24 07:28:13 UTC
Fridrich Štrba committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=5db7ac239278634c39cbb15f0173db0524b5dcd6

fdo#49819, fdo#54609: Do not consider timestamp differences as corruption



Comment 17 Not Assigned 2012-09-24 08:31:46 UTC
Fridrich Štrba committed a patch related to this issue.
It has been pushed to "libreoffice-3-6-2":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=736b9ee7bdd5f9fd0a65a7ab3d9ae3c283007f09&g=libreoffice-3-6-2

fdo#49819, fdo#54609: Do not consider timestamp differences as corruption


It will be available already in LibreOffice 3.6.2.

Comment 18 Not Assigned 2012-09-24 08:32:07 UTC
Fridrich Štrba committed a patch related to this issue.
It has been pushed to "libreoffice-3-6":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=afb9212cd39efcabd8a2f444d2f2979abb325a6a&g=libreoffice-3-6

fdo#49819, fdo#54609: Do not consider timestamp differences as corruption


It will be available in LibreOffice 3.6.3.

Comment 19 xing 2012-09-26 05:13:05 UTC
This corrupted docx file was created on Windows using Microsoft Word 2010. The user didn't provide much more information in response to our feedback request.
Comment 20 Michael Meeks 2012-09-26 08:51:51 UTC
Marking fixed, as it is fixed ;-)

I guess Office 2010 is just producing bad .zip output - which is a shame.

Thanks for the pointer :-)
Comment 21 Harri Pitkänen 2012-09-28 17:24:21 UTC
*** Bug 44853 has been marked as a duplicate of this bug. ***
Comment 22 Terrence Enger 2012-10-02 02:40:22 UTC
With master cc1a112 pulled 2012-10-01, the problem is indeed fixed.

Not that there was much doubt after Michael's assurance, but what else
would I do with my just-completed build? <grin />
Comment 23 Terrence Enger 2014-03-07 21:38:15 UTC
Markus,

Are you sure about the mime type you assigned to the attachment
"Chapter 2 - Pink Ball, Knight & Penguin.docx"?  `file` reports
"Microsoft Word 2007+".

Terry.