Created attachment 61470
a user submitted docx file

We have attempted to load this particular .docx file with both 3.5.3 and 3.5.1rc1 on Windows 7 64-bit and CentOS 6 64-bit, and on both systems LibreOffice is unable to load the file. The only feedback is an input/output error popup dialog. The file appears to be properly zipped and well structured, so we are not sure where the loading process goes wrong.

Expected result: .docx file loads.
Actual result: error popup.
Confirmed with LOdev 3.6 (2012-05-10) version 3.6.0alpha0+ (Build ID: 9980e69) and LibO 3.4.5 on Windows Vista 64.
[REPRODUCIBLE] with 3.5.3.2 on Windows XP: the error popup is shown. The Version field should be the earliest version that has the problem, see <http://wiki.documentfoundation.org/BugReport_Details#Version>. So, changing it to 3.4.5 per comment 1.
Same problem with version 3.3.4 (tested under Windows XP).
Increasing the priority of this ticket: the bug has been confirmed by several testers and there is a test file that reproduces it.
Looks like we detect a problem with the zip file:

(gdb) bt 15
#0  __cxxabiv1::__cxa_throw (obj=0x95443b8, tinfo=0xb06c5438 <typeinfo for com::sun::star::packages::zip::ZipIOException>, dest=0xb0689dc6 <com::sun::star::packages::zip::ZipIOException::~ZipIOException()>)
    at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:63
#1  0xb068abab in ZipFile::readLOC (this=0x9545670, rEntry=...)
    at /ssd/opt/libreoffice/master/package/source/zipapi/ZipFile.cxx:706
704         if ( bBroken && !bRecoveryMode )
705             throw ZipIOException("The stream seems to be broken!",
706                                  uno::Reference< XInterface >() );
#2  0xb068c05e in ZipFile::getDataStream (this=0x9545670, rEntry=..., rData=..., bIsEncrypted=0 '\000', aMutexHolder=...)
    at /ssd/opt/libreoffice/master/package/source/zipapi/ZipFile.cxx:577
#3  0xb06a9c0f in ZipPackageStream::getDataStream (this=0xad6363e0)
    at /ssd/opt/libreoffice/master/package/source/zippackage/ZipPackageStream.cxx:551
#4  0xac4df7b3 in OWriteStream_Impl::GetStream_Impl (this=0x9544278, nStreamMode=1, bHierarchyAccess=1 '\001')
    at /ssd/opt/libreoffice/master/package/source/xstor/owriteablestream.cxx:1357
#5  0xac4e2b0f in OWriteStream_Impl::GetStream (this=0x9544278, nStreamMode=1, bHierarchyAccess=1 '\001')
    at /ssd/opt/libreoffice/master/package/source/xstor/owriteablestream.cxx:1337
#6  0xac4fb209 in OStorage::openStreamElementByHierarchicalName (this=0xad632458, aStreamPath=..., nOpenMode=1)
    at /ssd/opt/libreoffice/master/package/source/xstor/xstorage.cxx:6241
#7  0xac4d3da1 in OHierarchyElement_Impl::GetStreamHierarchically (this=0xaf28ea38, nStorageMode=1, aListPath=std::vector of length 0, capacity 2, nStreamMode=1, aEncryptionData=...)
    at /ssd/opt/libreoffice/master/package/source/xstor/ohierarchyholder.cxx:106
#8  0xac4d404f in OHierarchyElement_Impl::GetStreamHierarchically (this=0xaf28e618, nStorageMode=1, aListPath=std::vector of length 0, capacity 2, nStreamMode=1, aEncryptionData=...)
    at /ssd/opt/libreoffice/master/package/source/xstor/ohierarchyholder.cxx:148
#9  0xac4d432d in OHierarchyHolder_Impl::GetStreamHierarchically (this=0xad63132c, nStorageMode=1, aListPath=std::vector of length 0, capacity 2, nStreamMode=1, aEncryptionData=...)
    at /ssd/opt/libreoffice/master/package/source/xstor/ohierarchyholder.cxx:42
#10 0xac4fb2bb in OStorage::openStreamElementByHierarchicalName (this=0xa2de4e04, aStreamPath=..., nOpenMode=1)
    at /ssd/opt/libreoffice/master/package/source/xstor/xstorage.cxx:6253
#11 0xa2f5cd3b in oox::docprop::(anonymous namespace)::lclGetRelatedStreams (rxStorage=..., rStreamType=...)
    at /ssd/opt/libreoffice/master/oox/source/docprop/ooxmldocpropimport.cxx:89
#12 0xa2f5d184 in oox::docprop::DocumentPropertiesImport::importProperties (this=0xad630368, rxSource=..., rxDocumentProperties=...)
    at /ssd/opt/libreoffice/master/oox/source/docprop/ooxmldocpropimport.cxx:155
#13 0xa096b8bc in writerfilter::dmapper::DomainMapper::DomainMapper (this=0x9540b90, xContext=..., xInputStream=..., xModel=..., eDocumentType=writerfilter::dmapper::DOCUMENT_OOXML)
    at /ssd/opt/libreoffice/master/writerfilter/source/dmapper/DomainMapper.cxx:117
#14 0xa09db041 in WriterFilter::filter (this=0xa2de4d14, aDescriptor=...)
    at /ssd/opt/libreoffice/master/writerfilter/source/filter/ImportFilter.cxx:104

which I imagine is what causes the ultimate General Error dialog:

(gdb) p rEntry
$4 = (ZipEntry &) @0xad636418: {nVersion = 20, nFlag = 6, nMethod = 8, nTime = 1083022683, nCrc = 2030890763, nCompressedSize = 356, nSize = 396, nOffset = 12688, nPathLen = 17, nExtraLen = 0, sPath = {pData = 0xa2deef44}}

693         bBroken = rEntry.nVersion != nVersion
694                         || (rEntry.nFlag & ~6L) != (nFlag & ~6L)
695                         || rEntry.nTime != nTime
696                         || rEntry.nPathLen != nPathLen
697                         || !rEntry.sPath.equals( sLOCPath );

(gdb) p rEntry.nVersion
$5 = 20
(gdb) p nVersion
$6 = 20
(gdb) p rEntry.nFlag & ~6L
$7 = 0
(gdb) p nFlag & ~6L
$9 = 0
(gdb) p rEntry.nTime
$10 = 1083022683
(gdb) p nTime
$11 = 1083088142
(gdb) p rEntry.nPathLen
$12 = 17
(gdb) p nPathLen
$13 = 17

So - seems like it has a different time stamp: odd ...
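For anyone wanting to check other documents for the same symptom without a debugger, here is a minimal Python sketch that compares the timestamp in each local file header against the one in the central directory. It is not the LibreOffice code: it only mirrors the `nTime` comparison from the check above, and the helper names (`dos_to_tuple`, `find_timestamp_mismatches`) are made up for illustration. It assumes the standard 30-byte ZIP local file header layout and uses only the Python standard library.

```python
import io
import struct
import zipfile

# Fixed part of a ZIP local file header: signature, version, flags, method,
# mod time, mod date, CRC-32, compressed size, size, name length, extra length.
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")


def dos_to_tuple(dos_date, dos_time):
    """Decode MS-DOS date/time fields into (year, month, day, hour, min, sec)."""
    return (
        (dos_date >> 9) + 1980,
        (dos_date >> 5) & 0x0F,
        dos_date & 0x1F,
        dos_time >> 11,
        (dos_time >> 5) & 0x3F,
        (dos_time & 0x1F) * 2,
    )


def find_timestamp_mismatches(data):
    """Return (name, central_dir_time, local_header_time) for differing entries."""
    mismatches = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():  # entries as listed in the central directory
            start = info.header_offset
            raw = data[start:start + LOCAL_HEADER.size]
            sig, _ver, _flag, _method, dos_time, dos_date, *_rest = (
                LOCAL_HEADER.unpack(raw)
            )
            if sig != b"PK\x03\x04":
                raise ValueError("no local header at offset %d" % start)
            local = dos_to_tuple(dos_date, dos_time)
            central = tuple(info.date_time)
            if local != central:
                mismatches.append((info.filename, central, local))
    return mismatches
```

Usage would be along the lines of `find_timestamp_mismatches(open("some.docx", "rb").read())`; an empty list means every local header agrees with the central directory on the timestamp.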
So - why would the directory timestamp differ from the stream header:

1083022683 = Mon, 26 Apr 2004 23:38:03 GMT
1083088142 = Tue, 27 Apr 2004 17:49:02 GMT

As an immediate workaround, unzipping and re-zipping the file works fine :-) The question would be: how was this .docx produced, and/or damaged? Secondly - it looks like, for .docx, we don't retry loading in a more tolerant mode behind a "this file is damaged" prompt when we hit this sort of error. I guess that needs fixing too.
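A side note on reading those two numbers: the ZIP format stores modification times as a packed MS-DOS date and time, not as seconds since the Unix epoch. Assuming `nTime` holds that packed 32-bit field with the time in the low 16 bits and the date in the high 16 bits (an assumption about `ZipEntry`, not something confirmed in this thread), the values would decode as in the sketch below. The function name is made up for illustration.

```python
from datetime import datetime


def decode_packed_dos_datetime(value):
    """Decode a 32-bit value holding DOS time (low word) and DOS date (high word).

    Assumption: this is how the nTime values printed by gdb are packed.
    """
    dos_time = value & 0xFFFF
    dos_date = (value >> 16) & 0xFFFF
    return datetime(
        year=(dos_date >> 9) + 1980,
        month=(dos_date >> 5) & 0x0F,
        day=dos_date & 0x1F,
        hour=dos_time >> 11,
        minute=(dos_time >> 5) & 0x3F,
        second=(dos_time & 0x1F) * 2,  # DOS time has 2-second resolution
    )


for label, raw in (("rEntry.nTime", 1083022683), ("nTime", 1083088142)):
    print(label, decode_packed_dos_datetime(raw).isoformat(sep=" "))
# rEntry.nTime 2012-04-13 19:42:54
# nTime 2012-04-14 19:40:28
```

Under that reading the two headers would carry timestamps from April 2012, roughly a day apart, rather than from 2004. Either way the values differ, which is what trips the check.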
*** Bug 45207 has been marked as a duplicate of this bug. ***
*** Bug 54968 has been marked as a duplicate of this bug. ***
bug#54609 is a band-aid for basically the same issue as this - but of course the band-aid only works for some files.
Created attachment 67516
debugging patch

The attached patch allows the document to load by first detecting the zip exception and returning the right error, so that repair mode can be turned on. However, I then force repair mode on, since there seems to be no way to pass it down through the domain-mapper and associated logic. We get the flag set correctly coming into:

Breakpoint 1, WriterFilter::filter (this=0xaca5fa00, aDescriptor=uno::Sequence of length 13 = {...})
    at /ssd/opt/libreoffice/master/writerfilter/source/filter/ImportFilter.cxx:50
...
{Name = "RepairPackage", Handle = 0, Value = uno::Any 1 '\001', State = com::sun::star::beans::PropertyState_DIRECT_VALUE}

But that needs pushing down. End goal: throw up a dialog, offering to repair, and import the file anyway. I can at least see the contents now with that hard-coded.
It is great news that this bug has been traced and squashed. Mike, your proposed end goal of "throw(ing) up a dialog" might be a problem for those who use the UNO or CLI component for file conversion, where interacting with a GUI popup dialog is not feasible in a --headless environment. Perhaps the default should be forced repair, as your patch currently does, or the repair dialog should pop up only when "--headless" is not enabled, with forced repair otherwise.
The bug is not yet fixed; this is a prototype patch. I still really want to know *why* these documents have inconsistent file time-stamps in them; that's really unclear to me. Xing - where did this document come from, and/or how was it made? Can you find that out?
I just emailed the original user of this test file for more information, but the chance of a response is very low. With some luck, I will try to find another test case over the next few days.
Pushed a fix to master. I'd appreciate widespread testing - it should complain that the file is broken and then allow it to be 'repaired' (i.e. a sloppier, more accepting import). Unlikely to make 3.6.2 - perhaps (with some feedback) into 3.6.3 :-)
Michael Meeks committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=ff300e59e74ee88aa6a4981b57a51af416c9e991 fdo#49819 - allow slightly inconsistent docx files to be repaired The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Fridrich Štrba committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=5db7ac239278634c39cbb15f0173db0524b5dcd6 fdo#49819, fdo#54609: Do not consider timestamp differences as corruption The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Fridrich Štrba committed a patch related to this issue. It has been pushed to "libreoffice-3-6-2": http://cgit.freedesktop.org/libreoffice/core/commit/?id=736b9ee7bdd5f9fd0a65a7ab3d9ae3c283007f09&g=libreoffice-3-6-2 fdo#49819, fdo#54609: Do not consider timestamp differences as corruption It will be available already in LibreOffice 3.6.2. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Fridrich Štrba committed a patch related to this issue. It has been pushed to "libreoffice-3-6": http://cgit.freedesktop.org/libreoffice/core/commit/?id=afb9212cd39efcabd8a2f444d2f2979abb325a6a&g=libreoffice-3-6 fdo#49819, fdo#54609: Do not consider timestamp differences as corruption It will be available in LibreOffice 3.6.3. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
This corrupted docx file was created on Windows using Microsoft Word 2010. The user didn't provide much more information in response to our feedback request.
Marking fixed, as it is fixed ;-) I guess Office 2010 is just producing bad .zip output - which is a shame. Thanks for the pointer :-)
*** Bug 44853 has been marked as a duplicate of this bug. ***
With master cc1a112 pulled 2012-10-01, the problem is indeed fixed. Not that there was much doubt after Michael's assurance, but what else would I do with my just-completed build? <grin />
Markus, Are you sure about the mime type you assigned to the attachment "Chapter 2 - Pink Ball, Knight & Penguin.docx"? `file` reports "Microsoft Word 2007+". Terry.