Created attachment 121217: This file is reported as corrupted
From time to time I get a .docx file that LibreOffice reports as corrupted and refuses to open, although the file opens fine in MS Office. I thought I'd take the time to submit one for debugging. It is only half a page of text. A quick look at the inside showed no problems to me (I can unzip it and open each entry inside with a text editor).
I can confirm with
Version: 5.2.0.0.alpha0+
Build ID: de9d0e797903e7ecc19be2b05c7e89d5936ae02d
Threads 4; Ver: Linux 4.2; Render: default;
TinderBox: Linux-rpm_deb-x86_64@70-TDF, Branch:master, Time: 2015-12-03_04:13:00
and 4.4.0.0.alpha2+

From the command line:
:1: parser error : Document is empty
PK
^

I can open the file with Word 2010.
The filter detection in oox/source/core/filterdetect.cxx tries to parse the stream "_rels/.rels" but cannot open it ("_rels"):

    aParser.parseStream( aZipStorage, "_rels/.rels" );

Unzipping and rezipping the docx fixes the problem.
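The unzip + rezip workaround can be scripted. Below is a minimal sketch in Python, assuming (as later comments in this report establish) that the offending entry names use backslashes as separators. The function name repack_with_forward_slashes is made up for illustration, and this is not what LibreOffice does internally.

```python
import io
import zipfile


def repack_with_forward_slashes(package: bytes) -> bytes:
    """Copy every entry of a ZIP package, rewriting '\\' in entry names to '/'."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            # Hypothetical normalization: treat a backslash as a path separator.
            fixed_name = info.filename.replace("\\", "/")
            dst.writestr(fixed_name, src.read(info))
    return out.getvalue()
```

Typical use would be to read the failing .docx as bytes, pass it through the function, and write the result to a new file. Timestamps and other per-entry metadata are not preserved by this sketch.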
@caolan: was this area (apart from crashes) also of your interest?
Confirming with:
Version: 5.4.0.0.alpha0+
Build ID: a9f56091b6422ec8c42f09b8472200ae4ab12548
CPU Threads: 4; OS Version: Windows 6.19; UI Render: default;
TinderBox: Win-x86@42, Branch:master, Time: 2016-12-05_23:12:26
Locale: nl-NL (nl_NL); Calc: CL
This DOCX is not correct and is corrupted. Not only LO but some other programs refuse to open it, complaining about an unzip error, as Oliver noted. Even the "Open XML SDK 2.5 Productivity Tool" (http://www.microsoft.com/en-us/download/details.aspx?id=30425) refuses. Looks like it's 2007. Some MSO bug? Saved in MSO again, it opens fine. Is the bug valid? Maybe: if MSO has a workaround, LO might also have one. But this bug was confirmed too soon, without a decision on whether it's worth fixing.
Please don't change this issue just like that. Renamed to ZIP, it unzips just fine [1]. I get documents like this at a regular interval and it is an annoying interoperability issue: I cannot rely on LO to open docs I get. If it's a quirk of MSO and LO wants to claim interoperability, it should be able to open the file.

[1]
$ unzip failing_doc.docx.zip
Archive:  failing_doc.docx.zip
  inflating: [Content_Types].xml
warning:  failing_doc.docx.zip appears to use backslashes as path separators
  inflating: docProps/app.xml
  inflating: docProps/core.xml
  inflating: word/document.xml
  inflating: word/endnotes.xml
  inflating: word/fontTable.xml
  inflating: word/footnotes.xml
  inflating: word/settings.xml
  inflating: word/styles.xml
  inflating: word/webSettings.xml
  inflating: word/theme/theme1.xml
  inflating: word/_rels/document.xml.rels
  inflating: _rels/.rels
$ ls -R -l
.:
total 28
-rw-r--r-- 1 peter peter  1576 Jan  1  1980 [Content_Types].xml
drwxr-xr-x 2 peter peter  4096 Jul 18 19:56 docProps
-rw-r--r-- 1 peter peter  9547 Jul 18 19:49 failing_doc.docx.zip
drwxr-xr-x 2 peter peter  4096 Jul 18 19:56 _rels
drwxr-xr-x 4 peter peter  4096 Jul 18 19:56 word

./docProps:
total 8
-rw-r--r-- 1 peter peter 685 Dec  8  2015 app.xml
-rw-r--r-- 1 peter peter 619 Dec  8  2015 core.xml

./_rels:
total 0

./word:
total 56
-rw-r--r-- 1 peter peter  4188 Dec  8  2015 document.xml
-rw-r--r-- 1 peter peter   971 Dec  8  2015 endnotes.xml
-rw-r--r-- 1 peter peter  1595 Jan  1  1980 fontTable.xml
-rw-r--r-- 1 peter peter   977 Dec  8  2015 footnotes.xml
drwxr-xr-x 2 peter peter  4096 Jul 18 19:56 _rels
-rw-r--r-- 1 peter peter  2440 Jan  1  1980 settings.xml
-rw-r--r-- 1 peter peter 16648 Jan  1  1980 styles.xml
drwxr-xr-x 2 peter peter  4096 Jul 18 19:56 theme
-rw-r--r-- 1 peter peter   260 Jan  1  1980 webSettings.xml

./word/_rels:
total 4
-rw-r--r-- 1 peter peter 1081 Jan  1  1980 document.xml.rels

./word/theme:
total 8
-rw-r--r-- 1 peter peter 6999 Jan  1  1980 theme1.xml
peter@animal:/media/PCdata/temp/failing_doc.docx$
Most probably, the problem is related to the version of the package (ZIP): it is APPNOTE 2.0, which is wrong as per ECMA-376 and ISO/IEC 29500; these mandate that an OOXML package conform to PKWARE Inc. Zip APPNOTE Version 6.2.0. Why that is so is unclear: either some repacking happened on the route from source to destination, or the generating software (claimed to be MS Word in the app.xml content) produces this under some circumstances. I suppose that LO *could* allow such packages. But please note that trying to mimic any non-standard behavior of MS Word (and be bug-to-bug compatible with it) is generally not in LO's list of goals.
I wonder what is wrong with following Postel's law...
Your comments are irrelevant (as this one also is). No one has refused to fix it yet; and prior comments were intended (mostly) to nail the problem down. However, "what's wrong" with the cited law is that 1) we need more manpower to implement whatever might fit into that law; and 2) we take on (unknown) threats by allowing non-standard formats, without understanding the standard's reasoning behind mandating that specific package version.
** Please read this message in its entirety before responding **

To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year.

There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present.

If you have time, please do the following:

Test to see if the bug is still present with the latest version of LibreOffice from https://www.libreoffice.org/download/

If the bug is present, please leave a comment that includes the information from Help - About LibreOffice.

If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave a comment that includes the information from Help - About LibreOffice.

Please DO NOT
- Update the version field
- Reply via email (please reply directly on the bug tracker)
- Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not appropriate in this case)

If you want to do more to help you can test to see if your issue is a REGRESSION. To do so:
1. Download and install oldest version of LibreOffice (usually 3.3 unless your bug pertains to a feature added after 3.3) from http://downloadarchive.documentfoundation.org/libreoffice/old/
2. Test your bug
3. Leave a comment with your results.
4a. If the bug was present with 3.3 - set version to 'inherited from OOo';
4b. If the bug was not present in 3.3 - add 'regression' to keyword

Feel free to come ask questions or to say hello in our QA chat: https://kiwiirc.com/nextclient/irc.freenode.net/#libreoffice-qa

Thank you for helping us make LibreOffice even better for everyone!

Warm Regards,
QA Team

MassPing-UntouchedBug
Repro 6.2+. LO asks if it should repair the file, but fails.
If it's a corrupted docx, what about a WONTFIX?
OP here... since fewer and fewer people use the MSO version that had this specific quirk, be my guest and close this. 4 years of waiting has been enough anyway. And for the last time, it is not corrupt: you can unzip the file perfectly. It merely doesn't follow the standard. I am removing myself from this thread and community.
It was just a suggestion. uncc myself since I can't do anything about this.
I have an identical bug for xlsx files. I didn't create those files, but they seem to be produced by this software: Bio-Rad CFX Maestro 1.1 Version 4.1.2433.1219

It looks like this is a problem in the zip implementation which was used to create those OfficeOpenXml files. The delimiter for folders is being stored as a backslash \ instead of a slash /. And although a slash seems to be the default folder delimiter for zip files, Microsoft products open those zip files flawlessly. The Windows Explorer (Windows 7) is able to extract those files. And the Microsoft OneDrive online office can also open them. https://onedrive.live.com

You can actually take an OfficeOpenXML file created by LibreOffice, extract it on Linux and convert it into such a messed up file. Just rename and convert all slashes (directories) to backslashes.

mv _rels/.rels _rels\\.rels

Repeat for all files, delete the empty folders, repack the zip, rename to docx/xlsx and upload to OneDrive.

In the end, I think this shouldn't be hard to fix, especially because there shouldn't be a legitimate case for "real" backslashes inside filenames inside OfficeOpenXML files. So LibreOffice can simply interpret all backslashes inside filenames as slashes.

Note: I also opened a ticket for 7-Zip, to see what those zip experts say. https://sourceforge.net/p/p7zip/bugs/227/
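The "interpret all backslashes as slashes" idea can be illustrated with a small Python sketch. It assumes that no legitimate entry name in the package contains a real backslash, and the helper name read_member_lenient is hypothetical. It only shows the lookup side and is not the LibreOffice code.

```python
import io
import zipfile


def read_member_lenient(package: bytes, wanted: str) -> bytes:
    """Read the entry `wanted` (a '/'-separated name) from a ZIP package,
    treating any backslash in the stored entry names as a forward slash."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        # Map the normalized name to the entry as it is actually stored.
        by_normalized_name = {
            info.filename.replace("\\", "/"): info for info in zf.infolist()
        }
        # Raises KeyError if no entry matches even after normalization.
        return zf.read(by_normalized_name[wanted])
```

With this, a request for "_rels/.rels" succeeds whether the entry was stored with a slash or with a backslash.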
About "\", I had proposed a patch here: https://bugs.documentfoundation.org/show_bug.cgi?id=97379#c9 I just wonder whether we should be strict both when writing and when reading zips, or strict only when writing them. Also, perhaps the other apps should just read the standard and follow it.
1. The file uses backslashes as file name separator:

$ zipinfo /home/suokunlong/下载/tmp/failing_doc.docx
Archive:  /home/suokunlong/下载/tmp/failing_doc.docx
Zip file size: 9547 bytes, number of entries: 13
-rw----     2.0 fat     1576 b- defN 80-Jan-01 00:00 [Content_Types].xml
-rw----     2.0 fat      685 b- defN 15-Dec-08 10:52 docProps\app.xml
-rw----     2.0 fat      619 b- defN 15-Dec-08 10:52 docProps\core.xml
-rw----     2.0 fat     4188 b- defN 15-Dec-08 10:52 word\document.xml
-rw----     2.0 fat      971 b- defN 15-Dec-08 10:52 word\endnotes.xml
-rw----     2.0 fat     1595 b- defN 80-Jan-01 00:00 word\fontTable.xml
-rw----     2.0 fat      977 b- defN 15-Dec-08 10:52 word\footnotes.xml
-rw----     2.0 fat     2440 b- defN 80-Jan-01 00:00 word\settings.xml
-rw----     2.0 fat    16648 b- defN 80-Jan-01 00:00 word\styles.xml
-rw----     2.0 fat      260 b- defN 80-Jan-01 00:00 word\webSettings.xml
-rw----     2.0 fat     6999 b- defN 80-Jan-01 00:00 word\theme\theme1.xml
-rw----     2.0 fat     1081 b- defN 80-Jan-01 00:00 word\_rels\document.xml.rels
-rw----     2.0 fat      590 b- defN 80-Jan-01 00:00 _rels\.rels
13 files, 38629 bytes uncompressed, 8069 bytes compressed:  79.1%

2. Backslash is not allowed by PK ZIP Specs: https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT

   4.4.17 file name: (Variable)

   4.4.17.1 The name of the file, with optional relative path. The path stored MUST NOT contain a drive or device letter, or a leading slash. All slashes MUST be forward slashes '/' as opposed to backwards slashes '\' for compatibility with Amiga and UNIX file systems etc. If input came from standard input, there is no file name field.

3. Actually a lot of third-party software still uses backslashes. See e.g. bug 76115 (which has a duplicate bug 131575). This bug is for docx, bug 76115 is for xlsx. But I think they use the same package/source/zippackage code. For bug triaging purposes, should this be marked as a duplicate of bug 76115?
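For anyone who wants to check other files against the rule quoted from 4.4.17.1, here is a rough Python sketch. The function names are made up for illustration, and the drive-letter test (one ASCII letter followed by a colon) is my own simplification of "drive or device letter", not wording from the spec.

```python
import io
import re
import zipfile


def name_problems(name: str) -> list[str]:
    """List the ways an entry name deviates from APPNOTE 4.4.17.1."""
    problems = []
    if "\\" in name:
        problems.append("backslash separator")
    if name.startswith("/"):
        problems.append("leading slash")
    # Simplified assumption: a drive letter looks like "C:" at the start.
    if re.match(r"[A-Za-z]:", name):
        problems.append("drive letter")
    return problems


def audit_package(package: bytes) -> dict[str, list[str]]:
    """Return only the entry names that have at least one problem."""
    report = {}
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        for name in zf.namelist():
            problems = name_problems(name)
            if problems:
                report[name] = problems
    return report
```

Run on a POSIX system against the attachment, audit_package should flag every entry below the package root. On Windows, Python's zipfile converts backslashes in entry names while reading, so the audit would need a different reader there.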
*** Bug 97379 has been marked as a duplicate of this bug. ***
As explained in https://bugs.documentfoundation.org/show_bug.cgi?id=97379#c9 the code pointer would be in function OStorageHelper::IsValidZipEntryFileName() in comphelper/source/misc/storagehelper.cxx:536
As per request, I just verified that the issue is still present, though lately I have not received any such problematic documents, so there is a slight chance that Microsoft has changed something on their end. However, the file I originally included with this report still fails to open.

Tested on a fully updated Debian Sid with:
Version: 7.5.8.2 (X86_64) / LibreOffice Community
Build ID: 50(Build:2)
CPU threads: 8; OS: Linux 6.5; UI render: default; VCL: gtk3
Locale: nl-BE (en_GB.UTF-8); UI: en-GB
Debian package version: 4:7.5.8-1
Calc: threaded
Since commit fa66eeb587f11bea88ab5950ffd94aee221d6b31, there is a "recovery mode" in ZIP package, triggered by "RepairPackage" media descriptor property [1]. Since commit 426a2f22678f89706b4db474243ab27b4a4d6c06 (for #i104759#), this mode also handles the backslashes in packages (it is done explicitly to handle this problem). The missing bit is to make sure that, when such a situation is detected during the load, and a warning is shown to the user asking to try to *repair*, we don't switch to the recovery mode. [1] https://api.libreoffice.org/docs/idl/ref/servicecom_1_1sun_1_1star_1_1document_1_1MediaDescriptor.html#ab5ae6f2c9a82bcb8f006f4b46fee1691
(In reply to Mike Kaganski from comment #22)
> The missing bit is to make sure that, when such a situation is detected
> during the load, and a warning is shown to the user asking to try to
> *repair*, we don't switch to the recovery mode.

Hmm. filter/source/storagefilterdetect/filterdetect.cxx has the code to do exactly that [1]; and so, it is the wrong implementation (or a breakage) of what was implemented in 426a2f22678f89706b4db474243ab27b4a4d6c06.

[1] https://opengrok.libreoffice.org/xref/core/filter/source/storagefilterdetect/filterdetect.cxx?r=b1560344#121
So: you are able to open the file, *if* you open it using an *explicitly selected* DOCX filter in the Open dialog.

Why it fails when opened normally:
1. It uses a normal auto-detect procedure.
2. In it, it iterates all filters, asking each to try to detect the file.
3. In the list, DOCX filters happen to come prior to ODF ones ...
4. But DOCX filters, when encountering the ZIP error, fail silently.
5. While ODF ones, when they see the same ZIP error, produce the warning, and then proceed with the procedure from comment 23, which is described by the "We don't do any type detection on broken packages (f.e. because it might be impossible), so for repairing we'll use the requested type, which was detected by the flat detection" comment there. This makes LibreOffice use an ODF filter unconditionally on this file, which finally, as expected, fails elsewhere.

The problem is: we need to handle the ZIP error early, and introduce the repair mode early, still keeping the autodetection with it. It won't help to let DOCX filters do the same as the ODF ones: that would then prevent broken ODF files from opening, because DOCX would intercept them, and the problem would be reversed.
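To make the intended order of operations concrete, here is a toy model in Python. Everything in it (Detector, BrokenZip, list_entries, detect, and the idea of detecting by entry name) is hypothetical and simplified; it is not the LibreOffice detection code, only a sketch of "handle the ZIP error once, early, then keep autodetecting in repair mode".

```python
from dataclasses import dataclass
from typing import Callable, Optional


class BrokenZip(Exception):
    """Raised by the strict reader when the package is not well-formed."""


@dataclass
class Detector:
    name: str
    # Returns True when the set of entry names matches this filter's format.
    matches: Callable[[set], bool]


def list_entries(stored_names: list, repair: bool) -> set:
    """Stand-in for opening the package: strict mode rejects backslashes,
    repair mode treats them as forward slashes."""
    if any("\\" in name for name in stored_names):
        if not repair:
            raise BrokenZip("entry names use backslashes")
        return {name.replace("\\", "/") for name in stored_names}
    return set(stored_names)


def detect(stored_names: list, detectors: list,
           confirm_repair: Callable[[], bool]) -> Optional[str]:
    try:
        entries = list_entries(stored_names, repair=False)
    except BrokenZip:
        # Handle the ZIP error before any filter is consulted,
        # so that no single filter family claims the file by default.
        if not confirm_repair():
            return None
        entries = list_entries(stored_names, repair=True)
    # Autodetection still runs, now on the repaired view of the package.
    for detector in detectors:
        if detector.matches(entries):
            return detector.name
    return None
```

In this model the order of the detectors no longer decides what happens to a broken package: a backslash-separated DOCX is still detected as DOCX, and a broken ODF package is still detected as ODF.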
Mike Kaganski committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/commit/86c682273d907c77404637c89e584047de1c1099 tdf#96401: allow to detect a broken ZIP package It will be available in 24.2.0. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Mike Kaganski committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/commit/657f98d9272dd97e4f4c6e03cce4a0fa9f526819 Related: tdf#96401 Set PROP_ASTEMPLATE for broken ZIP package It will be available in 24.2.0.
Mike Kaganski committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/commit/93357349ff1998b41ea1ebedf09dc1cc5da316f7 Related: tdf#96401 Check ZIP magic number, to avoid false detections It will be available in 24.2.0.
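For readers unfamiliar with the term, a ZIP "magic number" check can be as small as the Python sketch below. It illustrates the general idea only and makes no claim about what the commit above actually checks; the constant and function names are made up. The two signatures are the local file header signature and the end of central directory signature (the latter is what an empty archive starts with).

```python
# Signature of a local file header: the first bytes of a non-empty archive.
ZIP_LOCAL_FILE_HEADER = b"PK\x03\x04"
# Signature of the end of central directory record: an empty archive starts with it.
ZIP_END_OF_CENTRAL_DIR = b"PK\x05\x06"


def looks_like_zip(data: bytes) -> bool:
    """Cheap pre-check: does the stream start with a known ZIP signature?"""
    return data[:4] in (ZIP_LOCAL_FILE_HEADER, ZIP_END_OF_CENTRAL_DIR)
```

A check like this lets a detector tell "a ZIP package that failed to parse" apart from "not a ZIP at all", so that arbitrary non-ZIP files are not offered for repair.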
Thanks Mike, I just tried 24.2.0 (dev): it now complains about the file being corrupted, and after I choose to repair it, the file opens correctly. Thanks for taking care!!