Created attachment 121217
This file is reported as corrupted
From time to time I get a .docx file that LibreOffice reports as corrupted and refuses to open; the same file opens fine in MS Office.
I thought I'd take the time to submit one for debugging.
It is only half a page of text. A quick look at the inside showed no problems to me (I can unzip it and open each entry inside with a text editor).
I can confirm with Version: 126.96.36.199.alpha0+
Build ID: de9d0e797903e7ecc19be2b05c7e89d5936ae02d
Threads 4; Ver: Linux 4.2; Render: default;
TinderBox: Linux-rpm_deb-x86_64@70-TDF, Branch:master, Time: 2015-12-03_04:13:00
from command line:
:1: parser error : Document is empty
I can open the file with Word 2010.
The filter detection in oox/source/core/filterdetect.cxx tries to parse the stream "_rels/.rels" but cannot open it ("_rels"):
aParser.parseStream( aZipStorage, "_rels/.rels" );
Unzipping + rezipping the docx fixes the problem.
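For anyone who wants to script that workaround without touching the filesystem, here is a minimal Python sketch. It assumes the culprit is backslash-separated entry names, as the zipinfo listing further down in this report shows; the function name is made up for illustration, and this is not LibreOffice code.

```python
import io
import zipfile


def repack_with_forward_slashes(data: bytes) -> bytes:
    """Return a new ZIP whose entry names use '/' instead of '\\'."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            # orig_filename is the name exactly as stored in the archive.
            fixed = info.orig_filename.replace("\\", "/")
            dst.writestr(fixed, src.read(info))
    return out.getvalue()
```

Feed it the bytes of the failing .docx and write the result to a new file; timestamps and other per-entry metadata are not preserved by this sketch.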
@caolan: is this area (apart from crashes) also of interest to you?
Build ID: a9f56091b6422ec8c42f09b8472200ae4ab12548
CPU Threads: 4; OS Version: Windows 6.19; UI Render: default;
TinderBox: Win-x86@42, Branch:master, Time: 2016-12-05_23:12:26
Locale: nl-NL (nl_NL); Calc: CL
This DOCX is not correct; it is corrupted. Not only LO but also some other programs refuse to open it, complaining about an unzip error, as Oliver noted. Even the "Open XML SDK 2.5 Productivity Tool" (http://www.microsoft.com/en-us/download/details.aspx?id=30425) refuses it.
Looks like it's from MSO 2007. Some MSO bug? Saved again in MSO, it opens fine.
Is the bug valid? Maybe: if MSO has a workaround, LO might have one too. But this bug was confirmed too soon, without a decision on whether it's worth fixing.
Please don't change this issue just like that. Renamed to .zip, it unzips just fine.
I get documents like this at regular intervals, and it is an annoying interoperability issue: I cannot rely on LO to open the documents I receive.
If it's a quirk of MSO and LO wants to claim interoperability, it should be able to open the file.
$ unzip failing_doc.docx.zip
warning: failing_doc.docx.zip appears to use backslashes as path separators
$ ls -R -l
-rw-r--r-- 1 peter peter 1576 Jan 1 1980 [Content_Types].xml
drwxr-xr-x 2 peter peter 4096 Jul 18 19:56 docProps
-rw-r--r-- 1 peter peter 9547 Jul 18 19:49 failing_doc.docx.zip
drwxr-xr-x 2 peter peter 4096 Jul 18 19:56 _rels
drwxr-xr-x 4 peter peter 4096 Jul 18 19:56 word
-rw-r--r-- 1 peter peter 685 Dec 8 2015 app.xml
-rw-r--r-- 1 peter peter 619 Dec 8 2015 core.xml
-rw-r--r-- 1 peter peter 4188 Dec 8 2015 document.xml
-rw-r--r-- 1 peter peter 971 Dec 8 2015 endnotes.xml
-rw-r--r-- 1 peter peter 1595 Jan 1 1980 fontTable.xml
-rw-r--r-- 1 peter peter 977 Dec 8 2015 footnotes.xml
drwxr-xr-x 2 peter peter 4096 Jul 18 19:56 _rels
-rw-r--r-- 1 peter peter 2440 Jan 1 1980 settings.xml
-rw-r--r-- 1 peter peter 16648 Jan 1 1980 styles.xml
drwxr-xr-x 2 peter peter 4096 Jul 18 19:56 theme
-rw-r--r-- 1 peter peter 260 Jan 1 1980 webSettings.xml
-rw-r--r-- 1 peter peter 1081 Jan 1 1980 document.xml.rels
-rw-r--r-- 1 peter peter 6999 Jan 1 1980 theme1.xml
Most probably, the problem is related to the version of the package (ZIP): it is APPNOTE 2.0, which is wrong as per ECMA-376 and ISO/IEC 29500. Both mandate that an OOXML package conform to PKWARE Inc. Zip APPNOTE version 6.2.0. Why that is so is unclear: either some repacking happened on the route from source to destination, or the generating software (claimed to be MS Word in the app.xml content) produces this under some circumstances.
I suppose that LO *could* allow such packages. But please note that trying to mimic any non-standard behavior of MS Word (and be bug-to-bug compatible with it) is generally not among LO's goals.
I wonder what is wrong with following Postel's law...
Your comments are irrelevant (as this one also is).
No one has refused to fix it yet; and prior comments were intended (mostly) to nail the problem down.
However, "what's wrong" with the cited law is that 1) we need more manpower to implement whatever might fit into that law; and 2) we take on (unknown) threats by allowing non-standard formats, without understanding the standard's reasoning behind mandating that specific package version.
** Please read this message in its entirety before responding **
To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year.
There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present.
If you have time, please do the following:
Test to see if the bug is still present with the latest version of LibreOffice from https://www.libreoffice.org/download/
If the bug is present, please leave a comment that includes the information from Help - About LibreOffice.
If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave a comment that includes the information from Help - About LibreOffice.
Please DO NOT
Update the version field
Reply via email (please reply directly on the bug tracker)
Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not appropriate in this case)
If you want to do more to help you can test to see if your issue is a REGRESSION. To do so:
1. Download and install oldest version of LibreOffice (usually 3.3 unless your bug pertains to a feature added after 3.3) from http://downloadarchive.documentfoundation.org/libreoffice/old/
2. Test your bug
3. Leave a comment with your results.
4a. If the bug was present with 3.3 - set version to 'inherited from OOo';
4b. If the bug was not present in 3.3 - add 'regression' to keyword
Feel free to come ask questions or to say hello in our QA chat: https://kiwiirc.com/nextclient/irc.freenode.net/#libreoffice-qa
Thank you for helping us make LibreOffice even better for everyone!
Reproducible in 6.2+. LO asks whether it should repair the file, but the repair fails.
If it's a corrupted docx, what about a WONTFIX?
OP here... since fewer and fewer people use the MSO version that had this specific quirk, be my guest and close this. Four years of waiting has been enough anyway.
And for the last time, it is not corrupt, you can unzip the file perfectly. It merely doesn't follow the standard.
I am removing myself from this thread and community.
It was just a suggestion.
Un-CC'ing myself since I can't do anything about this.
I have an identical bug for xlsx files. I didn't create those files, but they seem to have been created by this software:
Bio-Rad CFX Maestro 1.1 Version 4.1.2433.1219
It looks like this is a problem in the zip implementation that was used to create those OfficeOpenXML files. The delimiter for folders is stored as a backslash \ instead of a slash /. And although a slash seems to be the default folder delimiter for zip files, Microsoft products open those zip files flawlessly.
Windows Explorer (Windows 7) is able to extract those files, and the Microsoft OneDrive online office can also open them.
You can actually take an OfficeOpenXML file created by LibreOffice, extract it on Linux and convert it into such a messed-up file. Just rename the files so that all slashes (directories) become backslashes:
mv _rels/.rels _rels\\.rels
Repeat for all files, delete the empty folders, repack the zip, rename to docx/xlsx and upload to OneDrive.
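The same kind of test file can also be produced without renaming anything on disk. The following is a rough Python sketch; the function name and the entry contents used with it are hypothetical, and a package that Word or OneDrive would accept still needs the full set of parts ([Content_Types].xml and so on).

```python
import io
import zipfile


def make_backslash_package(entries: dict[str, bytes]) -> bytes:
    """Build a ZIP whose entry names use '\\' as the separator."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, payload in entries.items():
            info = zipfile.ZipInfo("placeholder")
            # Assign after construction so the name is stored verbatim;
            # the constructor may rewrite separators on some platforms.
            info.filename = name.replace("/", "\\")
            info.compress_type = zipfile.ZIP_DEFLATED
            zf.writestr(info, payload)
    return buf.getvalue()
```

Pass it a mapping from normal forward-slash part names to their bytes, then save the returned bytes with a .docx or .xlsx extension.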
In the end, I think this shouldn't be hard to fix. Especially because there shouldn't be a legal case for "real" backslashes inside filenames, inside OfficeOpenXML files.
So LibreOffice can simply interpret all backslashes inside filenames as slashes.
Note: I also opened a ticket for 7-Zip, to see what those zip experts say.
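To illustrate the "interpret backslashes as slashes" idea on the reading side, here is a small Python sketch. It only shows the lookup logic; the helper names are invented for this example and it says nothing about how LibreOffice's own package code is structured.

```python
import io
import zipfile


def normalized_index(zf: zipfile.ZipFile) -> dict[str, zipfile.ZipInfo]:
    """Map each entry name, with '\\' read as '/', to its ZipInfo."""
    return {i.orig_filename.replace("\\", "/"): i for i in zf.infolist()}


def read_member(zf: zipfile.ZipFile, name: str) -> bytes:
    """Read an entry by its forward-slash name, tolerating '\\' in the archive."""
    return zf.read(normalized_index(zf)[name])
```

With this, a request for "_rels/.rels" succeeds whether the archive stored the name with a slash or a backslash. A real backslash inside a part name would be misread, which is the trade-off mentioned above.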
About "\", I had proposed a patch here: https://bugs.documentfoundation.org/show_bug.cgi?id=97379#c9
I just wonder whether we should be strict both when writing and when reading zips, or strict only when writing.
Also, perhaps the other apps should just read the standard and follow it.
1. The file uses backslashes as file name separator:
$ zipinfo /home/suokunlong/下载/tmp/failing_doc.docx
Zip file size: 9547 bytes, number of entries: 13
-rw---- 2.0 fat 1576 b- defN 80-Jan-01 00:00 [Content_Types].xml
-rw---- 2.0 fat 685 b- defN 15-Dec-08 10:52 docProps\app.xml
-rw---- 2.0 fat 619 b- defN 15-Dec-08 10:52 docProps\core.xml
-rw---- 2.0 fat 4188 b- defN 15-Dec-08 10:52 word\document.xml
-rw---- 2.0 fat 971 b- defN 15-Dec-08 10:52 word\endnotes.xml
-rw---- 2.0 fat 1595 b- defN 80-Jan-01 00:00 word\fontTable.xml
-rw---- 2.0 fat 977 b- defN 15-Dec-08 10:52 word\footnotes.xml
-rw---- 2.0 fat 2440 b- defN 80-Jan-01 00:00 word\settings.xml
-rw---- 2.0 fat 16648 b- defN 80-Jan-01 00:00 word\styles.xml
-rw---- 2.0 fat 260 b- defN 80-Jan-01 00:00 word\webSettings.xml
-rw---- 2.0 fat 6999 b- defN 80-Jan-01 00:00 word\theme\theme1.xml
-rw---- 2.0 fat 1081 b- defN 80-Jan-01 00:00 word\_rels\document.xml.rels
-rw---- 2.0 fat 590 b- defN 80-Jan-01 00:00 _rels\.rels
13 files, 38629 bytes uncompressed, 8069 bytes compressed: 79.1%
2. Backslash is not allowed by PK ZIP Specs:
4.4.17 file name: (Variable)
4.4.17.1 The name of the file, with optional relative path.
The path stored MUST NOT contain a drive or
device letter, or a leading slash. All slashes
MUST be forward slashes '/' as opposed to
backwards slashes '\' for compatibility with Amiga
and UNIX file systems etc. If input came from standard
input, there is no file name field.
3. Actually a lot of third-party software still uses backslashes. See e.g. bug 76115 (which has a duplicate bug 131575).
This bug is for docx, bug 76115 is for xlsx. But I think they use the same package/source/zippackage code. For bug triaging purposes, should this be marked as a duplicate of bug 76115?
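For triaging similar attachments, the rule quoted from section 4.4.17 can be checked mechanically. Below is a rough Python sketch; the function name is hypothetical and the drive-letter test is only a heuristic (one ASCII letter followed by a colon), so treat the result as a hint rather than a verdict.

```python
import re


def entry_name_problems(name: str) -> list[str]:
    """List the ways a ZIP entry name breaks the quoted 4.4.17 rule."""
    problems = []
    if "\\" in name:
        problems.append("backslash separator")
    if name.startswith("/"):
        problems.append("leading slash")
    # Heuristic only: a single letter followed by ':' at the start.
    if re.match(r"^[A-Za-z]:", name):
        problems.append("drive letter")
    return problems
```

Run it over the names reported by zipinfo (or over `orig_filename` of each entry from Python's zipfile module); for the attachment in this bug, every entry except [Content_Types].xml would be flagged for the backslash separator.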
*** Bug 97379 has been marked as a duplicate of this bug. ***
As explained in https://bugs.documentfoundation.org/show_bug.cgi?id=97379#c9
the code pointer would be in function OStorageHelper::IsValidZipEntryFileName()