Bug 96401 - FILEOPEN: DOCX - Specific file reported as corrupted (openable in MSO but not in other programs because of unzip error, backslash "\" as filename separator)
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: Inherited From OOo (earliest affected)
Hardware: All All
Importance: lowest normal
Assignee: Not Assigned
Whiteboard: interoperability
Keywords: filter:docx
Duplicates: 97379
Depends on:
Blocks: DOCX-Opening
Reported: 2015-12-11 07:51 UTC by petur
Modified: 2021-11-04 09:11 UTC (History)
CC List: 10 users

See Also:
Crash report or crash signature:

This file is reported as corrupted (9.32 KB, application/vnd.ms-word.document.12)
2015-12-11 07:51 UTC, petur

Description petur 2015-12-11 07:51:35 UTC
Created attachment 121217
This file is reported as corrupted

From time to time I get a .docx file that LibreOffice reports as corrupted and refuses to open, although the file opens fine in MS Office.

I thought I'd take the time to submit one for debugging.

It is only half a page of text. A quick look at the inside showed no problems to me (I can unzip it and open each entry inside with a text editor).
Comment 1 raal 2015-12-11 17:38:46 UTC
I can confirm with Version:
Build ID: de9d0e797903e7ecc19be2b05c7e89d5936ae02d
Threads 4; Ver: Linux 4.2; Render: default; 
TinderBox: Linux-rpm_deb-x86_64@70-TDF, Branch:master, Time: 2015-12-03_04:13:00

from command line:
:1: parser error : Document is empty

I can open the file with Word 2010.
Comment 2 Oliver Specht (CIB) 2015-12-16 15:00:05 UTC
The filter detection in oox/source/core/filterdetect.cxx tries to parse the stream "_rels/.rels" but cannot open it ("_rels"):

aParser.parseStream( aZipStorage, "_rels/.rels" );

Unzipping + rezipping the docx fixes the problem.
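
For anyone who wants to script that workaround, here is a minimal Python sketch of the unzip + rezip step. It assumes the only defect is the backslash separator in the entry names (as the bug title suggests); the function name repack_with_forward_slashes is made up for illustration and is not part of LibreOffice.

```python
import zipfile


def repack_with_forward_slashes(src, dst):
    """Copy all entries of the zip `src` into a new zip `dst`,
    replacing '\\' with '/' in every entry name.

    `src` and `dst` may be paths or binary file-like objects.
    """
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            # orig_filename is the name exactly as stored in the archive.
            fixed_name = info.orig_filename.replace("\\", "/")
            # Reading via the ZipInfo object avoids a lookup by name.
            zout.writestr(fixed_name, zin.read(info))
```

Usage would be something like repack_with_forward_slashes("failing_doc.docx", "fixed.docx"). The sketch does not handle the case where two entries collapse to the same name after the replacement.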
Comment 3 Cor Nouws 2016-09-06 09:59:57 UTC Comment hidden (obsolete)
Comment 4 Telesto 2016-12-07 19:48:02 UTC
Confirming with:
Build ID: a9f56091b6422ec8c42f09b8472200ae4ab12548
CPU Threads: 4; OS Version: Windows 6.19; UI Render: default; 
TinderBox: Win-x86@42, Branch:master, Time: 2016-12-05_23:12:26
Locale: nl-NL (nl_NL); Calc: CL
Comment 5 Timur 2017-07-18 17:29:51 UTC
This DOCX is not correct and is corrupted. Not only LO but also some other programs refuse to open it, complaining about an unzip error, as Oliver noted. Even the "Open XML SDK 2.5 Productivity Tool" (http://www.microsoft.com/en-us/download/details.aspx?id=30425) refuses.
Looks like it's from MSO 2007. Some MSO bug? Saved in MSO again, it opens fine.
Is the bug valid? Maybe: if MSO has a workaround, LO might have one as well. But this bug was confirmed too soon, without a decision on whether it's worth fixing.
Comment 6 petur 2017-07-18 18:00:08 UTC Comment hidden (no-value)
Comment 7 Mike Kaganski 2017-07-18 18:43:53 UTC
Most probably, the problem is that the version of the package (ZIP), APPNOTE-2.0, is wrong as per ECMA-376 and ISO/IEC 29500, which mandate that an OOXML package follow PKWARE Inc. Zip APPNOTE Version 6.2.0. Why that is so is unclear: either some repacking happened on the route from source to destination, or the generating software (claimed to be MS Word in the app.xml content) does this under some circumstances.

I suppose that LO *could* allow such packages. But please note that trying to mimic any non-standard behavior of MS Word (and being bug-to-bug compatible with it) is generally not among LO's goals.
Comment 8 petur 2017-07-18 19:02:05 UTC Comment hidden (off-topic)
Comment 9 Mike Kaganski 2017-07-18 19:07:14 UTC Comment hidden (off-topic)
Comment 10 QA Administrators 2018-07-19 02:41:32 UTC Comment hidden (obsolete)
Comment 11 Timur 2018-07-19 08:32:45 UTC
Reproduced in 6.2+. LO asks whether it should repair the file, but the repair fails.
Comment 12 Julien Nabet 2020-01-01 15:18:09 UTC Comment hidden (obsolete)
Comment 13 petur 2020-01-01 16:08:27 UTC
OP here... since fewer and fewer people use the MSO version that had this specific quirk, be my guest and close this. Four years of waiting has been enough anyway.

And for the last time, it is not corrupt, you can unzip the file perfectly. It merely doesn't follow the standard.

I am removing myself from this thread and community.
Comment 14 Julien Nabet 2020-01-01 16:17:42 UTC Comment hidden (obsolete)
Comment 15 md-work 2020-02-10 17:06:55 UTC
I have an identical bug for xlsx files. I didn't create those files, but they seem to have been created by this software:
Bio-Rad CFX Maestro 1.1 Version 4.1.2433.1219

It looks like this is a problem in the zip implementation that was used to create those OfficeOpenXML files. The folder delimiter is stored as a backslash \ instead of a slash /. And although a slash seems to be the default folder delimiter for zip files, Microsoft products open those zip files flawlessly.

Windows Explorer (Windows 7) is able to extract those files, and the Microsoft OneDrive online office can also open them.

You can actually take an OfficeOpenXML file created by LibreOffice, extract it on Linux and convert it into such a messed up file. Just rename all files so that the slashes (directories) become backslashes:
mv _rels/.rels _rels\\.rels
Repeat for all files, delete the empty folders, repack the zip, rename to docx/xlsx and upload to OneDrive.
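
The same reproduction steps can be sketched in Python without touching the file system. This is only an illustration: make_backslash_copy is a made-up name, and the sketch assumes a POSIX system, because on Windows Python's zipfile converts backslashes in entry names to slashes by itself.

```python
import zipfile


def make_backslash_copy(src, dst):
    """Write a copy of the zip `src` to `dst` in which every entry name
    uses '\\' instead of '/', and directory entries are dropped.

    `src` and `dst` may be paths or binary file-like objects.
    """
    with zipfile.ZipFile(src) as good, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as bad:
        for info in good.infolist():
            if info.is_dir():
                # Corresponds to "delete the empty folders" above.
                continue
            broken_name = info.filename.replace("/", "\\")
            bad.writestr(broken_name, good.read(info))
```

The resulting file can then be renamed to docx/xlsx and used as a test case.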

In the end, I think this shouldn't be hard to fix, especially because there shouldn't be a legitimate case for "real" backslashes inside filenames in OfficeOpenXML files.
So LibreOffice can simply interpret all backslashes inside filenames as slashes.
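
To make the idea concrete, here is a small Python sketch of a lenient lookup that treats backslashes in stored names as slashes. It is not LibreOffice code (the real package handling is C++), and read_entry_lenient is a hypothetical helper name.

```python
import zipfile


def read_entry_lenient(zf, wanted):
    """Return the bytes of the entry whose name equals `wanted`
    (a forward-slash path) once backslashes are read as slashes.

    Raises KeyError if no entry matches.
    """
    for info in zf.infolist():
        # Compare against the stored name with '\\' interpreted as '/'.
        if info.orig_filename.replace("\\", "/") == wanted:
            return zf.read(info)
    raise KeyError(wanted)
```

With such a lookup, a request for "_rels/.rels" would succeed even if the archive stores the entry as "_rels\.rels".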

Note: I also opened a ticket for 7-Zip, to see what those zip experts say.
Comment 16 Julien Nabet 2020-02-28 09:58:01 UTC
About "\", I had proposed a patch here: https://bugs.documentfoundation.org/show_bug.cgi?id=97379#c9
I just wonder whether we should be strict both when writing and when reading zips, or strict only when writing them.
Also, perhaps the other apps should just read the standard and follow it.
Comment 17 Kevin Suo 2021-11-04 08:58:43 UTC
1. The file uses backslashes as file name separator:

$ zipinfo /home/suokunlong/下载/tmp/failing_doc.docx 
Archive:  /home/suokunlong/下载/tmp/failing_doc.docx
Zip file size: 9547 bytes, number of entries: 13
-rw----     2.0 fat     1576 b- defN 80-Jan-01 00:00 [Content_Types].xml
-rw----     2.0 fat      685 b- defN 15-Dec-08 10:52 docProps\app.xml
-rw----     2.0 fat      619 b- defN 15-Dec-08 10:52 docProps\core.xml
-rw----     2.0 fat     4188 b- defN 15-Dec-08 10:52 word\document.xml
-rw----     2.0 fat      971 b- defN 15-Dec-08 10:52 word\endnotes.xml
-rw----     2.0 fat     1595 b- defN 80-Jan-01 00:00 word\fontTable.xml
-rw----     2.0 fat      977 b- defN 15-Dec-08 10:52 word\footnotes.xml
-rw----     2.0 fat     2440 b- defN 80-Jan-01 00:00 word\settings.xml
-rw----     2.0 fat    16648 b- defN 80-Jan-01 00:00 word\styles.xml
-rw----     2.0 fat      260 b- defN 80-Jan-01 00:00 word\webSettings.xml
-rw----     2.0 fat     6999 b- defN 80-Jan-01 00:00 word\theme\theme1.xml
-rw----     2.0 fat     1081 b- defN 80-Jan-01 00:00 word\_rels\document.xml.rels
-rw----     2.0 fat      590 b- defN 80-Jan-01 00:00 _rels\.rels
13 files, 38629 bytes uncompressed, 8069 bytes compressed:  79.1%

2. Backslashes are not allowed by the PK ZIP specification:

   4.4.17 file name: (Variable) The name of the file, with optional relative path.
       The path stored MUST NOT contain a drive or
       device letter, or a leading slash.  All slashes
       MUST be forward slashes '/' as opposed to
       backwards slashes '\' for compatibility with Amiga
       and UNIX file systems etc.  If input came from standard
       input, there is no file name field.

3. Actually a lot of third-party software still uses backslashes. See e.g. bug 76115 (which has a duplicate bug 131575).
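
As an illustration of the rule quoted in point 2, the following Python sketch checks an entry name against the three constraints of section 4.4.17. The function name is made up, the drive-letter test is a simple heuristic, and this is not the logic of OStorageHelper::IsValidZipEntryFileName().

```python
import re


def appnote_name_violations(name):
    """Return a list of the 4.4.17 rules that `name` breaks
    (an empty list means the name passes these three checks)."""
    reasons = []
    if "\\" in name:
        reasons.append("backslash")
    if name.startswith("/"):
        reasons.append("leading slash")
    # Heuristic: a single ASCII letter followed by ':' at the start.
    if re.match(r"[A-Za-z]:", name):
        reasons.append("drive letter")
    return reasons
```

Applied to the listing in point 1, every entry except [Content_Types].xml would be reported with the "backslash" reason.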

This bug is for docx, bug 76115 is for xlsx. But I think they use the same package/source/zippackage code. For bug triaging purposes, should this be marked as a duplicate of bug 76115?
Comment 18 Kevin Suo 2021-11-04 09:06:00 UTC
*** Bug 97379 has been marked as a duplicate of this bug. ***
Comment 19 Kevin Suo 2021-11-04 09:11:26 UTC
As explained in https://bugs.documentfoundation.org/show_bug.cgi?id=97379#c9
the code pointer would be in function OStorageHelper::IsValidZipEntryFileName()
in comphelper/source/misc/storagehelper.cxx:536