Bug 128910 - DocumentFormat.OpenXml cannot open LibreOffice spreadsheet due to FileFormatException in WindowsBase.dll MS.Internal.IO.Zip.ZipIOLocalFileBlock.Validate
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version (earliest affected): 6.0.7.3 release
Hardware: All All
Importance: medium minor
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Dev-related
 
Reported: 2019-11-20 09:26 UTC by Alexey
Modified: 2023-05-19 14:43 UTC (History)

See Also:
Crash report or crash signature:


Attachments
problematic file (16.11 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2019-11-20 09:29 UTC, Alexey
newer problematic file from 6.3.4 (19.06 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2019-12-30 13:38 UTC, Alexey

Description Alexey 2019-11-20 09:26:22 UTC
Description:
Hi

I'm developing an application that should process spreadsheets using OpenXml format and support LibreOffice as well. However, files coming from some LibreOffice users (<Application>LibreOffice/6.0.7.3$Linux_X86_64 LibreOffice_project/00m0$Build-3</Application>) don't always comply with the ZIP format specification, and WindowsBase.dll throws a FileFormatException from the Validate method of this class: https://referencesource.microsoft.com/#WindowsBase/Base/MS/Internal/IO/Zip/ZipIOLocalFileBlock.cs,566c718a8927377a

Most likely the exception is thrown because this condition evaluates to true:
GeneralPurposeBitFlag != centralDirFileHeader.GeneralPurposeBitFlag

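Looks like bit 11 is not always set correctly in the Central Directory Header. In the dump below, the local file entries show 808 where the central directory entries show 8 (assuming the second value in parentheses is the general purpose bit flag in hex), so the two differ exactly in bit 11 (0x800).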

0000011a: file: xl/sharedStrings.xml           size: 00000000/0/0 (14/808/8/0)
00000329: file: xl/worksheets/_rels/sheet1.xml.rels size: 00000000/0/0 (14/808/8/0)
0000047e: file: xl/worksheets/sheet1.xml       size: 00000000/0/0 (14/808/8/0)
00000a1f: file: xl/workbook.xml                size: 00000000/0/0 (14/808/8/0)
00000c53: file: xl/styles.xml                  size: 00000000/0/0 (14/808/8/0)
00001056: file: docProps/app.xml               size: 00000000/0/0 (14/808/8/0)

000039dc: dir: xl/sharedStrings.xml           off 0000011a, crc/size/size: 2deb184d/461/1728 (14/8/8/0)
00003a1e: dir: xl/worksheets/_rels/sheet1.xml.rels off 00000329, crc/size/size: 696c69ea/260/1069 (14/8/8/0)
00003a6f: dir: xl/worksheets/sheet1.xml       off 0000047e, crc/size/size: ea23b80c/1371/6163 (14/8/8/0)
00003ab5: dir: xl/workbook.xml                off 00000a1f, crc/size/size: ed33788b/503/878 (14/8/8/0)
00003af2: dir: xl/styles.xml                  off 00000c53, crc/size/size: 0dd2c718/687/5289 (14/8/8/0)
00003b6b: dir: docProps/app.xml               off 00001056, crc/size/size: 15642684/235/380 (14/8/8/0)

        Bit 11: Language encoding flag (EFS).  If this bit is set,
                the filename and comment fields for this file
                MUST be encoded using UTF-8. (see APPENDIX D)

https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT
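
The mismatch described above can be checked without WindowsBase. The following is a minimal Python sketch, not part of the original report, that reads each entry's general purpose bit flag from the central directory (through the standard zipfile module) and from the local file header at the recorded offset, then lists the entries where the two differ. The function name flag_mismatches is hypothetical.

```python
import io
import struct
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: language encoding flag (EFS)
LOCAL_HEADER_SIGNATURE = 0x04034B50  # "PK\x03\x04"


def flag_mismatches(data: bytes):
    """Return (name, local_flags, central_flags) for entries whose flags differ.

    Hypothetical helper for illustration; assumes the archive has no data
    prepended before the first local header.
    """
    mismatches = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            # Local file header layout: signature (4 bytes),
            # version needed (2 bytes), general purpose bit flag (2 bytes), ...
            sig, _version, local_flags = struct.unpack_from(
                "<IHH", data, info.header_offset
            )
            if sig != LOCAL_HEADER_SIGNATURE:
                raise ValueError(f"no local header at offset {info.header_offset}")
            # info.flag_bits holds the value read from the central directory.
            if local_flags != info.flag_bits:
                mismatches.append((info.filename, local_flags, info.flag_bits))
    return mismatches
```

Calling flag_mismatches(open(path, "rb").read()) on a suspect file and checking whether (local ^ central) equals UTF8_FLAG for each reported entry would show whether bit 11 is the only difference. This has not been run against the attachment here.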

The file in question can be opened by WinRAR, MS Excel, etc, but not by DocumentFormat.OpenXml.dll and WindowsBase.dll. If I re-save from Excel, the updated file will open, but we are trying to avoid this proxy operation if possible.

I would be very grateful if you could advise whether this can be fixed by some application setting in Calc, or provide a fix.
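
As a lighter alternative to re-saving through Excel, the archive could be rewritten programmatically so that the local headers and the central directory are produced by a single writer. The following is a minimal Python sketch under that assumption; the function name repack is hypothetical, and whether the result then opens in DocumentFormat.OpenXml has not been verified here.

```python
import io
import zipfile


def repack(data: bytes) -> bytes:
    """Copy every entry of a ZIP archive into a freshly written archive.

    Hypothetical workaround for illustration: it assumes each entry can
    still be read despite the flag mismatch, and it does not preserve
    timestamps or other per-entry metadata.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            # Re-compress the payload; the new writer emits matching
            # local and central headers for each entry.
            dst.writestr(info.filename, src.read(info))
    return out.getvalue()
```

The returned bytes could be handed to the reader as a stream in place of the original file, which keeps the extra step inside the importing application.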


Steps to Reproduce:
1. Use DocumentFormat.OpenXml (v2.7.2) via ClosedXML (v0.9.0) as the Excel format reader in a .NET console application (targeting .NET 4.6.1 in my case).
2. Try to open the file and enumerate Worksheets.
3. The call fails or succeeds depending on some minor implementation detail of the application that saved the file.

Actual Results:
   at MS.Internal.IO.Zip.ZipIOLocalFileBlock.Validate(String fileName, ZipIOCentralDirectoryBlock centralDir, ZipIOCentralDirectoryFileHeader centralDirFileHeader)
   at MS.Internal.IO.Zip.ZipIOLocalFileBlock.ParseRecord(BinaryReader reader, String fileName, Int64 position, ZipIOCentralDirectoryBlock centralDir, ZipIOCentralDirectoryFileHeader centralDirFileHeader)
   at MS.Internal.IO.Zip.ZipIOLocalFileBlock.SeekableLoad(ZipIOBlockManager blockManager, String fileName)
   at MS.Internal.IO.Zip.ZipIOBlockManager.LoadLocalFileBlock(String zipFileName)
   at MS.Internal.IO.Zip.ZipArchive.GetFile(String zipFileName)
   at MS.Internal.IO.Zip.ZipArchive.GetFiles()
   at System.IO.Packaging.ZipPackage.ContentTypeHelper..ctor(ZipArchive zipArchive, IgnoredItemHelper ignoredItemHelper)
   at System.IO.Packaging.ZipPackage..ctor(Stream s, FileMode mode, FileAccess access, Boolean streaming)
   at System.IO.Packaging.Package.Open(Stream stream, FileMode packageMode, FileAccess packageAccess, Boolean streaming)
   at DocumentFormat.OpenXml.Packaging.OpenXmlPackage.OpenCore(Stream stream, Boolean readWriteMode)
   at DocumentFormat.OpenXml.Packaging.SpreadsheetDocument.Open(Stream stream, Boolean isEditable, OpenSettings openSettings)
   at ClosedXML.Excel.XLWorkbook.LoadSheets(Stream stream) in C:\projects\closedxml\ClosedXML\Excel\XLWorkbook_Load.cs:line 47
   at ClosedXML.Excel.XLWorkbook..ctor(Stream stream) in C:\projects\closedxml\ClosedXML\Excel\XLWorkbook.cs:line 752
   at IPMS.LegalExpertise.TpcExcelImport.ExcelTpcParser.OpenWorkbook(Byte[] workbookBytes) in G:\@tfs\...\ETL\Parsers\ITpcFileParser.cs:line 240
   at ....ExcelParser.<Get...FromFile>d__17.MoveNext() in G:\@tfs\...\ETL\Parsers\ITpcFileParser.cs:line 126

Expected Results:
OpenXml should correctly open Calc files.


Reproducible: Sometimes


User Profile Reset: No



Additional Info:
You can of course consider this Minor, but please don't delay if you can give some simple advice right away or very soon. Thanks in advance, I appreciate your time and effort.
Comment 1 Alexey 2019-11-20 09:29:39 UTC
Created attachment 155967 [details]
problematic file

added the problematic file
Comment 2 Julien Nabet 2019-11-22 10:36:39 UTC
I'm not sure I understand how to reproduce the problem.
However, on Win10 with LO 6.3.3 and with master sources updated today, I could open the file.
What do you mean by "enumerate Worksheets"? I see only "Sheet1".
Comment 3 Alexey 2019-12-10 10:19:55 UTC
Hi Julien,

The problem is not opening the file in LO or MS Excel, it's opening via DocumentFormat.OpenXml. Does it work for you as well?

The DocumentFormat.OpenXml.Packaging.SpreadsheetDocument.Open method consistently fails for me, complaining about a corrupted ZIP.
Comment 4 Xisco Faulí 2019-12-10 15:18:44 UTC
Hello Alexey,
Could you please try to reproduce it with the latest version of LibreOffice
from https://www.libreoffice.org/download/libreoffice-fresh/ ?
I have set the bug's status to 'NEEDINFO'. Please change it back to
'UNCONFIRMED' if the bug is still present in the latest version.
Comment 5 Alexey 2019-12-25 15:31:13 UTC
Same behavior with the new version

System.IO.FileFormatException: 'File contains corrupted data.'

StackTrace:
   at MS.Internal.IO.Zip.ZipIOLocalFileBlock.Validate(String fileName, ZipIOCentralDirectoryBlock centralDir, ZipIOCentralDirectoryFileHeader centralDirFileHeader)
Comment 6 Alexey 2019-12-25 15:32:04 UTC Comment hidden (obsolete)
Comment 7 Alexey 2019-12-25 15:34:08 UTC
<Application>LibreOffice/6.3.4.2$Linux_X86_64 LibreOffice_project/60da17e045e08f1793c57c00ba83cdfce946d0aa</Application>
Comment 8 Xisco Faulí 2019-12-26 12:10:16 UTC
Hi Alexey,
I don't understand what the problem is. You are trying to open the document with DocumentFormat.OpenXml? That software was created by Microsoft. What does it have to do with LibreOffice?
Comment 9 Alexey 2019-12-27 11:27:54 UTC
LibreOffice creates documents that do not comply with the ZIP standard. It would be great if someone could update the implementation to comply.
Comment 11 Julien Nabet 2019-12-28 16:39:13 UTC
Alexey, can you reproduce this with xlsx files generated with a recent LO?
Indeed, 6.0.7 is EOL (like the 6.1 and 6.2 branches).

Here is a test with unzip (LO Debian testing package):
unzip -t zip-implementation-bug-128910.xlsx 
Archive:  zip-implementation-bug-128910.xlsx
    testing: [trash]/0000.dat         OK
file #2 (xl/sharedStrings.xml):
         mismatch between local and central GPF bit 11 ("UTF-8"),
         continuing with central flag (IsUTF8 = 0)
    testing: xl/sharedStrings.xml     OK
file #3 (xl/worksheets/_rels/sheet1.xml.rels):
         mismatch between local and central GPF bit 11 ("UTF-8"),
         continuing with central flag (IsUTF8 = 0)
    testing: xl/worksheets/_rels/sheet1.xml.rels   OK
file #4 (xl/worksheets/sheet1.xml):
         mismatch between local and central GPF bit 11 ("UTF-8"),
         continuing with central flag (IsUTF8 = 0)
    testing: xl/worksheets/sheet1.xml   OK
file #5 (xl/workbook.xml):
         mismatch between local and central GPF bit 11 ("UTF-8"),
         continuing with central flag (IsUTF8 = 0)
    testing: xl/workbook.xml          OK
file #6 (xl/styles.xml):
         mismatch between local and central GPF bit 11 ("UTF-8"),
         continuing with central flag (IsUTF8 = 0)
    testing: xl/styles.xml            OK
    testing: [trash]/0002.dat         OK
file #8 (docProps/app.xml):
         mismatch between local and central GPF bit 11 ("UTF-8"),
         continuing with central flag (IsUTF8 = 0)
    testing: docProps/app.xml         OK
    testing: [trash]/0003.dat         OK
    testing: [trash]/0004.dat         OK
    testing: customXml/item1.xml      OK
    testing: customXml/_rels/item1.xml.rels   OK
    testing: customXml/itemProps2.xml   OK
    testing: [trash]/0001.dat         OK
    testing: customXml/itemProps1.xml   OK
    testing: customXml/_rels/item2.xml.rels   OK
    testing: [trash]/0005.dat         OK
    testing: customXml/item2.xml      OK
    testing: customXml/item3.xml      OK
    testing: customXml/itemProps3.xml   OK
    testing: docProps/custom.xml      OK
    testing: customXml/_rels/item3.xml.rels   OK
    testing: _rels/.rels              OK
    testing: docProps/core.xml        OK
    testing: [Content_Types].xml      OK
    testing: xl/_rels/workbook.xml.rels   OK
At least one warning-error was detected in zip-implementation-bug-128910.xlsx.
Comment 12 Alexey 2019-12-30 13:38:55 UTC
Created attachment 156844 [details]
newer problematic file from 6.3.4

comes attached
Comment 13 Julien Nabet 2019-12-30 18:00:39 UTC
Thank you for your feedback.

I used 'unzip -t' on your last attachment and had the same warnings:
file #3 (docProps/app.xml):
         mismatch between local and central GPF bit 11 ("UTF-8"),
         continuing with central flag (IsUTF8 = 0)
    testing: docProps/app.xml         OK
file #4 (xl/styles.xml):
         mismatch between local and central GPF bit 11 ("UTF-8"),
         continuing with central flag (IsUTF8 = 0)
    testing: xl/styles.xml            OK
file #5 (xl/workbook.xml):
         mismatch between local and central GPF bit 11 ("UTF-8"),
         continuing with central flag (IsUTF8 = 0)
    testing: xl/workbook.xml          OK
    testing: [trash]/0000.dat         OK
file #7 (xl/sharedStrings.xml):
         mismatch between local and central GPF bit 11 ("UTF-8"),
         continuing with central flag (IsUTF8 = 0)
    testing: xl/sharedStrings.xml     OK
file #8 (xl/worksheets/_rels/sheet1.xml.rels):
         mismatch between local and central GPF bit 11 ("UTF-8"),
         continuing with central flag (IsUTF8 = 0)
    testing: xl/worksheets/_rels/sheet1.xml.rels   OK
file #9 (xl/worksheets/sheet1.xml):
         mismatch between local and central GPF bit 11 ("UTF-8"),
         continuing with central flag (IsUTF8 = 0)

Let's put this one to NEW.
Comment 14 Julien Nabet 2020-01-01 14:30:43 UTC
On a Debian x86-64 PC with the LO Debian package 6.3.4, here is a test I did:
- retrieve the problematic file
- add "test" to a cell
- resave
- rename the file extension to zip
- unzip -t
=> no error and no warning.
Comment 15 Alexey 2020-01-09 07:41:48 UTC
Thank you for confirming, guys, appreciated. Looking forward to more news.