Description:
Hi, I'm developing an application that processes spreadsheets in OpenXml format and should support LibreOffice as well. However, files coming from some LibreOffice users (<Application>LibreOffice/6.0.7.3$Linux_X86_64 LibreOffice_project/00m0$Build-3</Application>) don't always comply with the ZIP format specification, and WindowsBase.dll throws a FileFormatException from the Validate method of this class:
https://referencesource.microsoft.com/#WindowsBase/Base/MS/Internal/IO/Zip/ZipIOLocalFileBlock.cs,566c718a8927377a

Most likely the exception is thrown because this check fails:
GeneralPurposeBitFlag != centralDirFileHeader.GeneralPurposeBitFlag

It looks like bit 11 is not always set correctly in the Central Directory Header:
0000011a: file: xl/sharedStrings.xml size: 00000000/0/0 (14/808/8/0)
00000329: file: xl/worksheets/_rels/sheet1.xml.rels size: 00000000/0/0 (14/808/8/0)
0000047e: file: xl/worksheets/sheet1.xml size: 00000000/0/0 (14/808/8/0)
00000a1f: file: xl/workbook.xml size: 00000000/0/0 (14/808/8/0)
00000c53: file: xl/styles.xml size: 00000000/0/0 (14/808/8/0)
00001056: file: docProps/app.xml size: 00000000/0/0 (14/808/8/0)
000039dc: dir: xl/sharedStrings.xml off 0000011a, crc/size/size: 2deb184d/461/1728 (14/8/8/0)
00003a1e: dir: xl/worksheets/_rels/sheet1.xml.rels off 00000329, crc/size/size: 696c69ea/260/1069 (14/8/8/0)
00003a6f: dir: xl/worksheets/sheet1.xml off 0000047e, crc/size/size: ea23b80c/1371/6163 (14/8/8/0)
00003ab5: dir: xl/workbook.xml off 00000a1f, crc/size/size: ed33788b/503/878 (14/8/8/0)
00003af2: dir: xl/styles.xml off 00000c53, crc/size/size: 0dd2c718/687/5289 (14/8/8/0)
00003b6b: dir: docProps/app.xml off 00001056, crc/size/size: 15642684/235/380 (14/8/8/0)

From the ZIP specification (https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT):
Bit 11: Language encoding flag (EFS). If this bit is set, the filename and comment fields for this file MUST be encoded using UTF-8. (see APPENDIX D)

The file in question can be opened by WinRAR, MS Excel, etc., but not by DocumentFormat.OpenXml.dll and WindowsBase.dll. If I re-save it from Excel, the updated file opens, but we are trying to avoid this proxy step if possible. I would be very grateful if you could advise whether this is easily cured by some setting in Calc, or provide a fix.

Steps to Reproduce:
1. Use DocumentFormat.OpenXml (v2.7.2) via ClosedXML (v0.9.0) as the Excel format reader in a .NET console application (targeting .NET 4.6.1 in my case).
2. Try to open the file and enumerate the worksheets.
3. Fail or succeed depending on some minor implementation detail of the application that saved the file.

Actual Results:
at MS.Internal.IO.Zip.ZipIOLocalFileBlock.Validate(String fileName, ZipIOCentralDirectoryBlock centralDir, ZipIOCentralDirectoryFileHeader centralDirFileHeader)
at MS.Internal.IO.Zip.ZipIOLocalFileBlock.ParseRecord(BinaryReader reader, String fileName, Int64 position, ZipIOCentralDirectoryBlock centralDir, ZipIOCentralDirectoryFileHeader centralDirFileHeader)
at MS.Internal.IO.Zip.ZipIOLocalFileBlock.SeekableLoad(ZipIOBlockManager blockManager, String fileName)
at MS.Internal.IO.Zip.ZipIOBlockManager.LoadLocalFileBlock(String zipFileName)
at MS.Internal.IO.Zip.ZipArchive.GetFile(String zipFileName)
at MS.Internal.IO.Zip.ZipArchive.GetFiles()
at System.IO.Packaging.ZipPackage.ContentTypeHelper..ctor(ZipArchive zipArchive, IgnoredItemHelper ignoredItemHelper)
at System.IO.Packaging.ZipPackage..ctor(Stream s, FileMode mode, FileAccess access, Boolean streaming)
at System.IO.Packaging.Package.Open(Stream stream, FileMode packageMode, FileAccess packageAccess, Boolean streaming)
at DocumentFormat.OpenXml.Packaging.OpenXmlPackage.OpenCore(Stream stream, Boolean readWriteMode)
at DocumentFormat.OpenXml.Packaging.SpreadsheetDocument.Open(Stream stream, Boolean isEditable, OpenSettings openSettings)
at ClosedXML.Excel.XLWorkbook.LoadSheets(Stream stream) in C:\projects\closedxml\ClosedXML\Excel\XLWorkbook_Load.cs:line 47
at ClosedXML.Excel.XLWorkbook..ctor(Stream stream) in C:\projects\closedxml\ClosedXML\Excel\XLWorkbook.cs:line 752
at IPMS.LegalExpertise.TpcExcelImport.ExcelTpcParser.OpenWorkbook(Byte[] workbookBytes) in G:\@tfs\...\ETL\Parsers\ITpcFileParser.cs:line 240
at ....ExcelParser.<Get...FromFile>d__17.MoveNext() in G:\@tfs\...\ETL\Parsers\ITpcFileParser.cs:line 126

Expected Results:
OpenXml should correctly open Calc files.

Reproducible: Sometimes
User Profile Reset: No

Additional Info:
You can of course consider this Minor, but please don't delay if you can give some simple advice right away or very soon. Thanks in advance, I appreciate your time and effort.
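The hex values in the listing in the description can be decoded bit by bit. The tool that produced the listing is not named, so this sketch assumes that the second field in each parenthesised group is the general purpose bit flag in hex (0x808 for the local "file:" entries, 0x8 for the central "dir:" entries). Under that assumption, bit 3 is set in both (CRC and sizes deferred to a data descriptor, which fits the zero sizes in the local entries), and bit 11 is set only in the local headers. `describe_flags` is a hypothetical helper written for illustration.

```python
# Descriptions of the two general purpose flag bits that appear in the listing.
FLAG_BITS = {
    3: "bit 3: CRC and sizes are in a data descriptor after the data",
    11: "bit 11: language encoding flag (EFS), names are UTF-8",
}


def describe_flags(flags):
    """Return descriptions of the known bits that are set in a flag value."""
    return [text for bit, text in FLAG_BITS.items() if flags & (1 << bit)]


# Values read off the listing, assuming the second field is the flag word in hex.
local_flags = 0x808
central_flags = 0x008

print("local header:     ", describe_flags(local_flags))
print("central directory:", describe_flags(central_flags))
# XOR keeps only the bits on which the two headers disagree.
print("mismatched bits:  ", describe_flags(local_flags ^ central_flags))
```

Under these assumptions the only disagreeing bit is bit 11, which is exactly what a strict equality check on the whole flag word would reject.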
Created attachment 155967 (problematic file). Added the problematic file.
I'm not sure I understand how to reproduce the problem. However, on Win10 with LO 6.3.3 and with master sources updated today, I could open the file. What do you mean by "enumerate Worksheets"? I see only "Sheet1".
Hi Julien, the problem is not opening the file in LO or MS Excel, it's opening it via DocumentFormat.OpenXml. Does that work for you as well? The DocumentFormat.OpenXml.Packaging.SpreadsheetDocument.Open method consistently fails for me, complaining about a corrupted ZIP.
Hello Alexey, Could you please try to reproduce it with the latest version of LibreOffice from https://www.libreoffice.org/download/libreoffice-fresh/ ? I have set the bug's status to 'NEEDINFO'. Please change it back to 'UNCONFIRMED' if the bug is still present in the latest version.
Same behavior with the new version:
System.IO.FileFormatException: 'File contains corrupted data.'
StackTrace: at MS.Internal.IO.Zip.ZipIOLocalFileBlock.Validate(String fileName, ZipIOCentralDirectoryBlock centralDir, ZipIOCentralDirectoryFileHeader centralDirFileHeader)
<Application>LibreOffice/6.3.4.2$Linux_X86_64 LibreOffice_project/60da17e045e08f1793c57c00ba83cdfce946d0aa</Application>
Hi Alexey, I don't understand what the problem is. You are trying to open the document with DocumentFormat.OpenXml? That software was created by Microsoft. What does it have to do with LibreOffice?
LibreOffice creates documents that do not comply with the ZIP specification. It would be great if someone could update the implementation to comply.
see https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT
Alexey, can you reproduce this with xlsx files generated with a recent LO? Note that 6.0.7 is EOL (like the 6.1 and 6.2 branches). Here are some tests with unzip (LO Debian testing package):
unzip -t zip-implementation-bug-128910.xlsx
Archive: zip-implementation-bug-128910.xlsx
testing: [trash]/0000.dat OK
file #2 (xl/sharedStrings.xml): mismatch between local and central GPF bit 11 ("UTF-8"), continuing with central flag (IsUTF8 = 0)
testing: xl/sharedStrings.xml OK
file #3 (xl/worksheets/_rels/sheet1.xml.rels): mismatch between local and central GPF bit 11 ("UTF-8"), continuing with central flag (IsUTF8 = 0)
testing: xl/worksheets/_rels/sheet1.xml.rels OK
file #4 (xl/worksheets/sheet1.xml): mismatch between local and central GPF bit 11 ("UTF-8"), continuing with central flag (IsUTF8 = 0)
testing: xl/worksheets/sheet1.xml OK
file #5 (xl/workbook.xml): mismatch between local and central GPF bit 11 ("UTF-8"), continuing with central flag (IsUTF8 = 0)
testing: xl/workbook.xml OK
file #6 (xl/styles.xml): mismatch between local and central GPF bit 11 ("UTF-8"), continuing with central flag (IsUTF8 = 0)
testing: xl/styles.xml OK
testing: [trash]/0002.dat OK
file #8 (docProps/app.xml): mismatch between local and central GPF bit 11 ("UTF-8"), continuing with central flag (IsUTF8 = 0)
testing: docProps/app.xml OK
testing: [trash]/0003.dat OK
testing: [trash]/0004.dat OK
testing: customXml/item1.xml OK
testing: customXml/_rels/item1.xml.rels OK
testing: customXml/itemProps2.xml OK
testing: [trash]/0001.dat OK
testing: customXml/itemProps1.xml OK
testing: customXml/_rels/item2.xml.rels OK
testing: [trash]/0005.dat OK
testing: customXml/item2.xml OK
testing: customXml/item3.xml OK
testing: customXml/itemProps3.xml OK
testing: docProps/custom.xml OK
testing: customXml/_rels/item3.xml.rels OK
testing: _rels/.rels OK
testing: docProps/core.xml OK
testing: [Content_Types].xml OK
testing: xl/_rels/workbook.xml.rels OK
At least one warning-error was detected in zip-implementation-bug-128910.xlsx.
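The mismatch that unzip reports here, and that the reporter suspects ZipIOLocalFileBlock.Validate rejects, can be reproduced with a short script. This is a minimal sketch, not a port of either tool: it uses Python's standard zipfile module to read the central directory (ZipInfo.flag_bits and ZipInfo.header_offset), then reads the fixed 30-byte part of each local file header directly. It assumes a plain archive with no data prepended before the first local header. `find_flag_mismatches` and `make_sample` are hypothetical helpers written for illustration.

```python
import io
import struct
import zipfile

LOCAL_HEADER_SIG = b"PK\x03\x04"
EFS_BIT = 0x0800  # general purpose bit 11: file name and comment are UTF-8


def find_flag_mismatches(fp):
    """Compare each entry's local-header flags with its central directory flags.

    fp must be a seekable binary file object. Returns a list of
    (name, local_flags, central_flags) tuples for the entries that differ.
    """
    mismatches = []
    with zipfile.ZipFile(fp) as zf:
        for info in zf.infolist():
            # info.flag_bits and info.header_offset come from the central directory.
            fp.seek(info.header_offset)
            header = fp.read(30)  # fixed-size part of a local file header
            if header[:4] != LOCAL_HEADER_SIG:
                raise ValueError(f"no local file header for {info.filename!r}")
            # The flags are the little-endian uint16 at offset 6 of the local header.
            (local_flags,) = struct.unpack_from("<H", header, 6)
            if local_flags != info.flag_bits:
                mismatches.append((info.filename, local_flags, info.flag_bits))
    return mismatches


def make_sample(set_local_efs):
    """Hypothetical test helper: a one-entry archive, optionally with bit 11
    set only in the local header (mimicking the attached files)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("xl/workbook.xml", "<workbook/>")
    data = bytearray(buf.getvalue())
    if set_local_efs:
        # The first local header starts at offset 0; its flags are at offset 6.
        (flags,) = struct.unpack_from("<H", data, 6)
        struct.pack_into("<H", data, 6, flags | EFS_BIT)
    return io.BytesIO(bytes(data))
```

To check a real file, open it in binary mode and pass the handle, e.g. `with open("zip-implementation-bug-128910.xlsx", "rb") as f: print(find_flag_mismatches(f))`. On the attached files one would expect entries whose local flags have EFS_BIT set while the central flags do not, mirroring the unzip warnings above.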
Created attachment 156844 (newer problematic file). The newer problematic file from 6.3.4 is attached.
Thank you for your feedback. I used 'unzip -t' on your last attachment and got the same warnings:
file #3 (docProps/app.xml): mismatch between local and central GPF bit 11 ("UTF-8"), continuing with central flag (IsUTF8 = 0)
testing: docProps/app.xml OK
file #4 (xl/styles.xml): mismatch between local and central GPF bit 11 ("UTF-8"), continuing with central flag (IsUTF8 = 0)
testing: xl/styles.xml OK
file #5 (xl/workbook.xml): mismatch between local and central GPF bit 11 ("UTF-8"), continuing with central flag (IsUTF8 = 0)
testing: xl/workbook.xml OK
testing: [trash]/0000.dat OK
file #7 (xl/sharedStrings.xml): mismatch between local and central GPF bit 11 ("UTF-8"), continuing with central flag (IsUTF8 = 0)
testing: xl/sharedStrings.xml OK
file #8 (xl/worksheets/_rels/sheet1.xml.rels): mismatch between local and central GPF bit 11 ("UTF-8"), continuing with central flag (IsUTF8 = 0)
testing: xl/worksheets/_rels/sheet1.xml.rels OK
file #9 (xl/worksheets/sheet1.xml): mismatch between local and central GPF bit 11 ("UTF-8"), continuing with central flag (IsUTF8 = 0)

Let's set this one to NEW.
On a Debian x86-64 PC with the LO Debian package 6.3.4, here is a test I did:
- retrieve the problematic file
- add "test" in a cell
- re-save
- rename the file extension to zip
- unzip -t => no errors and no warnings.
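The test above shows that re-saving produces an archive without warnings. For files that are already in circulation, one possible client-side workaround that avoids the Excel round trip is to repack the archive with a ZIP writer that derives both headers from the same metadata. This is a minimal sketch assuming a Python preprocessing step is acceptable. `repack` and `repack_bytes` are hypothetical helpers. The sketch relies on Python's zipfile writing the same general purpose flags into each local header and its central directory entry, and on member names being ASCII (zipfile compares the local and central names when it reads a member, which is harmless for ASCII names). It also assumes the flag mismatch is the only thing WindowsBase objects to, which is the reporter's hypothesis rather than a confirmed cause.

```python
import io
import zipfile


def repack(src, dst):
    """Copy every member of src into a fresh deflated archive at dst, in order.

    src and dst may be paths or binary file objects. zipfile builds the local
    header and the central directory entry from the same ZipInfo, so the
    general purpose flags of the two headers agree in the output.
    """
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            # Start from a fresh ZipInfo so no flags or extra fields are carried over.
            new_info = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            new_info.compress_type = zipfile.ZIP_DEFLATED
            zout.writestr(new_info, zin.read(info))


def repack_bytes(data):
    """In-memory variant, e.g. for a workbook that is already held as bytes."""
    out = io.BytesIO()
    repack(io.BytesIO(data), out)
    return out.getvalue()
```

Keeping the original member order (infolist() returns entries in central directory order) leaves parts such as [Content_Types].xml where the producer put them; only the ZIP framing is rewritten, not the XML content.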
Thank you for confirming, guys, appreciated. Looking forward to more news.