Bug 82984 - FILEOPEN: Error opening XLSX file in zip64 format
Summary: FILEOPEN: Error opening XLSX file in zip64 format
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version (earliest affected): Inherited From OOo
Hardware: Other All
Importance: medium major
Assignee: Not Assigned
URL:
Whiteboard: BSA
Keywords: filter:xlsx
Duplicates: 98836 143958
Depends on:
Blocks: XLSX File-Opening XLSX-External-Generators
Reported: 2014-08-23 08:36 UTC by William Mann
Modified: 2022-09-09 19:42 UTC
CC List: 5 users

See Also:
Crash report or crash signature:
Regression By:


Attachments
File in zip64 format which causes load error. (4.60 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2014-08-23 08:36 UTC, William Mann
Modified (rezipped) file that opens correctly. (9.84 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2014-08-23 08:38 UTC, William Mann
XLSX without ZIP64 (12.34 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2018-11-30 06:22 UTC, Andreas Reichel
XLSX with ZIP64 (12.88 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2018-11-30 06:24 UTC, Andreas Reichel

Description William Mann 2014-08-23 08:36:35 UTC
Created attachment 105137 [details]
File in zip64 format which causes load error.

Problem description: 

When attempting to open an XLSX file compressed using the ZIP64 format, LO gives an error stating that the file is damaged. After unzipping the file and re-compressing it, LO is able to open it. The ZIP version in the header of the original file is 0x2D, while the files produced by LO contain 0x14 as the version.
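For reference, a ZIP "version" field stores the minimum ZIP specification version times ten, so 0x2D is 45 (version 4.5, which introduced the ZIP64 format extensions) and 0x14 is 20 (version 2.0, e.g. Deflate). In a local file header the "version needed to extract" field is at bytes 4-5. A minimal Python sketch for reading it; the helper name `version_needed` is hypothetical and the table lists only the two versions relevant here:

```python
import struct

# Minimum feature versions relevant to this bug (value = major * 10 + minor),
# per PKWARE APPNOTE section 4.4.3.
KNOWN_VERSIONS = {
    20: "2.0 (e.g. Deflate compression)",
    45: "4.5 (ZIP64 format extensions)",
}

def version_needed(data: bytes, offset: int = 0) -> int:
    """Read 'version needed to extract' from the local file header at `offset`."""
    if data[offset:offset + 4] != b"PK\x03\x04":
        raise ValueError("no local file header signature at this offset")
    # Bytes 4-5 of the header, little-endian.
    (version,) = struct.unpack_from("<H", data, offset + 4)
    return version

for raw in (0x2D, 0x14):
    print(hex(raw), raw, KNOWN_VERSIONS[raw])
```

Usage: `version_needed(open("test.xlsx", "rb").read())` returns 45 for the attached ZIP64 file if its first header matches the one shown later in comment 18.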

Steps to reproduce:
1. Attempt to load an XLSX file in ZIP64 format.

Current behavior:

Error stating that file is damaged.

Expected behavior:

Loading the file.

Operating System: Linux (Other)
Version: 4.3.0.4 release
Comment 1 William Mann 2014-08-23 08:38:02 UTC
Created attachment 105138 [details]
Modified (rezipped) file that opens correctly.
Comment 2 Owen Genat (retired) 2014-08-23 13:49:24 UTC
(In reply to comment #0)
> Current behavior:
> Error stating that file is damaged.

Confirmed under GNU/Linux using:

- v4.3.0.4 Build ID: 62ad5818884a2fc2e5780dd45466868d41009ec0
- v4.4.0.0.alpha0+ Build ID: e379401618268ed7f7f5885a36b90e1f4f6cd4af TinderBox: Linux-rpm_deb-x86_64@46-TDF, Branch:master, Time: 2014-08-18_05:51:03

Status set to NEW.
Comment 3 Aigars Mahinovs 2015-02-09 15:17:50 UTC
Confirmed that this also affects files created with https://xlsxwriter.readthedocs.org/ if the use_zip64 option is used (which is mandatory for large files).
Comment 4 William Mann 2015-08-14 09:49:13 UTC Comment hidden (obsolete)
Comment 5 QA Administrators 2016-09-20 10:25:40 UTC Comment hidden (obsolete)
Comment 6 William Mann 2016-09-20 13:18:43 UTC
As requested, I've confirmed that this bug still exists and its behaviour is the same as reported originally. I verified using LO 5.2.1.2 running on a Kubuntu 16.04 64-bit system (Linux kernel version 4.4.0-38-generic).
Comment 7 Xisco Faulí 2017-09-29 08:48:07 UTC Comment hidden (obsolete)
Comment 8 Andreas Reichel 2018-11-30 06:22:16 UTC
Created attachment 147159 [details]
XLSX without ZIP64

This file can be read in LibreOffice and Gnumeric.
Comment 9 Andreas Reichel 2018-11-30 06:24:05 UTC
Created attachment 147160 [details]
XLSX with ZIP64

This file can be read in Gnumeric, but fails in LibreOffice.
Comment 10 Andreas Reichel 2018-11-30 06:26:53 UTC
This bug is still valid in LibreOffice Version: 6.1.3.2 (CPU threads: 16; OS: Linux 4.19; UI render: default; VCL: gtk3; Locale: en-US (en_US.UTF-8); Calc: threaded).

I have created 2 files with the same content using Apache POI. One file is ZIP64 and can be read by Gnumeric, but not by LibreOffice. The other file has been written without ZIP64 and can be read by LibreOffice 6.1.3.2.
Comment 11 Krzysztof Rzymkowski 2019-02-23 17:37:25 UTC
It looks like LibreOffice (Version: 6.0.7.3, Build ID: 1:6.0.7-0ubuntu0.18.04.2) has a hard requirement on only one ZIP version field: the central directory's "version needed to extract" (see: https://en.wikipedia.org/wiki/Zip_(file_format)#Central_directory_file_header). This version apparently needs to be less than or equal to 30. The other version fields can be 45: the local file header's version and the central directory's "version made by".
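The observation above can be checked per file by walking the central directory and listing each entry's "version needed to extract". A minimal Python sketch, assuming the classic (non-ZIP64) end-of-central-directory record is usable, which holds for small archives; the function names and `OBSERVED_LIMIT` (taken from this comment's observation, not from LibreOffice's code) are hypothetical:

```python
import struct

EOCD_SIG = b"PK\x05\x06"      # end of central directory record
CENTRAL_SIG = b"PK\x01\x02"   # central directory file header
OBSERVED_LIMIT = 30           # assumption: the "<= 30" observed above, not a documented limit

def central_directory_versions(data: bytes):
    """Yield (file name, version made by, version needed to extract) per central directory entry."""
    # The EOCD record is 22 bytes plus an optional comment of at most 65535 bytes.
    eocd = data.rfind(EOCD_SIG, max(0, len(data) - 22 - 0xFFFF))
    if eocd < 0:
        raise ValueError("no end of central directory record found")
    total_entries, _cd_size, cd_offset = struct.unpack_from("<HII", data, eocd + 10)
    if total_entries == 0xFFFF or cd_offset == 0xFFFFFFFF:
        raise ValueError("ZIP64 end of central directory record needed; not handled here")
    pos = cd_offset
    for _ in range(total_entries):
        if data[pos:pos + 4] != CENTRAL_SIG:
            raise ValueError(f"expected central directory header at offset {pos}")
        made_by, needed = struct.unpack_from("<HH", data, pos + 4)
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", data, pos + 28)
        name = data[pos + 46:pos + 46 + name_len].decode("utf-8", "replace")
        yield name, made_by, needed
        # Fixed 46-byte header, then file name, extra field and file comment.
        pos += 46 + name_len + extra_len + comment_len

def entries_over_limit(data: bytes, limit: int = OBSERVED_LIMIT):
    """Names whose central directory 'version needed to extract' exceeds `limit`."""
    return [name for name, _, needed in central_directory_versions(data) if needed > limit]
```

Usage: `entries_over_limit(open("test.xlsx", "rb").read())` lists the members that, going by this observation, would trip the check.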

For a zip64 compressor implementation compatible with both Excel and LibreOffice, see: https://github.com/rzymek/opczip/blob/master/src/main/java/com/github/rzymek/opczip/Zip64Impl.java
Comment 12 Roman Kuznetsov 2021-04-12 14:43:04 UTC
still repro in

Version: 7.2.0.0.alpha0+ (x64) / LibreOffice Community
Build ID: 7a0e0a84a02f505200331c19b28d45e898cd5a12
CPU threads: 4; OS: Windows 10.0 Build 18363; UI render: Skia/Raster; VCL: win
Locale: ru-RU (ru_RU); UI: ru-RU
Calc: threaded Jumbo
Comment 13 NISZ LibreOffice Team 2021-07-14 08:29:01 UTC
*** Bug 98836 has been marked as a duplicate of this bug. ***
Comment 14 dongshili 2021-08-20 14:46:00 UTC Comment hidden (no-value)
Comment 15 Kevin Suo 2021-11-04 05:41:04 UTC
*** Bug 143958 has been marked as a duplicate of this bug. ***
Comment 16 Kevin Suo 2021-11-04 05:46:53 UTC
For bugs that should be marked as duplicates of this one, the exception seen while debugging should be thrown in:
https://opengrok.libreoffice.org/xref/core/package/source/zipapi/ZipFile.cxx?r=d0a8d4a9#946
Comment 18 Kevin Suo 2021-11-04 07:41:23 UTC
The easiest way to identify whether an xlsx (zip) file is in ZIP64 format, under Linux, seems to be:

$ xxd ./test.xlsx
00000000: 504b 0304 2d00 0000 0800 4155 6e48 3c5c  PK..-.....AUnH<\
00000010: b548 ffff ffff ffff ffff 1300 1400 5b43  .H............[C
......

Each group of four hex digits is 2 bytes.

According to https://users.cs.jmu.edu/buchhofp/forensics/formats/pkzip.html:

1. The first 4 bytes, "504b 0304", indicate that it is a zip file.
2. Bytes 19-22 (counting from 1) are the "Compressed size". If the archive is in ZIP64 format, this field is "ffff ffff".
3. Bytes 23-26 are the "Uncompressed size". If the archive is in ZIP64 format, this field is also "ffff ffff".
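The same check can be done without reading a hex dump. A minimal Python sketch that looks only at the first local file header, using 0-based offsets 18 and 22 (bytes 19-22 and 23-26 counting from 1); the function name is hypothetical, and an entry can still use ZIP64 elsewhere (e.g. only in the central directory) without tripping this test:

```python
import struct

def has_zip64_size_markers(data: bytes, offset: int = 0) -> bool:
    """True if the local file header at `offset` has 0xFFFFFFFF in either size field."""
    if data[offset:offset + 4] != b"PK\x03\x04":
        raise ValueError("no local file header signature at this offset")
    # Compressed size at 0-based offset 18, uncompressed size at 22, both 4 bytes little-endian.
    compressed, uncompressed = struct.unpack_from("<II", data, offset + 18)
    return 0xFFFFFFFF in (compressed, uncompressed)

# The first 30 bytes of the xxd dump above (header up to and including the extra-field length).
dump = bytes.fromhex("504b03042d000000080041556e483c5c" "b548ffffffffffffffff13001400")
print(has_zip64_size_markers(dump))
```

For the dump above this prints `True`; the name-length field right after the sizes is 0x13 (19 bytes), matching `[Content_Types].xml`.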
Comment 19 Ming Hua 2021-11-04 08:40:45 UTC
(In reply to Kevin Suo from comment #18)
> According to https://users.cs.jmu.edu/buchhofp/forensics/formats/pkzip.html:
> 
> 1. The first 4 bytes, "504b 0304" indicates that it is a zip file.
> 2. Bytes 19-22 denotes to "Compressed size". If archive is in ZIP64 format,
> then this is "ffff ffff".
> 3. Bytes 23-26 denotes to "Uncompressed size". If archive is in ZIP 64
> format, then this is also "ffff ffff".
I don't think reading zip file headers is necessary, as they already seem to be read in
https://opengrok.libreoffice.org/xref/core/package/source/zipapi/ZipFile.cxx?r=d0a8d4a9#938
where nCompressedSize and nSize should correspond to the "Compressed size" and "Uncompressed size" above.

(In reply to Kevin Suo from comment #16)
> File bugs which should be marked as a duplicate of this bug, in debugging
> the exception should be in:
> https://opengrok.libreoffice.org/xref/core/package/source/zipapi/ZipFile.
> cxx?r=d0a8d4a9#946
And the code here tests whether nCompressedSize or nSize is "0xffffffff" to see if Zip64 is needed, throwing an exception if so.
Comment 20 Kevin Suo 2021-11-04 09:16:52 UTC
(In reply to Ming Hua from comment #19)
I pointed this out to help QA determine whether a given bug of this kind is due to the use of backslash as the file name separator (bug 76115), the use of ZIP64 (this bug), or other reasons, and mark it as a duplicate of the correct bug accordingly.

Yes, you are right: currently, if the file is zipped using ZIP64, it throws an exception in https://opengrok.libreoffice.org/xref/core/package/source/zipapi/ZipFile.cxx?r=d0a8d4a9#950. That is a FIXME and should be implemented.
Comment 21 Ming Hua 2021-11-04 09:52:01 UTC
(In reply to Kevin Suo from comment #20)
> I pointed this out to help QA to determine whether a certain such bug is due
> to the using of backslash as file name separator (bug 76115), the use of
> ZIP64 (this bug), or other reasons, and mark as duplicate to the correct bug
> accordingly.
For QA purpose I think there are easier ways to identify files using zip64 format.

On Linux, zipinfo has a "-v" option which outputs a line like
A subfield with ID 0x0001 (PKWARE 64-bit sizes) and 16 data bytes:
d3 02 00 00 00 00 00 00 ef 00 00 00 00 00 00 00.
for each file that is compressed with zip64 format.
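As a worked example, the 16 data bytes in that zipinfo output are two little-endian 64-bit integers. Assuming both 32-bit size fields were set to 0xFFFFFFFF (so both 64-bit values are present), the ZIP64 field lists the original (uncompressed) size first and the compressed size second:

```python
import struct

# The 16 data bytes printed by `zipinfo -v` in the comment above.
data = bytes.fromhex("d302000000000000ef00000000000000")
uncompressed, compressed = struct.unpack("<QQ", data)
print(uncompressed, compressed)
```

This prints `723 239`: 0x02d3 = 723 bytes uncompressed and 0xef = 239 bytes compressed.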

On Windows, the 7-Zip archive view has a column indicating whether a file uses zip64 format.
Comment 22 Kevin Suo 2021-12-12 16:19:12 UTC
From a person migrating from MSO to LibreOffice, I understand that this 7-year-old bug is a key issue stopping them: the ERP software they use, Kingdee, writes xlsx files in zip64 format, so they cannot open any xlsx file exported from their ERP in LibreOffice. A workaround is to unzip the xlsx file and then zip it again.
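The unzip/re-zip workaround can be scripted. A minimal Python sketch using the standard `zipfile` module, assuming the archive fits in memory and each member is small enough that `zipfile` writes plain (non-ZIP64) headers; `rezip_without_zip64` is a hypothetical name:

```python
import zipfile

def rezip_without_zip64(src, dst) -> None:
    """Copy every member of `src` into a new archive `dst` (paths or binary file objects)."""
    with zipfile.ZipFile(src) as zin, \
         zipfile.ZipFile(dst, "w", compression=zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():  # keeps the original member order
            # Build a fresh ZipInfo with only the name and timestamp; reusing `info`
            # would also copy header fields (version, extra field) from the ZIP64 original.
            new_info = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            new_info.compress_type = zipfile.ZIP_DEFLATED
            zout.writestr(new_info, zin.read(info.filename))
```

Usage: `rezip_without_zip64("export.xlsx", "export-fixed.xlsx")`. For members well below 4 GiB, `zipfile` writes ordinary 32-bit sizes and a "version needed to extract" of 20, which is the shape comment 11 reports LibreOffice accepting.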

The following code comment in https://opengrok.libreoffice.org/xref/core/package/source/zipapi/ZipFile.cxx?r=d0a8d4a9#946:
// FIXME64: need to read the 64bit header instead
indicates that this is a core feature that is not yet implemented.
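What "read the 64bit header instead" would involve can be sketched as follows. This is a minimal Python illustration, not LibreOffice's C++ code: it assumes the caller has already read the two 32-bit size fields and the extra field of a header, and `resolve_sizes` is a hypothetical name. When a 32-bit size is 0xFFFFFFFF, the real value is taken from the ZIP64 extended information field (header ID 0x0001), which lists only the saturated values, uncompressed size first:

```python
import struct

ZIP64_MARKER = 0xFFFFFFFF
ZIP64_EXTRA_ID = 0x0001

def resolve_sizes(compressed32: int, uncompressed32: int, extra: bytes) -> tuple:
    """Return (compressed, uncompressed), reading 64-bit values where the 32-bit fields are saturated."""
    values = []
    pos = 0
    # The extra field is a sequence of (header ID, data size, data) records.
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        if header_id == ZIP64_EXTRA_ID:
            values = list(struct.unpack_from(f"<{size // 8}Q", extra, pos + 4))
            break
        pos += 4 + size
    it = iter(values)
    # Order inside the ZIP64 field: original (uncompressed) size, then compressed size.
    uncompressed = next(it, None) if uncompressed32 == ZIP64_MARKER else uncompressed32
    compressed = next(it, None) if compressed32 == ZIP64_MARKER else compressed32
    if uncompressed is None or compressed is None:
        raise ValueError("size marked as ZIP64 but no matching value in the extra field")
    return compressed, uncompressed
```

With the zipinfo bytes from comment 21 wrapped in a 0x0001 record, `resolve_sizes(0xFFFFFFFF, 0xFFFFFFFF, extra)` gives `(239, 723)`. A complete fix would also need the 64-bit local header offset and the ZIP64 end-of-central-directory record for very large archives.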