Bug 162944 - Certain files with the ‘xlsx’ extension are reported as corrupted when opened
Summary: Certain files with the ‘xlsx’ extension are reported as corrupted when opened
Status: RESOLVED NOTABUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage
Version: 24.8.1.2 release (earliest affected)
Hardware: All Linux (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: bibisected, bisected, regression
Depends on:
Blocks: XLSX
Reported: 2024-09-13 08:41 UTC by saveurlinux
Modified: 2024-09-23 11:26 UTC (History)
CC List: 5 users

See Also:
Crash report or crash signature:


Attachments
Example file (18.61 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2024-09-13 08:41 UTC, saveurlinux
screenshot (27.96 KB, image/png)
2024-09-13 08:42 UTC, saveurlinux

Description saveurlinux 2024-09-13 08:41:39 UTC
Created attachment 196421 [details]
Example file

When opened, certain files with the ‘xlsx’ extension are reported as corrupted.

Name        : libreoffice-calc
Epoch       : 1
Version     : 24.8.0.3
Release     : 1bdk_mga9
Architecture: x86_64
Install Date: ven. 23 août 2024 12:20:31
Group       : Office/Spreadsheet
Size        : 40604549
License     : MPL-2.0 and Apache-2.0 and LGPL-3.0-only and LGPL-3.0-or-later and CC0-1.0 and BSD-3-Clause and (LGPL-2.1-only or SISSL) and (MPL-2.0 or LGPL-3.0-or-later) and (MPL-2.0 or LGPL-2.1-or-later) and (MPL-1.1 or GPL-2.0-only or LGPL-2.1-only)
Signature   : DSA/SHA1, ven. 23 août 2024 03:53:19, Key ID d1e9294d2d9835d8
Source RPM  : libreoffice-24.8.0.3-1bdk_mga9.src.rpm
Build Date  : ven. 23 août 2024 02:43:17
Build Host  : GamerRyzen7
Packager    : katnatek
Vendor      : BDK-packagers
URL         : https://www.libreoffice.org/
Summary     : LibreOffice Spreadsheet Application
Description :
The LibreOffice Spreadsheet application.
Comment 1 saveurlinux 2024-09-13 08:42:05 UTC
Created attachment 196422 [details]
screenshot
Comment 2 saveurlinux 2024-09-13 08:46:19 UTC
The same file is not reported as corrupted in version 7.6.7.2.
Comment 3 Xisco Faulí 2024-09-13 09:20:10 UTC
Regression introduced by:

commit    efae4fc42d5fe3c0a69757226f38efc10d101194
author    Michael Stahl <michael.stahl@allotropia.de>  Tue Jul 16 12:12:09 2024 +0200
committer Michael Stahl <michael.stahl@allotropia.de>  Tue Jul 16 15:57:43 2024 +0200
tree      5e7fe7051a76f04b1b8b2ab9c46c271e3f8ff666
parent    2f81046033bb4082f888edfa94685d2dcc2689aa

package: add additional consistency checks for local file header

Bisected with: bibisect-linux64-25.2
Comment 4 Xisco Faulí 2024-09-13 09:20:32 UTC
I opened the document with Excel 2016 and it opens without any complaint.
Comment 5 saveurlinux 2024-09-13 09:21:48 UTC
(In reply to Xisco Faulí from comment #4)
> I opened the document with Excel 2016 and it opens without any
> complaint.

Yes, the problem occurs only with LibreOffice.
Comment 6 Michael Stahl (allotropia) 2024-09-16 19:23:37 UTC
hmm ... apparently this was produced by "Apache POI"?

the problem is we detect an 8 byte gap following the data descriptor of every zip entry...

it looks like the data descriptor uses 64-bit sizes, but there is no Zip64 extra field on the local header, the extension length is 0...

there does not appear to be a Zip64 extra field anywhere in the file, nor is there a Zip64 end of central directory record ... how is one supposed to know these sizes are 64-bit?
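
The 8-byte gap follows from the two possible data descriptor layouts. A minimal sketch, assuming the descriptor carries the optional signature; the class and method names are hypothetical and this is not LibreOffice or commons-compress code:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustrative only: byte layout of a ZIP data descriptor in both size widths. */
final class DataDescriptorLayout {
    /** Data descriptor signature, stored on disk as the bytes 50 4b 07 08. */
    static final int DD_SIGNATURE = 0x08074b50;

    /** Length in bytes of a data descriptor that includes the signature. */
    static int length(boolean zip64) {
        int sizeField = zip64 ? 8 : 4; // width of each of the two size fields
        return 4 /* signature */ + 4 /* CRC-32 */ + 2 * sizeField;
    }

    /** Serializes a descriptor in little-endian byte order, as the ZIP format uses. */
    static byte[] encode(int crc, long compressedSize, long size, boolean zip64) {
        ByteBuffer buf = ByteBuffer.allocate(length(zip64)).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(DD_SIGNATURE).putInt(crc);
        if (zip64) {
            buf.putLong(compressedSize).putLong(size);
        } else {
            buf.putInt((int) compressedSize).putInt((int) size);
        }
        return buf.array();
    }

    public static void main(String[] args) {
        // A reader expecting 32-bit sizes consumes 16 bytes of a 24-byte descriptor.
        int gap = length(true) - length(false);
        System.out.println("32-bit descriptor: " + length(false) + " bytes");
        System.out.println("64-bit descriptor: " + length(true) + " bytes");
        System.out.println("bytes left over when read as 32-bit: " + gap);
    }
}
```

A reader that finds no Zip64 extra field assumes the shorter layout, so the last 8 bytes of each 64-bit descriptor show up as unexplained data before the next record.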
Comment 7 Michael Stahl (allotropia) 2024-09-17 11:07:34 UTC
the file does look invalid to me, 64-bit data descriptor but no zip64 extra field:

      4.3.9.2 When compressing files, compressed and uncompressed sizes 
      SHOULD be stored in ZIP64 format (as 8 byte values) when a 
      file's size exceeds 0xFFFFFFFF.   However ZIP64 format MAY be 
      used regardless of the size of a file.  When extracting, if 
      the zip64 extended information extra field is present for 
      the file the compressed and uncompressed sizes will be 8
      byte values.  

and in any case, the file is opened by LO in "Repair" mode, so i think that's good enough, resolving NOTABUG for now.

(the Repair mode appears to "guess" if it's zip64 based on a following signature)
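
A guess of this kind can be sketched as follows. This is only an illustration of the idea, not LibreOffice's actual repair code; the class name, method name, and the choice of which signatures to accept are assumptions:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustrative only: infer the data descriptor width from the record that follows it. */
final class Zip64DescriptorGuess {
    static final int LOCAL_FILE_HEADER = 0x04034b50;
    static final int CENTRAL_DIR_HEADER = 0x02014b50;

    /** True if a local file header or central directory header starts at offset. */
    private static boolean isRecordStart(ByteBuffer buf, int offset) {
        if (offset < 0 || offset + 4 > buf.limit()) {
            return false;
        }
        int sig = buf.getInt(offset); // absolute read, little-endian
        return sig == LOCAL_FILE_HEADER || sig == CENTRAL_DIR_HEADER;
    }

    /**
     * @param data     raw archive bytes
     * @param ddOffset offset of the data descriptor signature
     * @return 16 or 24 if exactly one layout is followed by a known signature,
     *         -1 if the result is ambiguous or neither layout fits
     */
    static int guessDescriptorLength(byte[] data, int ddOffset) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        boolean fits32 = isRecordStart(buf, ddOffset + 16);
        boolean fits64 = isRecordStart(buf, ddOffset + 24);
        if (fits32 == fits64) {
            return -1;
        }
        return fits64 ? 24 : 16;
    }
}
```

Such a heuristic can be fooled when size bytes happen to match a signature, which is one reason a strict reader would rely on the Zip64 extra field instead.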

POI would be using Apache Commons-Compress; the code to write the data descriptor is
in https://github.com/apache/commons-compress/blob/master/src/main/java/org/apache/commons/compress/archivers/zip/ZipArchiveOutputStream.java

    protected void writeDataDescriptor(final ZipArchiveEntry ze) throws IOException {
        if (!usesDataDescriptor(ze.getMethod(), false)) {
            return;
        }
        writeCounted(DD_SIG);
        writeCounted(ZipLong.getBytes(ze.getCrc()));
        if (!hasZip64Extra(ze)) {
            writeCounted(ZipLong.getBytes(ze.getCompressedSize()));
            writeCounted(ZipLong.getBytes(ze.getSize()));
        } else {
            writeCounted(ZipEightByteInteger.getBytes(ze.getCompressedSize()));
            writeCounted(ZipEightByteInteger.getBytes(ze.getSize()));
        }
    }

this contains the obvious check for a Zip64 extra field, which the attached file doesn't have.

this has been substantially changed since 2011 when Zip64 support was introduced:

https://issues.apache.org/jira/browse/COMPRESS-150

really not clear how this file was produced...
Comment 8 saveurlinux 2024-09-23 11:26:53 UTC
Thanks for the details.
Indeed, this file was generated by Apache POI.
I solved the issue by calling
org.apache.poi.xssf.streaming.SXSSFWorkbook#setZip64Mode
with Zip64Mode.Never, which stops POI from using Zip64 extensions.
Now LibreOffice does not complain any more.