Created attachment 196421
Example file

Certain files with the extension ‘xlsx’ are reported as corrupted when opened.

Name        : libreoffice-calc
Epoch       : 1
Version     : 24.8.0.3
Release     : 1bdk_mga9
Architecture: x86_64
Install Date: Fri 23 Aug 2024 12:20:31
Group       : Office/Spreadsheet
Size        : 40604549
License     : MPL-2.0 and Apache-2.0 and LGPL-3.0-only and LGPL-3.0-or-later and CC0-1.0 and BSD-3-Clause and (LGPL-2.1-only or SISSL) and (MPL-2.0 or LGPL-3.0-or-later) and (MPL-2.0 or LGPL-2.1-or-later) and (MPL-1.1 or GPL-2.0-only or LGPL-2.1-only)
Signature   : DSA/SHA1, Fri 23 Aug 2024 03:53:19, Key ID d1e9294d2d9835d8
Source RPM  : libreoffice-24.8.0.3-1bdk_mga9.src.rpm
Build Date  : Fri 23 Aug 2024 02:43:17
Build Host  : GamerRyzen7
Packager    : katnatek
Vendor      : BDK-packagers
URL         : https://www.libreoffice.org/
Summary     : LibreOffice Spreadsheet Application
Description : The LibreOffice Spreadsheet application.
Created attachment 196422
screenshot
The same file is not reported as corrupted in version 7.6.7.2.
Regression introduced by:

commit    efae4fc42d5fe3c0a69757226f38efc10d101194
author    Michael Stahl <michael.stahl@allotropia.de>  Tue Jul 16 12:12:09 2024 +0200
committer Michael Stahl <michael.stahl@allotropia.de>  Tue Jul 16 15:57:43 2024 +0200
tree      5e7fe7051a76f04b1b8b2ab9c46c271e3f8ff666
parent    2f81046033bb4082f888edfa94685d2dcc2689aa

package: add additional consistency checks for local file header

Bisected with: bibisect-linux64-25.2
I tried to open the document with Excel 2016 and it opens it without any complaint.
(In reply to Xisco Faulí from comment #4)
> I tried to open the document with Excel 2016 and it opens it without any
> complain

Yes, the problem occurs only with LibreOffice.
hmm ... apparently this was produced by "Apache POI"? the problem is we detect an 8 byte gap following the data descriptor of every zip entry... it looks like the data descriptor uses 64-bit sizes, but there is no Zip64 extra field on the local header, the extension length is 0... there does not appear to be a Zip64 extra field anywhere in the file, nor is there a Zip64 end of central directory record ... how is one supposed to know these sizes are 64-bit?
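The 8 byte gap follows directly from the two possible layouts of a data descriptor: signature, CRC-32 and two 4-byte sizes take 16 bytes, while the same record with two 8-byte sizes takes 24. The sketch below is only an illustration of that arithmetic, not LibreOffice's actual check. The class and method names are made up for this example. It builds a descriptor with 8-byte sizes, reads it as if the sizes were 4 bytes wide, and counts the unexplained bytes before the next local file header signature.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; not taken from LibreOffice or POI.
final class DataDescriptorGap {
    static final int DD_SIG = 0x08074b50;   // data descriptor signature
    static final int LFH_SIG = 0x04034b50;  // local file header signature

    /** Builds a data descriptor with 8-byte sizes, followed by the next local header signature. */
    static byte[] descriptor64ThenHeader(int crc, long compressedSize, long size) {
        ByteBuffer b = ByteBuffer.allocate(4 + 4 + 8 + 8 + 4).order(ByteOrder.LITTLE_ENDIAN);
        b.putInt(DD_SIG).putInt(crc).putLong(compressedSize).putLong(size).putInt(LFH_SIG);
        return b.array();
    }

    /**
     * Treats the descriptor as the 32-bit layout and returns how many bytes lie
     * between its assumed end and the next local file header signature,
     * or -1 if no such signature is found.
     */
    static int gapAfter32BitDescriptor(byte[] data) {
        ByteBuffer b = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        if (data.length < 16 || b.getInt(0) != DD_SIG) {
            throw new IllegalArgumentException("no data descriptor signature");
        }
        int end32 = 4 + 4 + 4 + 4; // signature, CRC-32, compressed size, uncompressed size
        for (int pos = end32; pos + 4 <= data.length; pos++) {
            if (b.getInt(pos) == LFH_SIG) {
                return pos - end32;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = descriptor64ThenHeader(0x12345678, 1000L, 4000L);
        System.out.println(gapAfter32BitDescriptor(data)); // prints 8
    }
}
```

A reader that trusts the 32-bit layout therefore sees exactly 8 stray bytes after every entry, which matches the symptom described above.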
the file does look invalid to me, 64-bit data descriptor but no zip64 extra field:

  4.3.9.2 When compressing files, compressed and uncompressed sizes
  SHOULD be stored in ZIP64 format (as 8 byte values) when a
  file's size exceeds 0xFFFFFFFF. However ZIP64 format MAY be
  used regardless of the size of a file. When extracting, if
  the zip64 extended information extra field is present for
  the file the compressed and uncompressed sizes will be 8
  byte values.

and in any case, the file is opened by LO in "Repair" mode, so i think that's good enough, resolving NOTABUG for now.

(the Repair mode appears to "guess" if it's zip64 based on a following signature)

POI would be using Apache Commons-Compress; the code to write the data descriptor is in https://github.com/apache/commons-compress/blob/master/src/main/java/org/apache/commons/compress/archivers/zip/ZipArchiveOutputStream.java

    protected void writeDataDescriptor(final ZipArchiveEntry ze) throws IOException {
        if (!usesDataDescriptor(ze.getMethod(), false)) {
            return;
        }
        writeCounted(DD_SIG);
        writeCounted(ZipLong.getBytes(ze.getCrc()));
        if (!hasZip64Extra(ze)) {
            writeCounted(ZipLong.getBytes(ze.getCompressedSize()));
            writeCounted(ZipLong.getBytes(ze.getSize()));
        } else {
            writeCounted(ZipEightByteInteger.getBytes(ze.getCompressedSize()));
            writeCounted(ZipEightByteInteger.getBytes(ze.getSize()));
        }
    }

contains the obvious check that there is a Zip64 extra field - which the attached file doesn't have.

this has been substantially changed since 2011 when Zip64 support was introduced: https://issues.apache.org/jira/browse/COMPRESS-150

really not clear how this file was produced...
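For readers who want to see what "the zip64 extended information extra field is present" means on the reading side, here is a minimal sketch. It assumes the raw extra-field bytes of an entry are already in hand and walks the (header ID, data size, data) records looking for header ID 0x0001, which the ZIP specification assigns to the Zip64 extended information field. The class and method names are invented for this example and do not mirror Commons Compress or LibreOffice code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only.
final class Zip64ExtraProbe {
    static final int ZIP64_EXTRA_ID = 0x0001; // Zip64 extended information extra field

    /** Returns true if the extra-field block contains a record with the Zip64 header ID. */
    static boolean containsZip64Record(byte[] extra) {
        ByteBuffer b = ByteBuffer.wrap(extra).order(ByteOrder.LITTLE_ENDIAN);
        while (b.remaining() >= 4) {
            int id = b.getShort() & 0xFFFF;   // 2-byte header ID
            int len = b.getShort() & 0xFFFF;  // 2-byte length of the record data
            if (id == ZIP64_EXTRA_ID) {
                return true;
            }
            if (len > b.remaining()) {
                return false; // truncated record, stop scanning
            }
            b.position(b.position() + len);   // skip this record's data
        }
        return false;
    }

    /** Width in bytes of each size field in the data descriptor, per rule 4.3.9.2. */
    static int sizeFieldWidth(byte[] extra) {
        return containsZip64Record(extra) ? 8 : 4;
    }

    public static void main(String[] args) {
        // Extension length 0, as in the attached file: no Zip64 record can be present.
        System.out.println(sizeFieldWidth(new byte[0]));
    }
}
```

With an extension length of 0, as observed in the attached file, this rule can only ever select the 4-byte layout, which is why the 8-byte sizes in the descriptors look inconsistent.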
Thanks for the detail. Indeed, this file is generated by Apache POI. I solved this issue by calling org.apache.poi.xssf.streaming.SXSSFWorkbook#setZip64Mode with Zip64Mode.Never, which prevents Zip64 extensions from being written. Now LO does not complain any more.
*** Bug 163384 has been marked as a duplicate of this bug. ***
I believe I can explain what exactly happens here:

1) Commons Compress Zip64 files are correct but cannot be read by Excel
2) As a workaround, for SXSSF a customized Zip64 compressor was adopted, which produces readable files, but with those holes

Everything was fine until this additional check was introduced. So the way forward is likely adopting the Commons Compress `writeDataDescriptor()` method. I will try to work on it over the weekend.

Thanks a lot for the explanation and analysis. The biggest challenge here was to understand first what causes the problem and which part of the software was to blame. You helped me a lot with establishing this understanding.
reopening based on new info in duplicate bug - it may be possible to use the "version needed to extract" in local file header to distinguish Zip64.
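A rough sketch of the idea, under stated assumptions: in a local file header the 2-byte "version needed to extract" field sits right after the 4-byte signature, and the ZIP specification lists version 4.5 (stored as 45) as the minimum for files using Zip64 format extensions. The code below simply reads that field and compares it to 45. It is not the LibreOffice patch, whose exact logic is not shown in this thread, and the class and method names are made up.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; the real check lives in LibreOffice's package module.
final class VersionNeededProbe {
    static final int LFH_SIG = 0x04034b50;    // local file header signature
    static final int ZIP64_MIN_VERSION = 45;  // spec version 4.5: file uses Zip64 format extensions

    /** Reads "version needed to extract" from the start of a local file header. */
    static int versionNeeded(byte[] header) {
        ByteBuffer b = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        if (header.length < 6 || b.getInt(0) != LFH_SIG) {
            throw new IllegalArgumentException("not a local file header");
        }
        return b.getShort(4) & 0xFFFF; // 2 bytes, little-endian, directly after the signature
    }

    /** Assumption: a writer that emits 8-byte descriptor sizes also declares version 45 or higher. */
    static boolean looksLikeZip64(byte[] header) {
        return versionNeeded(header) >= ZIP64_MIN_VERSION;
    }

    public static void main(String[] args) {
        byte[] header = {0x50, 0x4b, 0x03, 0x04, 45, 0};
        System.out.println(looksLikeZip64(header));
    }
}
```

The check is a heuristic: it relies on the writer setting the version field consistently, so a reader would probably still want a fallback for files that do not.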
Michael Stahl committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/0f39e6fbb48dae29778c305ddd576d698a8251ad

tdf#162944 package: try to detect Zip64 via version

It will be available in 25.2.0.

The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
the CI passed with this change, let's hope this is fixed...
Thank you so much! I have built from source and tested with POI's SXSSF files and it works now (again).

Recommendation: can you add the provided example to your test suite in order to avoid such regressions in the future? In my opinion, POI plays a large role in server-generated XLS/XLSX files and so deserves to be part of the tests.

Thank you again, a lot, and cheers!

Version: 25.2.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 42533c94ec1a52c49b2587e53ab55e67fc4a449a
CPU threads: 12; OS: Linux 6.11; UI render: default; VCL: gtk3
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: threaded
Michael Stahl committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/d79790da8a4de4758f46ae4a8573382c681af974

tdf#162944 package: add test file

It will be available in 25.2.0.

The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.