Description: LO does not complain, but the files show up as blank Steps to Reproduce: Open attached file Actual Results: Worksheet is blank Expected Results: A1 should be 1, B1 should be 2, etc (compare with Excel 2019) Reproducible: Always User Profile Reset: Yes Additional Info: Originally noticed in a file that was reported in https://github.com/SheetJS/sheetjs/issues/2390 It appears to have been generated by a third party tool
Created attachment 175276 [details] Demonstration of issue
The file is indeed not a proper BIFF8 file as described in [MS-XLS]. The documentation requires (2.1.1 Compound File) that "A file of the type specified by this document MUST be an OLE compound file as specified in [MS-CFB]". The latter documentation requires (2.2 Compound File Header): > The Compound File Header structure MUST be at the beginning of the file (offset 0). > ... > Header Signature (8 bytes): Identification signature for the compound file structure, and MUST be set to the value 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1. Attachment 175276 [details] starts with 0x09 0x08 0x08 0x00, is it some separate record of BIFF? Is there a specification regarding such partial files somewhere? [MS-XLS] https://docs.microsoft.com/en-us/openspecs/office_file_formats/ms-xls/cd03cb5f-ca02-4934-a391-bb674cb8aa06 [MS-CFB] https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-cfb/53989ce4-7b05-4f8d-829b-d08d6148375b
>It appears to have been generated by a third party tool I think it's NOTOURBUG
(In reply to Roman Kuznetsov from comment #3) It's not granted. Excel opens that; and I have tested that: 1. Opening in Excel; 2. Saving as XLS (97); 3. Opening the resulting compound file with 7-zip; 4. Extracting "Workbook" stream; 5. Checking that it resembles the original file in structure (starts with 09 08; ends with 0a 00 00 00); 6. Opening in in Calc results in normal opening. So we obviously support such files for some extent - and need to make sure if the file is indeed broken, or if it's our bug...
On pc Debian x86-64 with master sources updated today, I could reproduce this. I noticed this on console: warn:sc:38679:38679:sc/source/filter/excel/xlroot.cxx:158: XclRootData::XclRootData - cannot get output device info com.sun.star.uno.RuntimeException message: invalid attempt to assign an empty interface of type com.sun.star.frame.XFrame! /home/julien/lo/libreoffice/include/com/sun/star/uno/Reference.hxx:105 Since it's a third party tool + following Mike's comment, should it be NOTOURBUG? Or if Excel can open it, should LO absolutely be able to do it too even if it doesn't respect specs from MS?
argh, I had written my comment before seeing the very last Mike's comment. Forget my question in my comment then.
Code pointer: ScFormatFilterPluginImpl::ScImportExcel starts importing the file. In it, a call to XclImpStream::DetectBiffVersion returns EXC_BIFF5 (because the file does not define a specific BIFF version). Later, in ImportExcel::Bof5, it also reads subtype (bytes 06 and 07), that is 0x1000 in the bugdoc, while the expected value is in the range 0x05 - 0x40. The result state in ImportExcel::Read is Z_BiffNull, and nothing is read. I tend to agree with comment 3, comment 5, and comment in XclImpStream::DetectBiffVersion: > there are some *really* broken documents out there... So I close it as NOTOURBUG. The generator is just extra lazy, creating something that only Excel seems to be able to comprehend (so they rely on "if Excel didn't choke, it's good enough" approach, instead of implementing a standardized output using published standards). If OP has any documentation about ways to read such data that exercises Excel's extra-permissive rules of the non-standardized files, please provide them and reopen the bug. Thanks.
By the way, forgot to add: returning "Biff5" in pExcRoot->eDateiTyp from ImportExcel::Bof5 when there's no match for the subtype, looks to "fix" the issue. However, no idea if that uncovers another can of worms. I have no insight into this format, unfortunately.
OTOH, ImportExcel::Bof{2..4} explicitly do just that: > // #i51490# Excel interprets invalid indexes as worksheet So let me fix it this way for ImportExcel::Bof5.
The spec situation is confusing. The old 97-2007 spec [1] (page 18) refers to a "Simple Save" structure. Our initial interpretation was that the Simple files have no mini FAT. Daniel Rentz and the OpenOffice developers [2] (page 14) may have interpreted that as bare workbook streams: > All document types and BIFF versions can be stored in a simple stream file, most of them are always stored this way. The only exception are BIFF5-BIFF8 workbook documents, which are usually stored as compound document files (see below). If these documents are stored as stream files, the entire file consists of the “Book” stream (BIFF5) or “Workbook” stream (BIFF8) only. [1] https://www.loc.gov/preservation/digital/formats/digformatspecs/Excel97-2007BinaryFileFormat%28xls%29Specification.pdf [2] https://www.openoffice.org/sc/excelfileformat.pdf
Mike Kaganski committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/commit/e753de4630fb74ac837d0339be23a305f18d0725 tdf#144732: treat invalid subtype value as worksheet It will be available in 7.3.0. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Mike Kaganski committed a patch related to this issue. It has been pushed to "libreoffice-7-2": https://git.libreoffice.org/core/commit/a3010249f185b8d2b7ade2cbf10600255bb935fb tdf#144732: treat invalid subtype value as worksheet It will be available in 7.2.3. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.