Description:
Hi, I'm trying to open an Excel XLS spreadsheet containing XML code with LibreOffice 7.1.5.2 (see specifications below), but I get "General Error. General input/output error." on both Windows 10 and Ubuntu Linux 20.04. I also tried LibreOffice 7.0.6, with the same result. MS Excel opens the file without any error. The example XLS (see attachment) contains only XML code and is the output of Microsoft Reporting Services, over which I have no control. I also tried to open it using the "Microsoft Excel 2003 XML" file type, as suggested in public forums, but with the same result. Thank you for your support.

Steps to Reproduce:
Click on the file, or open it from the File menu.

Actual Results:
General Error. General input/output error.

Expected Results:
The file should open like other Excel files.

Reproducible: Always

User Profile Reset: Yes

OpenGL enabled: Yes

Additional Info:
Does not open with LibreOffice:
Version: 7.1.5.2 (x64) / LibreOffice Community
Build ID: 85f04e9f809797b8199d13c421bd8a2b025d52b5
CPU threads: 2; OS: Windows 10.0 Build 19042; UI render: Skia/Raster; VCL: win
Locale: it-IT (it_IT); UI: it-IT
Calc: threaded

OpenGL
OpenGL vendor string: Intel
OpenGL renderer string: Mesa Intel(R) HD Graphics 5500 (BDW GT2)
OpenGL core profile version string: 4.6 (Core Profile) Mesa 20.2.6
OpenGL core profile shading language version string: 4.60
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 4.6 (Compatibility Profile) Mesa 20.2.6
OpenGL shading language version string: 4.60
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.2 Mesa 20.2.6
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
OpenGL ES profile extensions:

Opens in a less recent version of LibreOffice:
Version: 5.4.7.2
Build ID: c838ef25c16710f8838b1faec480ebba495259d0
CPU threads: 8; OS: Linux 5.3; UI render: default; VCL: kde4; Locale: en-US (en_US.UTF-8); Calc: group

Opens with Apache OpenOffice 4.2.0
Created attachment 173942 [details] Example of excel file with only xml content
The document is broken. At (1-based) position 2504 it contains

<Cell ss:StyleID="s24"><Data ss:Type="String">Identificatore</Data>NamedCell ss:Name="_FilterDatabase"/></Cell>

which probably (unchecked) should instead be

<Cell ss:StyleID="s24"><Data ss:Type="String">Identificatore</Data><NamedCell ss:Name="_FilterDatabase"/></Cell>

File a bug with your Microsoft Reporting Services provider.
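For readers who want to locate this kind of damage themselves, here is a minimal sketch using Python's standard xml.etree.ElementTree. Because the opening "<" is missing, the fragment is still well-formed XML and simply shows up as stray text inside <Cell>. The SAMPLE string is a hypothetical reduction of the snippet quoted above (the <Row> wrapper and namespace declarations are assumptions, not copied from the attachment), and stray_text_in_cells is an illustrative helper, not part of LibreOffice or Orcus.

```python
import xml.etree.ElementTree as ET

# Namespace used by the Excel 2003 XML spreadsheet elements.
SS = "urn:schemas-microsoft-com:office:spreadsheet"

# Hypothetical reduction of the broken cell; the <Row> wrapper is an assumption.
SAMPLE = (
    f'<Row xmlns="{SS}" xmlns:ss="{SS}">'
    '<Cell ss:StyleID="s24"><Data ss:Type="String">Identificatore</Data>'
    'NamedCell ss:Name="_FilterDatabase"/></Cell>'
    '</Row>'
)


def stray_text_in_cells(xml_text):
    """Return non-whitespace text found directly inside <Cell> elements."""
    root = ET.fromstring(xml_text)
    found = []
    for cell in root.iter(f"{{{SS}}}Cell"):
        # Text directly inside a cell is either cell.text (before the first
        # child) or the tail of one of its children (after that child).
        pieces = [cell.text] + [child.tail for child in cell]
        found.extend(p.strip() for p in pieces if p and p.strip())
    return found


if __name__ == "__main__":
    for text in stray_text_in_cells(SAMPLE):
        print("stray text in <Cell>:", text)
```

Running it against the real file would mean replacing SAMPLE with the file contents; the check itself only relies on the Cell element name and namespace.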
Created attachment 173946 [details] Example of excel file with only xml content Thank you... Sorry for the inconvenience, but even after correcting the XML content, the file opens in the old version of LibreOffice but not in the new version. I have also filed a bug report with the provider of the system that generates the file.
Sorry to reopen the bug report, but even with a new example (which I checked to be formally correct) I'm unable to open the document with a new installation of LibreOffice (only with the old one)...
That's indeed odd.. thanks for persuading ;)
On pc Debian x86-64 with master sources updated today, I could reproduce this. I noticed this on console:

warn:sc:111355:111355:sc/source/filter/orcus/orcusfiltersimpl.cxx:77: Unable to load file via orcus filter! element 'urn:schemas-microsoft-com:office:spreadsheet:Worksheet' expected, but 'urn:schemas-microsoft-com:office:spreadsheet:Workbook' encountered.

BTW, I gave it a try with the LO Debian testing package 7.0.4.2; it doesn't open the file either.
On which LO version did it work?
Ok, the document is still broken.. The <x:WorksheetOptions> element is nested in the <Workbook> element instead of the <Worksheet> element, i.e. there's a closing </Worksheet> before it. Again, file a bug with your Microsoft Reporting Services provider. The last "working" ignorant import probably was before we used Orcus for the Excel 2003 XML import. This is still NOTOURBUG, but we might keep it open to maybe pass the Orcus exception's error message, which is available, through to the General I/O error dialog.
Fwiw, the place where the exception from the read is caught, with the message available, is at https://opengrok.libreoffice.org/xref/core/sc/source/filter/orcus/orcusfiltersimpl.cxx?r=75252e58#77
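The misplacement described here can be checked outside LibreOffice with a short script. The following is only a sketch using Python's xml.etree.ElementTree; SAMPLE is a hypothetical minimal document with the same shape as the problem (it is not the attachment's content), and misplaced_worksheet_options is an illustrative helper name. The x: namespace URI is assumed to be the usual urn:schemas-microsoft-com:office:excel.

```python
import xml.etree.ElementTree as ET

SS = "urn:schemas-microsoft-com:office:spreadsheet"
X = "urn:schemas-microsoft-com:office:excel"  # assumed binding of the x: prefix

# Hypothetical minimal document: </Worksheet> closes before <x:WorksheetOptions>.
SAMPLE = f"""<Workbook xmlns="{SS}" xmlns:x="{X}">
  <Worksheet><Table/></Worksheet>
  <x:WorksheetOptions/>
</Workbook>"""


def misplaced_worksheet_options(xml_text):
    """Return the parent tag of every WorksheetOptions not inside a Worksheet."""
    root = ET.fromstring(xml_text)
    expected_parent = f"{{{SS}}}Worksheet"
    target = f"{{{X}}}WorksheetOptions"
    misplaced = []
    for parent in root.iter():
        for child in parent:
            if child.tag == target and parent.tag != expected_parent:
                misplaced.append(parent.tag)
    return misplaced


if __name__ == "__main__":
    for parent_tag in misplaced_worksheet_options(SAMPLE):
        print("WorksheetOptions found under", parent_tag)
```

For the sample, the reported parent is the Workbook element, which matches the "Worksheet expected, but Workbook encountered" wording of the Orcus warning.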
I want to suggest that we still want to have it "fixed" in a sense. The file has some unexpected elements, and that should *never* be a reason to reject files IMO. It's OK to fail on invalid XML, or on invalid values of recognized tokens; but when we see a *valid* XML, e.g. with plain text content in elements which we do not expect plain text (attachment 173942 [details]), or where some "unknown" element appears as a child of a known element (attachment 173946 [details]), those should be silently ignored, as we always do elsewhere (OOXML), implying that those may be some unknown/unimplemented format extensions.
(In reply to Mike Kaganski from comment #10)
> where some "unknown" element appears as a child of a known
> element (attachment 173946 [details]), those should be silently ignored, as
> we always do elsewhere (OOXML), implying that those may be some
> unknown/unimplemented format extensions.

That's not the case here though. The structure check knows that the <x:WorksheetOptions> element MUST appear nested in the <Worksheet> element. It does not check for unknown elements.
(In reply to Eike Rathke from comment #11)
> The structure check knows that the
> <x:WorksheetOptions> element MUST appear nested in the <Worksheet> element.
> It does not check for unknown elements.

IMO it looks somewhat backwards. In the context of XML parsing, any element that appears where we expect it is a known element, and when it happens to appear in other places, it should not be considered "known but invalid"; it should be considered unknown. When we find an unexpected element <Foo> under <Bar>, why should we care that we know that there may be a <Foo> under <Baz>? E.g., there are same-name elements like w:sectPr both in w:body ("Document Final Section Properties", ECMA-376 Part 1 17.6.17) and in w:pPr ("Section Properties", 17.6.18). If a subsequent revision of the standard adds a same-name element elsewhere, we must not fail (but of course we should ignore it there).
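To make the proposed rule concrete: an element counts as "known" only in the (parent, child) contexts the importer handles, and anything else is skipped together with its subtree instead of aborting the load. The sketch below illustrates that idea in Python. It is not how Orcus is implemented; KNOWN_CONTEXTS, lenient_import, and SAMPLE are hypothetical, and namespaces are reduced to local names for brevity.

```python
import xml.etree.ElementTree as ET

SS = "urn:schemas-microsoft-com:office:spreadsheet"
X = "urn:schemas-microsoft-com:office:excel"

# Hypothetical table of (parent, child) contexts the importer handles.
# The root element has the empty string as its parent.
KNOWN_CONTEXTS = {
    ("", "Workbook"),
    ("Workbook", "Worksheet"),
    ("Worksheet", "Table"),
    ("Worksheet", "WorksheetOptions"),
}

# Hypothetical document with WorksheetOptions under Workbook.
SAMPLE = f"""<Workbook xmlns="{SS}" xmlns:x="{X}">
  <Worksheet><Table/></Worksheet>
  <x:WorksheetOptions/>
</Workbook>"""


def local_name(tag):
    """Strip the '{namespace}' part from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def lenient_import(xml_text):
    """Walk the tree; skip elements in unexpected contexts instead of failing."""
    handled, skipped = [], []

    def visit(elem, parent_name):
        name = local_name(elem.tag)
        path = f"{parent_name}/{name}"
        if (parent_name, name) not in KNOWN_CONTEXTS:
            # Unknown in this context: ignore the element and its whole subtree.
            skipped.append(path)
            return
        handled.append(path)
        for child in elem:
            visit(child, name)

    visit(ET.fromstring(xml_text), "")
    return handled, skipped


if __name__ == "__main__":
    handled, skipped = lenient_import(SAMPLE)
    print("handled:", handled)
    print("skipped:", skipped)
```

With this rule, the same WorksheetOptions element is imported when it sits under Worksheet and quietly ignored when it sits under Workbook, which is the behavior argued for above.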
I can see Mike's argument. But changing this behavior would not be a quick overnight step. I'll look into that here: https://gitlab.com/orcus/orcus/-/issues/138
The orcus library version 0.17.2, which just hit the master branch, should address this.