Bug 143600 - Opening Excel XLS spreadsheet which contains XML fails with "General Error. General input/output error."
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version (earliest affected): 7.0.4.2 release
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Kohei Yoshida
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: orcus_bugs
Reported: 2021-07-29 10:47 UTC by Byte Wanderer
Modified: 2021-12-16 03:52 UTC
CC List: 3 users

See Also:
Crash report or crash signature:


Attachments
Example of excel file with only xml content (5.33 KB, application/vnd.ms-excel)
2021-07-29 10:50 UTC, Byte Wanderer
Example of excel file with only xml content (5.30 KB, application/vnd.ms-excel)
2021-07-29 12:28 UTC, Byte Wanderer

Description Byte Wanderer 2021-07-29 10:47:46 UTC
Description:
Hi, I'm trying to open an Excel XLS spreadsheet that contains XML code with LibreOffice 7.1.5.2 (see specifications below), but I get "General Error. General input/output error." on both Windows 10 and Ubuntu Linux 20.04 (LibreOffice 7.0.6 gives the same result), whereas MS Excel opens it without any error.

The example XLS (see attachment) contains only XML code and is the output of Microsoft Reporting Services, over which I have no control.

I also tried opening it with the "Microsoft Excel 2003 XML" file type, as suggested in public forums, but got the same result.

Thank you for your support.

Steps to Reproduce:
Click the file, or open it from the File menu.


Actual Results:
General Error. General input/output error.

Expected Results:
The file should open like other Excel files.


Reproducible: Always


User Profile Reset: Yes


OpenGL enabled: Yes

Additional Info:
Does not open with LibreOffice:
Version: 7.1.5.2 (x64) / LibreOffice Community
Build ID: 85f04e9f809797b8199d13c421bd8a2b025d52b5
CPU threads: 2; OS: Windows 10.0 Build 19042; UI render: Skia/Raster; VCL: win
Locale: it-IT (it_IT); UI: it-IT
Calc: threaded

OpenGL
OpenGL vendor string: Intel
OpenGL renderer string: Mesa Intel(R) HD Graphics 5500 (BDW GT2)
OpenGL core profile version string: 4.6 (Core Profile) Mesa 20.2.6
OpenGL core profile shading language version string: 4.60
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 4.6 (Compatibility Profile) Mesa 20.2.6
OpenGL shading language version string: 4.60
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.2 Mesa 20.2.6
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
OpenGL ES profile extensions:

Opens in a less recent version of LibreOffice:
Version: 5.4.7.2
Build ID: c838ef25c16710f8838b1faec480ebba495259d0
CPU threads: 8; OS: Linux 5.3; UI render: default; VCL: kde4; 
Locale: en-US (en_US.UTF-8); Calc: group

Opens with Apache OpenOffice 4.2.0
Comment 1 Byte Wanderer 2021-07-29 10:50:24 UTC
Created attachment 173942
Example of excel file with only xml content
Comment 2 Eike Rathke 2021-07-29 11:12:54 UTC
The document is broken.
It contains

  <Cell ss:StyleID="s24"><Data ss:Type="String">Identificatore</Data>NamedCell ss:Name="_FilterDatabase"/&gt;</Cell>

at (1-based) position 2504, which probably (unchecked) should instead be

  <Cell ss:StyleID="s24"><Data ss:Type="String">Identificatore</Data><NamedCell ss:Name="_FilterDatabase"/></Cell>

File a bug with your Microsoft Reporting Services provider.
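To illustrate why the quoted line is a content problem rather than an XML syntax error, here is a minimal Python sketch using the standard-library `xml.etree.ElementTree`. It assumes the `ss` prefix maps to `urn:schemas-microsoft-com:office:spreadsheet` (the URI seen in the orcus log later in this report) and adds that declaration, which the quoted snippet omits. The escaped `&gt;` means the intended `<NamedCell>` element is parsed as plain text trailing `<Data>`, so the broken line is still well-formed XML; the `describe` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the "ss" prefix; the quoted snippet omits the declaration.
SS = "urn:schemas-microsoft-com:office:spreadsheet"

BROKEN = (
    f'<Cell xmlns:ss="{SS}" ss:StyleID="s24">'
    '<Data ss:Type="String">Identificatore</Data>'
    'NamedCell ss:Name="_FilterDatabase"/&gt;</Cell>'
)
FIXED = (
    f'<Cell xmlns:ss="{SS}" ss:StyleID="s24">'
    '<Data ss:Type="String">Identificatore</Data>'
    '<NamedCell ss:Name="_FilterDatabase"/></Cell>'
)

def describe(xml_text):
    """Hypothetical helper: return <Cell>'s child tags and the text trailing <Data>."""
    cell = ET.fromstring(xml_text)        # both variants parse without error
    children = [child.tag for child in cell]
    data = cell[0]                        # the <Data> element
    return children, data.tail            # tail = character data after </Data>

print(describe(BROKEN))  # NamedCell survives only as text after </Data>
print(describe(FIXED))   # NamedCell is a real child element
```

In the broken variant, `<Cell>` has a single `Data` child and a text tail containing the would-be element; in the corrected variant, `NamedCell` becomes a proper sibling of `Data`.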
Comment 3 Byte Wanderer 2021-07-29 12:28:53 UTC
Created attachment 173946
Example of excel file with only xml content

Thank you, and sorry for the inconvenience. However, even after correcting the XML content, the file opens in the old version of LibreOffice but not in the new one.
I have also filed a bug report to the provider of the system which generates the file.
Comment 4 Byte Wanderer 2021-07-29 12:30:57 UTC
Sorry to reopen the bug report, but even with a new example (which I checked to be formally correct) I'm unable to open the document with a new installation of LibreOffice (only with the old one).
Comment 5 Eike Rathke 2021-07-29 17:54:39 UTC
That's indeed odd. Thanks for persisting ;)
Comment 6 Julien Nabet 2021-07-29 18:19:04 UTC
On a Debian x86-64 PC with master sources updated today, I could reproduce this.

I noticed this on console:
warn:sc:111355:111355:sc/source/filter/orcus/orcusfiltersimpl.cxx:77: Unable to load file via orcus filter! element 'urn:schemas-microsoft-com:office:spreadsheet:Worksheet' expected, but 'urn:schemas-microsoft-com:office:spreadsheet:Workbook' encountered.


BTW, I also tried the LO 7.0.4.2 package from Debian testing; it doesn't open the file either.
Comment 7 Julien Nabet 2021-07-29 18:31:44 UTC
On which LO version did it work?
Comment 8 Eike Rathke 2021-07-29 19:30:45 UTC
OK, the document is still broken.

The <x:WorksheetOptions> element is nested in the <Workbook> element instead of the <Worksheet> element, i.e. a closing </Worksheet> tag precedes it.

Again, file a bug with your Microsoft Reporting Services provider.

The last "working" (i.e. structure-ignorant) import was probably from before we switched to Orcus for the Excel 2003 XML import.

This is still NOTOURBUG, but we might keep it open to pass the Orcus exception's error message, which is available, on to the General I/O error dialog.
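The kind of structure check described here can be illustrated with a small Python sketch. It is not orcus's implementation: the `EXPECTED_PARENT` table, the `check_structure` function and the `StructureError` class are hypothetical, and namespaces are dropped for brevity (the real file uses namespaced names, with `WorksheetOptions` under the `x:` prefix). It shows how a strict parent check turns a misplaced `<WorksheetOptions>` into an error worded like the one in the console log, while elements missing from the table go unchecked.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: element name -> the only parent it may appear under.
EXPECTED_PARENT = {
    "Worksheet": "Workbook",
    "Table": "Worksheet",
    "WorksheetOptions": "Worksheet",
}

class StructureError(Exception):
    pass

def check_structure(xml_text):
    """Strict check: fail if a known element sits under the wrong parent."""
    root = ET.fromstring(xml_text)

    def walk(parent):
        for child in parent:
            expected = EXPECTED_PARENT.get(child.tag)
            # Elements not in the table are not checked at all.
            if expected is not None and expected != parent.tag:
                raise StructureError(
                    f"element '{expected}' expected, "
                    f"but '{parent.tag}' encountered."
                )
            walk(child)

    walk(root)

# Mirrors the second attachment: </Worksheet> closes before WorksheetOptions.
MISPLACED = (
    "<Workbook><Worksheet><Table/></Worksheet>"
    "<WorksheetOptions/></Workbook>"
)
WELL_PLACED = (
    "<Workbook><Worksheet><Table/><WorksheetOptions/></Worksheet></Workbook>"
)

check_structure(WELL_PLACED)  # passes silently
try:
    check_structure(MISPLACED)
except StructureError as exc:
    print(exc)
```

Because the whole import aborts on the first such mismatch, the user only sees the generic I/O error unless the exception message is forwarded to the dialog.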
Comment 9 Eike Rathke 2021-07-29 19:37:10 UTC
FWIW, the exception thrown by the read, with the message available, is caught at
https://opengrok.libreoffice.org/xref/core/sc/source/filter/orcus/orcusfiltersimpl.cxx?r=75252e58#77
Comment 10 Mike Kaganski 2021-07-30 08:40:50 UTC
I would suggest that we still want to have it "fixed" in a sense. The file has some unexpected elements, and that should *never* be a reason to reject a file, IMO. It's OK to fail on invalid XML, or on invalid values of recognized tokens; but when we see *valid* XML, e.g. with plain-text content in elements where we do not expect plain text (attachment 173942), or where some "unknown" element appears as a child of a known element (attachment 173946), those should be silently ignored, as we always do elsewhere (OOXML), implying that those may be some unknown/unimplemented format extensions.
Comment 11 Eike Rathke 2021-07-30 11:51:05 UTC
(In reply to Mike Kaganski from comment #10)
> where some "unknown" element appears as a child of a known
> element (attachment 173946), those should be silently ignored, as
> we always do elsewhere (OOXML), implying that those may be some
> unknown/unimplemented format extensions.
That's not the case here though. The structure check knows that the <x:WorksheetOptions> element MUST appear nested in the <Worksheet> element. It does not check for unknown elements.
Comment 12 Mike Kaganski 2021-07-30 12:30:37 UTC
(In reply to Eike Rathke from comment #11)
> The structure check knows that the
> <x:WorksheetOptions> element MUST appear nested in the <Worksheet> element.
> It does not check for unknown elements.

IMO it looks somewhat backwards. In the context of XML parsing, any element that appears where we expect it is a known element, and when it happens to appear elsewhere, it should be considered unknown rather than "known but invalid". When we find an unexpected element <Foo> under <Bar>, why should we care that we know there may be a <Foo> under <Baz>? For example, the same-name element w:sectPr appears both in w:body ("Document Final Section Properties", ECMA-376 Part 1 17.6.17) and in w:pPr ("Section Properties", 17.6.18). If a subsequent revision of the standard adds a same-name element elsewhere, we must not fail (though of course we should ignore it there).
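The context-based approach proposed here can be sketched as a dispatcher keyed on (parent, child) pairs rather than on element names alone. This is hypothetical Python, not how orcus or LibreOffice's OOXML import actually works: the `KNOWN` table and the `import_lenient` function are illustrative, and namespaces are dropped for brevity. An element counts as known only where the table lists it under that parent; any other occurrence, including a familiar name in an unfamiliar place, is skipped together with its subtree instead of aborting the import.

```python
import xml.etree.ElementTree as ET

# Hypothetical context table: (parent tag, child tag) pairs the importer understands.
KNOWN = {
    ("Workbook", "Worksheet"),
    ("Worksheet", "Table"),
    ("Worksheet", "WorksheetOptions"),
}

def import_lenient(xml_text):
    """Return the (parent, child) pairs that were handled and those skipped."""
    root = ET.fromstring(xml_text)  # malformed XML still raises ParseError
    handled, skipped = [], []

    def walk(parent):
        for child in parent:
            pair = (parent.tag, child.tag)
            if pair in KNOWN:
                handled.append(pair)
                walk(child)  # only descend into contexts we understand
            else:
                skipped.append(pair)  # unknown in this context: ignore subtree

    walk(root)
    return handled, skipped

# Same misplacement as the second attachment: no exception is raised.
handled, skipped = import_lenient(
    "<Workbook><Worksheet><Table/></Worksheet><WorksheetOptions/></Workbook>"
)
print("handled:", handled)
print("skipped:", skipped)
```

With this design, a misplaced `WorksheetOptions` only loses the options it carries, while the sheet data is still imported; collecting the skipped pairs also gives a natural place to emit a warning.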
Comment 13 Kohei Yoshida 2021-07-31 00:01:12 UTC
I can see Mike's argument, but changing this behavior would not be a quick overnight step. I'll look into it here: https://gitlab.com/orcus/orcus/-/issues/138
Comment 14 Kohei Yoshida 2021-12-16 03:52:42 UTC
The orcus library version 0.17.2, which just hit the master branch, should address this.