Bug 137015 - FILEOPEN MS Excel 2003 XML fails when having double UTF-8 BOM
Status: RESOLVED NOTOURBUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version: 6.1.5.2 release (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-09-25 07:32 UTC by plasticassius
Modified: 2021-07-26 10:04 UTC

See Also:
Crash report or crash signature:


Attachments

Description plasticassius 2020-09-25 07:32:51 UTC
An attempt to open an MS Excel 2003 XML format file that includes the xml version tag <?xml version="1.0"?> on the first line fails, and an attempt to import the file as text is made. Alternatively, converting the file from the command line fails with a parser error:

$soffice --headless --infilter="MS Excel 2003 XML" --convert-to ods file.xls
Entity: line 1: parser error : Start tag expected, '<' not found
<?xml version="1.0"?>
   ^
Error: source file could not be loaded

When the first line is deleted from the file, it opens correctly.

An example of this kind of file can be downloaded from

https://www.ishares.com/us/products/239737/?referrer=tickerSearch

by clicking on the Download link near the top on the right side of the page.
Comment 1 Maxim Monastirsky 2020-09-25 09:52:24 UTC
(In reply to plasticassius from comment #0)
> An attempt to open an MS Excel 2003 XML format file that includes the xml
> version tag <?xml version="1.0"?> on the first line fails, and an attempt to
> import the file as text is made.
The problem isn't with the version tag, but with the file having the UTF-8 BOM twice. It just happens that when you attempt to edit the file to remove the version tag, the redundant BOM is also removed.

> $soffice --headless --infilter="MS Excel 2003 XML" --convert-to ods file.xls
Here you're attempting to use the obsolete "MS Excel 2003 XML" filter instead of the new and much improved "MS Excel 2003 XML Orcus", which is the one used in the UI by default. In most cases there is no need to explicitly specify the input filter, as LO can figure out the filter itself. In addition, there is no need to specify --headless when you already have --convert-to.
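For anyone scripting the conversion, the simplified invocation described above might be assembled roughly as follows. This is only a sketch: the helper name build_convert_command is hypothetical, and it assumes soffice is available on the PATH. The only flags used are the ones already mentioned in this report.

```python
from typing import List, Optional


def build_convert_command(source: str,
                          target_format: str = "ods",
                          infilter: Optional[str] = None) -> List[str]:
    """Build an argument list for a soffice conversion (hypothetical helper).

    By default no --infilter is passed, so LibreOffice picks the input
    filter itself. --headless is left out because --convert-to is present.
    """
    cmd = ["soffice"]
    if infilter is not None:
        # Passed as a single argv element, so no shell quoting is needed.
        cmd.append(f"--infilter={infilter}")
    cmd += ["--convert-to", target_format, source]
    return cmd
```

The resulting list could then be handed to subprocess.run(build_convert_command("file.xls"), check=True). If a filter has to be forced, infilter="MS Excel 2003 XML Orcus" would select the newer filter named above.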
Comment 2 plasticassius 2020-09-25 21:28:17 UTC
(In reply to Maxim Monastirsky from comment #1)
> (In reply to plasticassius from comment #0)

Good catch about the BOM; I hadn't looked at that at all. Apparently I had removed both BOMs along with the version tag, which made it import correctly. I can also confirm that removing one BOM or both BOMs, but not the version tag, also makes it import correctly.
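In case it helps others who hit the same kind of file, the check and the fix described above can be sketched in a few lines of Python. This is a minimal illustration rather than anything LibreOffice does internally; the function names are hypothetical, and it assumes the redundant BOMs sit directly at the start of the file.

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def count_leading_boms(data: bytes) -> int:
    """Return how many UTF-8 BOMs appear back to back at the start of data."""
    count = 0
    while data.startswith(BOM, count * len(BOM)):
        count += 1
    return count


def strip_redundant_boms(data: bytes, keep_one: bool = True) -> bytes:
    """Drop all leading BOMs, optionally keeping a single one."""
    n = count_leading_boms(data)
    body = data[n * len(BOM):]
    if keep_one and n > 0:
        return BOM + body
    return body
```

A file could be repaired by reading it in binary mode, passing the bytes through strip_redundant_boms, and writing the result back. Either setting of keep_one matches what was confirmed above: the import works with one BOM or with none.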

> > $soffice --headless --infilter="MS Excel 2003 XML" --convert-to ods file.xls
The only reason I showed the output of this line is that it helped me debug what was going on. I don't know much about how files get opened, and I wasn't sure how to get a more specific error message about why the open was failing.
Comment 3 Buovjaga 2021-07-26 10:04:18 UTC
Looks like this can be closed