Bug 118227 - XML 2003 Excel import export filter issues
Summary: XML 2003 Excel import export filter issues
Status: REOPENED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage (show other bugs)
Version:
(earliest affected)
6.0.4.2 release
Hardware: x86-64 (AMD64) All
: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: MSO-XML2003
  Show dependency treegraph
 
Reported: 2018-06-18 14:53 UTC by DM
Modified: 2023-09-29 01:25 UTC (History)
4 users (show)

See Also:
Crash report or crash signature:


Attachments
Test compare MSO LO62 LO64+ (79.30 KB, image/png)
2019-09-02 08:36 UTC, Timur
Details
Test file from comment 1 (1.25 KB, application/xml)
2019-09-03 11:29 UTC, Kohei Yoshida
Details

Note You need to log in before you can comment on or make changes to this bug.
Description DM 2018-06-18 14:53:56 UTC
There are a number of Excel XML 2003 errors in both reading and writing.
This format is very useful to have functioning as it is extremely easy for a script to generate, takes up little space, and is openable by both Excel and LibreOffice. In referring here to XML files, XML 2003 is meant.

The code below illustrates a number of problems:-
* COLORS - Basic named colours are not recognised (here 'red') but hex (here #00ff00) is. Excel recognises them both.
* DATES - 1999-04-01 is not recognised as a date, but 1999-04-02T00:00:00.000 is. Excel recognises them both. This shorter form of date is very helpful in terms of file size.
* COLUMN ASSIGNMENT - If you remove ss:ExpandedColumnCount="3" and also <Column ss:StyleID="s2"/>  then LibreOffice will not correctly apply the third column even though it has index="3". Excel does so correctly.
* AUTOFIT - The reader is supposed to autofit the AutoFitWidth columns (numbers and dates) but LibreOffice fails to do so. Excel does.
* DATE FORMATS - With <NumberFormat ss:Format="yyyy/mm\-dd;@"/> LibreOffice loads the two separator symbols as / and - respectively which I think is technically sensible, however my Excel loads them differently (the / comes out as -) and this may need to be borne in mind or noted for compatibility reasons.
* LOCKS - Open an XML file in LibreOffice and then close the file (but not LibreOffice) - LibreOffice retains a lock on it and you have to close the entire application to close the lock. It ought to release the lock on closure, or perhaps earlier on finishing reading the file. If the file is read-only, I would suggest releasing the lock when the reading is complete. This should be very easy to fix and would be useful as testing XML output involves closing and opening numerous times and closing the application each time is a pain!
Some further issues -
* CELL BLOAT - Create a one-cell spreadsheet in LibreOffice, save it and read it back. Reading in takes a long time because the export has written a vast number of empty cells/rows, creating a large file. Removing those empty cells, the result is small and LibreOffice reads it in swiftly. Excel does not produce these empty cells when the same task is carried out.
* SAVING - Create a new Calc spreadsheet, put a date in cell A1, format it YYYY-MM-DD from the date format list, then save as M.S. Excel XML 2003, close and reopen that file. The resulting file reads in the date, but not the format.
Cheers,
David

SAMPLE FILE (utf8 file no utf8 BOF marker) -

<?xml version="1.0"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:o = "urn:schemas-microsoft-com:office:office" xmlns:x = "urn:schemas-microsoft-com:office:excel" xmlns:ss = "urn:schemas-microsoft-com:office:spreadsheet" xmlns:html="http://www.w3.org/TR/REC-html40">
<Styles>
<Style ss:ID="s1"><Font ss:Bold="1" ss:Color="#008000"/><NumberFormat ss:Format="0.00_ ;[Red]\-0.00\ "/></Style>
<Style ss:ID="s2"><Font ss:Color="red"/></Style>
<Style ss:ID="s3"><Font ss:Color="blue"/><NumberFormat ss:Format="yyyy/mm\-dd;@"/></Style>
</Styles>
<Worksheet ss:Name="Sheet1">
<Table ss:TopCell="2" ss:LeftCell="2" ss:DefaultColumnWidth="20" ss:ExpandedColumnCount="3">
<Column ss:StyleID="s1" ss:AutoFitWidth="1"/>
<Column ss:StyleID="s2"/>
<Column ss:StyleID="s3" ss:Index="3" ss:AutoFitWidth="1"/>
<Row ss:StyleID="s1">
<Cell ss:Index="2"><Data ss:Type="String">Title</Data></Cell>
</Row>
<Row>
<Cell><Data ss:Type="Number">111</Data></Cell>
<Cell><Data ss:Type="String">aaaa</Data></Cell>
<Cell><Data ss:Type="DateTime">1999-04-01</Data></Cell>
</Row>
<Row>
<Cell><Data ss:Type="Number">2222</Data></Cell>
<Cell><Data ss:Type="String">bbbb</Data></Cell>
<Cell><Data ss:Type="DateTime">1999-04-02T00:00:00.000</Data></Cell>
</Row>
</Table>
</Worksheet>
</Workbook>
Comment 1 V Stuart Foote 2018-06-18 18:17:53 UTC
IIUC this is via the external Orcus project import filters... and rather than save-as we Export back to Excel 2003 XML.

@Kohei, for you?
Comment 2 DM 2018-06-20 20:04:47 UTC
Two additional bugs.
In the code at the bottom of this comment, if you open it (extension .xml) in LibreOffice it reads correctly, but now hit Save, close and reopen it, and you'll see the formatting has gone!
There are in fact two issues with the XML it so saved - firstly it hasn't written the styles so the number formatting is lost (two decimals seems to be ok but not 3), and secondly it has skipped writing the formatted blank cell by using an index on the following one to assert its position, and this means the blank cell opens without formatting, which is incorrect, as it is very common to want formatted blank cells - the correct way to write it would be as Excel does as a Cell without the inner Data tag, i.e.
<Cell ss:StyleID="stylename"/>
Cheers, David

<?xml version="1.0"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:o = "urn:schemas-microsoft-com:office:office" xmlns:x = "urn:schemas-microsoft-com:office:excel" xmlns:ss = "urn:schemas-microsoft-com:office:spreadsheet" xmlns:html="http://www.w3.org/TR/REC-html40">
<Styles>
<Style ss:ID="s"><Alignment ss:Horizontal="Right"/><NumberFormat ss:Format="0.000;[Red]\-0.000;0"/></Style>
</Styles>
<Worksheet ss:Name="Sht">
<Table>
<Row>
<Cell ss:StyleID="s"><Data ss:Type="Number">-1</Data></Cell>
<Cell ss:StyleID="s"/>
<Cell ss:StyleID="s"><Data ss:Type="Number">-1</Data></Cell>
</Row>
</Table>
</Worksheet>
</Workbook>
Comment 3 Buovjaga 2018-06-24 19:16:38 UTC
(In reply to DM from comment #2)
> Two additional bugs.

...belong to their own reports. One issue per report.
Comment 4 Lucky Anderson 2018-10-27 22:31:09 UTC Comment hidden (spam)
Comment 5 Timur 2018-11-01 17:20:08 UTC
This was filed for LO 6.0. 
There was a change in 6.1: https://wiki.documentfoundation.org/ReleaseNotes/6.1#Improvements_to_Excel_2003_XML_import 
Should've been first searched and for missing divided to separate reports. 
But I see nothing now with 6.2+.
I'll put to New. 
Existing related bugs can go to See Also.
Comment 6 Kohei Yoshida 2019-08-15 12:57:11 UTC
The fix for this just landed on the master branch with the upgrading of orcus.
Comment 7 Timur 2019-09-02 08:36:23 UTC
Created attachment 153814 [details]
Test compare MSO LO62 LO64+

From what I see, LO now shows data, but formatting is still wrong. I'll set back to Reopened.
Comment 8 Kohei Yoshida 2019-09-03 11:29:04 UTC
Created attachment 153834 [details]
Test file from comment 1

Let me just attach the file so that it's easier to find.
Comment 9 Kohei Yoshida 2023-09-29 01:25:45 UTC
The latest version of orcus should improve this quite a bit.