Bug 35078 - FILEOPEN: Support Excel 2003 XML with Unescaped '<' in attributes values
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage
Version (earliest affected): 3.3.1 release
Hardware: All All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: MSO-XML2003
Reported: 2011-03-06 23:52 UTC by Domas Jokubauskis
Modified: 2017-11-16 00:00 UTC (History)
CC List: 4 users

See Also:
Crash report or crash signature:


Attachments
File not recognized in LibreOffice. (84.36 KB, application/vnd.ms-excel)
2011-03-06 23:52 UTC, Domas Jokubauskis
An xls file which has been downloaded, but is imported as LibreOffice Writer doc. (45.38 KB, application/excel)
2011-11-28 16:25 UTC, AndreHasekamp
Working Microsoft Excel 2003 XML file (78.30 KB, text/xml)
2011-11-29 02:55 UTC, Domas Jokubauskis
The initial XML file resaved using MS Office 2003 (64.23 KB, application/vnd.ms-excel)
2012-07-11 10:26 UTC, Domas Jokubauskis
Excel 2003 XML file in UTF-16 encoding (188.40 KB, application/vnd.ms-excel)
2013-12-06 12:22 UTC, Mantas Kriaučiūnas

Description Domas Jokubauskis 2011-03-06 23:52:45 UTC
Created attachment 44187 [details]
File not recognized in LibreOffice.

When opening the file, the "Text Import" dialog shows up. The file is attached. The software that generated the file is unknown.


Version information:
LibreOffice 3.3.1 
OOO330m19 (Build:8)
tag libreoffice-3.3.1.2, Debian package 1:3.3.1-1
Comment 1 Don't use this account, use tml@iki.fi 2011-03-07 00:05:31 UTC
For Kohei?
Comment 2 AndreHasekamp 2011-11-28 16:25:28 UTC
Created attachment 53918 [details]
An xls file which has been downloaded, but is imported as LibreOffice Writer doc.
Comment 3 Domas Jokubauskis 2011-11-29 02:41:09 UTC
(In reply to comment #2)
> Created attachment 53918 [details]
> An xls file which has been downloaded, but is imported as LibreOffice Writer
> doc.

Andre, the file you attached is a CSV file renamed to XLS. In your case the fields are separated by tab characters. I tried it, and it imports just fine when the delimiter is selected manually. Bug 38637 describes how CSV import could be improved.

In my case it is an XML-formatted file. More about the format: http://en.wikipedia.org/wiki/Microsoft_Office_XML_formats
http://msdn.microsoft.com/en-us/library/aa140066(office.10).aspx

It seems that the default file extension should be *.XML; however, the file was received with the *.XLS extension, probably to instruct Windows to open the file with MS Excel.

As of LibreOffice 3.4.3, OOO340m1 (Build:302) under Debian Testing, the file is treated as a CSV file when opened with the *.XLS extension, or as a text file when the *.XML extension is used.
Comment 4 Domas Jokubauskis 2011-11-29 02:54:28 UTC
I have found a Microsoft Excel 2003 XML file that opens in Calc without problems with both the *.XML and *.XLS extensions. The file was found at
http://user.services.openoffice.org/en/forum/viewtopic.php?f=9&t=4067&start=0&st=0&sk=t&sd=a
Comment 5 Domas Jokubauskis 2011-11-29 02:55:44 UTC
Created attachment 53947 [details]
Working Microsoft Excel 2003 XML file
Comment 6 Kohei Yoshida 2011-11-30 16:28:55 UTC
One difference I noticed is that the first variant is encoded in UTF-16 while the second one (that Calc can recognize) is encoded in UTF-8.  I re-saved the first one in UTF-8 but Calc still fails to recognize it.  Hmm....

Domas, who generates these files?  Are both generated by the same software (Excel 2003 or otherwise)?
Comment 7 Thomas Käfer 2012-04-16 08:18:48 UTC
I can confirm that this bug still/also exists in the version

LibreOffice 3.3.4 
OOO330m19 (Build:401)
tag libreoffice-3.3.3.1, Ubuntu package 1:3.3.4-0ubuntu1
Comment 8 Kohei Yoshida 2012-07-10 13:27:45 UTC
(In reply to comment #6)

> Domas, who generates these files?  Are both generated by the same software
> (Excel 2003 or otherwise)?

This question remains unanswered.
Comment 9 Domas Jokubauskis 2012-07-11 10:24:54 UTC
Sorry for not answering for so long. The file is a monthly report that I get by e-mail, so it is probably generated automatically by backend accounting software, not by any office suite.

That said, the file structure looks fine. It opens with MS Office 2003. I am attaching the file resaved using MS Office 2003 as an XML file; it opens fine with LibreOffice. The difference is that the original file is encoded in UTF-16 and the resaved file is in UTF-8 with a slightly different structure.
Comment 10 Domas Jokubauskis 2012-07-11 10:26:03 UTC
Created attachment 64103 [details]
The initial XML file resaved using MS Office 2003
Comment 11 QA Administrators 2013-05-26 22:31:41 UTC Comment hidden (obsolete)
Comment 12 Joel Madero 2013-05-29 23:56:57 UTC
Hm - to me with Kohei involved and one other confirming, I am marking as NEW

@Kohei - is this our bug?
Comment 13 Mantas Kriaučiūnas 2013-12-06 12:22:03 UTC
Created attachment 90352 [details]
Excel 2003 XML file in UTF-16 encoding

I'm attaching another example of an Excel 2003 XML file in UTF-16 encoding:
LibreOffice Calc 4.1.3.2 (and older versions) tries to import this file as CSV, but when I re-encode the file to UTF-8 (simply open it with gedit and then "Save As" -> Unicode (UTF-8)), Calc opens the re-encoded file as a normal Excel spreadsheet.
MS Excel opens the original file in UTF-16 encoding without problems.
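
For anyone who wants to script the re-encoding workaround instead of using gedit, here is a minimal sketch in Python. It assumes the input starts with a UTF-16 byte-order mark; the function name is made up for illustration, and rewriting the encoding in the XML declaration is an extra step of this sketch (gedit may or may not do it).

```python
import re


def reencode_utf16_xml_to_utf8(data: bytes) -> bytes:
    """Return UTF-8 bytes for XML that was stored as UTF-16 (hypothetical helper)."""
    # The "utf-16" codec reads the byte-order mark to pick the byte order
    # and drops the mark from the decoded text.
    text = data.decode("utf-16")
    # Assumption of this sketch: if the XML declaration names an encoding,
    # rewrite it so it agrees with the new byte encoding.
    text = re.sub(
        r'^(<\?xml[^>]*?encoding=)(["\'])[^"\']*\2',
        r'\g<1>\g<2>UTF-8\g<2>',
        text,
        count=1,
    )
    return text.encode("utf-8")
```

Usage would be along the lines of reading the attachment in binary mode, passing the bytes through the function, and writing the result to a new file.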
Comment 14 Mantas Kriaučiūnas 2013-12-06 12:32:42 UTC
I think it should not be hard to fix LibreOffice so that it detects UTF-16 encoded Excel 2003 XML files as Calc spreadsheets, because LibreOffice already correctly detects UTF-8 encoded Excel 2003 XML files as Calc spreadsheets.
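
To illustrate the idea, here is a rough sketch in Python of encoding-aware detection. It is not LibreOffice's actual type-detection code; the function name is hypothetical. It relies on two markers that Excel 2003 XML files normally carry: the `<?mso-application progid="Excel.Sheet"?>` processing instruction and the `urn:schemas-microsoft-com:office:spreadsheet` namespace. The only point is that the byte-order mark is checked first, so the same string test works for UTF-8 and UTF-16 input.

```python
import codecs

# Byte-order marks mapped to the codec that decodes the bytes following them.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def looks_like_excel_2003_xml(head: bytes) -> bool:
    """Guess from the first bytes of a file whether it is Excel 2003 XML."""
    # Without a byte-order mark, fall back to UTF-8 (assumption of this sketch).
    encoding, offset = "utf-8", 0
    for bom, name in _BOMS:
        if head.startswith(bom):
            encoding, offset = name, len(bom)
            break
    # errors="ignore" tolerates a buffer cut off in the middle of a character.
    text = head[offset:].decode(encoding, errors="ignore")
    return (
        'progid="Excel.Sheet"' in text
        or "urn:schemas-microsoft-com:office:spreadsheet" in text
    )
```

Passing the first few kilobytes of a file should be enough, since both markers appear near the top of the document.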
Comment 15 Maxim Monastirsky 2013-12-29 10:04:01 UTC
The original bugdoc has invalid XML:

$ xmllint sales.xls
sales.xls:177: parser error : Unescaped '<' not allowed in attributes values
"summary_money" ss:Formula="=IF(AND(LEFT(R[-5]C,1)>=&quot;0&quot;,LEFT(R[-5]C,1)

This bug joins several other requests to support invalid XML in Excel 2003 files. See Bug 38361, Bug 68742.
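
One possible way to tolerate such files is a pre-pass that escapes '<' inside attribute values before the text reaches a strict XML parser. The following is only a sketch in Python with made-up function names, not how any existing importer works. It tracks quotes inside tags with a small state machine and does not handle comments or CDATA sections that contain quote characters. The sample formula in the usage note is invented for illustration and is not taken from the bugdoc.

```python
import xml.etree.ElementTree as ET


def escape_lt_in_attribute_values(text: str) -> str:
    """Replace '<' with '&lt;' wherever it appears inside a quoted attribute value."""
    out = []
    in_tag = False  # True between the '<' that opens a tag and its closing '>'
    quote = None    # quote character of the attribute value being read, if any
    for ch in text:
        if quote is not None:
            if ch == "<":
                # '<' is never legal inside an attribute value, so escape it.
                out.append("&lt;")
                continue
            if ch == quote:
                quote = None
        elif in_tag:
            if ch in "\"'":
                quote = ch
            elif ch == ">":
                in_tag = False
        elif ch == "<":
            in_tag = True
        out.append(ch)
    return "".join(out)


def parse_lenient(text: str) -> ET.Element:
    """Parse XML after repairing unescaped '<' in attribute values."""
    return ET.fromstring(escape_lt_in_attribute_values(text))
```

For example, `parse_lenient('<Cell Formula="=IF(A1<B1,1,0)"><Data>1</Data></Cell>')` parses where a strict parser would reject the input, and the `Formula` attribute keeps its original value.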