Bug 38969 - [FILEOPEN] Word document in XML format not recognized
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 3.4.1 release (earliest affected)
Hardware: Other, All
Importance: medium normal
Assignee: Caolán McNamara
Reported: 2011-07-05 03:12 UTC by Aaron Digulla
Modified: 2011-07-08 05:49 UTC

Demo file (34.36 KB, application/xml)
2011-07-05 03:51 UTC, Aaron Digulla
screenshot of testMultiPage, renamed as .xml (96.45 KB, image/jpeg)
2011-07-05 14:58 UTC, noname

Description Aaron Digulla 2011-07-05 03:12:20 UTC
A Word document in XML format is not recognized.
Comment 1 Caolán McNamara 2011-07-05 03:34:15 UTC
Attach an example of what you mean.
Comment 2 Aaron Digulla 2011-07-05 03:51:50 UTC
Created attachment 48767
Demo file

Word DOC file in clipboard/XML format; I'm not sure what the correct name is. This isn't OOXML (which is a ZIP file); it's something between binary DOC and OOXML.

PS: I did attach this file in the original report, but for some reason it didn't make it into the database.
Comment 3 noname 2011-07-05 14:56:26 UTC
Could you rename your file with a .xml extension instead of .doc and test again? It seems to load correctly here once renamed.
I've added a screenshot of the output.
Comment 4 noname 2011-07-05 14:58:15 UTC
Created attachment 48789
screenshot of testMultiPage, renamed as .xml
Comment 5 Aaron Digulla 2011-07-06 02:19:13 UTC
I can rename it to .xml as a workaround.

Can you please enhance the loading code to try the ".xml" loader, too, when it sees ".doc"? Maybe just look for "<?mso-application progid="Word.Document"?>" in the first 1024 bytes.

It might not be perfect, but it should work until someone finds an example where it breaks.
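The sniffing heuristic proposed in this comment can be illustrated with a minimal Python sketch. This is not LibreOffice code: the helper names `looks_like_word_2003_xml` and `sniff_file` are hypothetical, and the check assumes an exact byte match of the processing instruction quoted above, fully contained in the first 1024 bytes.

```python
# Minimal sketch (not LibreOffice code) of the sniffing heuristic proposed
# in Comment 5. Helper names are hypothetical.

WORD_2003_PI = b'<?mso-application progid="Word.Document"?>'
SNIFF_BYTES = 1024  # how much of the start of the file to inspect


def looks_like_word_2003_xml(head: bytes) -> bool:
    """True if the Word 2003 XML processing instruction occurs entirely
    within the first SNIFF_BYTES bytes of `head`."""
    return WORD_2003_PI in head[:SNIFF_BYTES]


def sniff_file(path: str) -> bool:
    """Read only the start of the file and apply the check."""
    with open(path, "rb") as f:
        return looks_like_word_2003_xml(f.read(SNIFF_BYTES))


if __name__ == "__main__":
    sample = b'<?xml version="1.0"?>\n<?mso-application progid="Word.Document"?>\n'
    print(looks_like_word_2003_xml(sample))
    print(looks_like_word_2003_xml(b"\xd0\xcf\x11\xe0"))
```

Because this is an exact byte match, a file that writes the instruction with different quoting or spacing would be missed; that fits the "might not be perfect" caveat, and a real detector would likely need to be more tolerant.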
Comment 6 noname 2011-07-07 12:50:39 UTC
I think the program currently tries to load the file as a DOC first and, when it isn't recognised, falls back to loading it as plain text.
What you want (correct me if I'm wrong): load it as DOC; if it isn't a DOC file, try to load it as XML; if it isn't XML either, load it as plain text.
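The fallback order described in this comment (DOC, then XML, then plain text) can be sketched as a chain of loaders tried in sequence. This is a hypothetical Python illustration, not the actual LibreOffice detection code. It assumes a binary .doc is recognised by the OLE2 compound-file signature and a Word 2003 XML file by the `mso-application` marker mentioned in the bug; the loader functions only return a label instead of parsing anything.

```python
from typing import Callable, List, Optional

# Hypothetical loaders: each returns a label if it accepts the data, else None.
Loader = Callable[[bytes], Optional[str]]

# Signature of OLE2 compound files, the container used by binary .doc files.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"
WORD_XML_MARKER = b'<?mso-application progid="Word.Document"?>'


def load_binary_doc(data: bytes) -> Optional[str]:
    return "binary DOC" if data.startswith(OLE2_MAGIC) else None


def load_word_xml(data: bytes) -> Optional[str]:
    return "Word 2003 XML" if WORD_XML_MARKER in data[:1024] else None


def load_plain_text(data: bytes) -> Optional[str]:
    return "plain text"  # last resort: always accepts


LOADERS: List[Loader] = [load_binary_doc, load_word_xml, load_plain_text]


def load_with_fallback(data: bytes, loaders: List[Loader] = LOADERS) -> str:
    """Try each loader in order and return the first result that is not None."""
    for loader in loaders:
        result = loader(data)
        if result is not None:
            return result
    raise ValueError("no loader accepted the data")
```

Keeping plain text as the final, always-accepting step preserves the existing behaviour for anything that is neither DOC nor XML.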
Comment 7 Caolán McNamara 2011-07-08 02:44:21 UTC
It should just be a matter of telling the Office 2003 XML importer that .doc (and .xls for the Excel one) are also acceptable suffixes.
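The suffix change described here can be pictured as a table that maps each import filter to the extensions it accepts. The following Python sketch is only an illustration. The filter names and the table are hypothetical and do not reflect LibreOffice's real filter configuration. It shows how adding "doc" and "xls" to the 2003 XML entries makes them candidates for files with those extensions.

```python
import os
from typing import List

# Hypothetical filter table (illustrative names, not LibreOffice's real
# configuration): filter name -> file suffixes it is willing to accept.
# The 2003 XML importers additionally list .doc and .xls.
ACCEPTED_SUFFIXES = {
    "Word 2003 XML": {"xml", "doc"},
    "Excel 2003 XML": {"xml", "xls"},
    "Word 97-2003 binary": {"doc"},
}


def candidate_filters(filename: str) -> List[str]:
    """Return, in sorted order, the filters whose suffix list matches."""
    suffix = os.path.splitext(filename)[1].lstrip(".").lower()
    return sorted(
        name for name, suffixes in ACCEPTED_SUFFIXES.items() if suffix in suffixes
    )
```

In this sketch the extension only nominates candidates. Each candidate would still need a content check (such as the OLE2 signature or the `mso-application` marker) before being chosen, so a genuine binary .doc is not handed to the XML importer.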
Comment 9 Aaron Digulla 2011-07-08 02:59:01 UTC
Please also add .xlsx and .docx (the OOXML extensions).
Comment 10 Caolán McNamara 2011-07-08 03:17:02 UTC
Are there known uses of .docx as a suffix for the *2003* flat XML file format, as opposed to the ZIP-based Office Open XML format?
Comment 11 Aaron Digulla 2011-07-08 05:09:12 UTC
MS Office 2007 (no idea which exact version; can't find the About dialog anymore) can't open such a file.

So let me put it this way: when a user changes the file extension, or receives a file with the wrong extension, does he know about this distinction? If you support loading such files, how much damage can it cause?

I think in the worst case, LO will load the file while MS Office won't.
Comment 12 Caolán McNamara 2011-07-08 05:49:21 UTC
I believe these 2003-format XML files often appeared as .doc or .xls to trick Word and Excel into opening them without extra magic. I'd rather put only the minimum necessary bodges into LibreOffice to trick us into opening them as well. All things can have unexpected consequences :-)