Bug 79331

Summary: FILEOPEN: Parser error loading MS xml file if filename has no .suffix
Product: LibreOffice
Reporter: Jim Avera <jim.avera>
Component: Writer
Assignee: Not Assigned <libreoffice-bugs>
Status: RESOLVED NOTABUG    
Severity: normal
CC: momonasmon
Priority: medium    
Version: 4.2.2.1 release   
Hardware: Other   
OS: Linux (All)   
Whiteboard: BSA
Crash report or crash signature:
Regression By:
Attachments: Copy of document at http://www.idoc.idaho.gov/content/form/498

Description Jim Avera 2014-05-27 18:52:57 UTC
Created attachment 99975
Copy of document at http://www.idoc.idaho.gov/content/form/498

The attached file is supposedly a Microsoft Word XML document (usually stored with a .docx suffix in the file name). However, if the file name has no suffix, LibreOffice prints an error message followed by garbage binary data to the terminal:

:1: parser error : Document is empty^M
^C^D^T^M

Steps to reproduce:
1. Run:  libreoffice http://www.idoc.idaho.gov/content/form/498
or
1. Download the attached copy of that same file, store it under the filename "498", and open it.

Current behavior: Error message with garbage characters appears

Expected behavior: Nothing printed to the terminal

The MIME type associated with the document at the referenced URL is
application/vnd.openxmlformats-officedocument.wordprocessingml.document
which, according to various websites, means it is a .docx file.

Note: Even after printing the parse-error message and garbage characters, the document appears to open normally (eventually).

Operating System: Ubuntu
Version: 4.2.2.1 release

Comment 1 Maxim Monastirsky 2014-05-27 20:03:28 UTC
It's not a bug. The same will happen with every file that has a wrong extension or no extension at all. LibreOffice tries to determine the format of the file, and as part of that it tries to parse the file with an XML parser. If you don't want this output, simply give the file a correct extension. Also keep in mind that this output comes from an external library (libxml2), and I doubt we have any control over it.
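
To illustrate the kind of content-based detection described above, here is a minimal sketch in Python. It is not LibreOffice's actual detection code, and it uses Python's bundled expat parser rather than libxml2. The helper names `sniff_format` and `suggested_name` are hypothetical. The sketch relies on two assumptions: a .docx file is a ZIP container, so it starts with the bytes `PK\x03\x04`, and its main part is conventionally stored as `word/document.xml`. Feeding such bytes to an XML parser fails, which is plausibly what produces the parser error and binary noise in the report.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Signature of a ZIP local file header; a .docx container starts with it.
ZIP_MAGIC = b"PK\x03\x04"


def sniff_format(data: bytes) -> str:
    """Guess a coarse format from content alone (hypothetical helper)."""
    if data.startswith(ZIP_MAGIC):
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                names = archive.namelist()
        except zipfile.BadZipFile:
            return "unknown"
        # Assumption: a Word document keeps its main part at this path.
        if "word/document.xml" in names:
            return "docx"
        return "zip"
    try:
        # Plain XML parses; binary data raises ParseError instead.
        ET.fromstring(data)
    except ET.ParseError:
        return "unknown"
    return "xml"


def suggested_name(name: str, data: bytes) -> str:
    """Append .docx when a suffix-less file looks like a Word document."""
    if "." not in name and sniff_format(data) == "docx":
        return name + ".docx"
    return name
```

With the attached file saved as "498", `suggested_name("498", data)` would propose "498.docx", which matches the workaround of giving the file a correct extension before opening it.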