Bug 106388 - FILEOPEN: Flat MS Office Word 2003 XML file opens with wrong import filter
Summary: FILEOPEN: Flat MS Office Word 2003 XML file opens with wrong import filter
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: File-Opening
 
Reported: 2017-03-07 15:10 UTC by sam tygier
Modified: 2023-10-23 12:12 UTC
CC List: 2 users

See Also:
Crash report or crash signature:


Attachments
test_file_word2010.xml (46.57 KB, text/xml)
2017-03-07 15:10 UTC, sam tygier

Description sam tygier 2017-03-07 15:10:08 UTC
Created attachment 131701
test_file_word2010.xml

I was sent a form to fill in as a .doc file that actually contained XML (not contained in a ZIP). LibreOffice 5.2 and 5.3 open it showing the raw XML. The file opens normally in MS Word 2010. Changing the file name extension to docx or xml makes no difference.

In Word 2010 I can generate a similar file by saving as "Word XML Document".
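Two unzipped ("flat") Word XML formats are in play here: the older Word 2003 XML format, and the "Word XML Document" format that Word 2010 saves (per the description), which packs every OOXML part into a single pkg:package element, commonly called Flat OPC. A minimal Python sketch, assuming the root element alone is enough to tell the two apart, follows. The function name is hypothetical, and this is not how LibreOffice's own type detection is implemented.

```python
import xml.etree.ElementTree as ET

# Root-element namespaces of the two flat (unzipped) Word XML formats.
FLAT_OPC_NS = "http://schemas.microsoft.com/office/2006/xmlPackage"
WORD2003_NS = "http://schemas.microsoft.com/office/word/2003/wordml"


def classify_flat_word_xml(xml_text: str) -> str:
    """Return 'flat-opc', 'word-2003-xml' or 'unknown' based on the root element."""
    root = ET.fromstring(xml_text)
    # ElementTree reports namespaced tags as '{namespace-uri}localname'.
    if root.tag == f"{{{FLAT_OPC_NS}}}package":
        return "flat-opc"
    if root.tag == f"{{{WORD2003_NS}}}wordDocument":
        return "word-2003-xml"
    return "unknown"


print(classify_flat_word_xml(f'<pkg:package xmlns:pkg="{FLAT_OPC_NS}"/>'))
print(classify_flat_word_xml(f'<w:wordDocument xmlns:w="{WORD2003_NS}"/>'))
```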
Comment 1 Buovjaga 2017-03-11 20:19:44 UTC
Confirmed.

It is: pkg:contentType="application/vnd.openxmlformats-package.relationships+xml"

Arch Linux 64-bit, KDE Plasma 5
Version: 5.4.0.0.alpha0+
Build ID: 43af3605d7e3b372dcc61f9cbc2cabff09396ed5
CPU threads: 8; OS: Linux 4.9; UI render: default; VCL: kde4; 
Locale: fi-FI (fi_FI.UTF-8); Calc: group
Built on March 10th 2017

Arch Linux 64-bit
LibreOffice 3.3.0 
OOO330m19 (Build:6)
tag libreoffice-3.3.0.4
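The content type quoted in comment 1 belongs to one pkg:part of the package. A short Python sketch that lists every part with its pkg:name and pkg:contentType can make the structure of such a file visible. The sample below is a hand-written, minimal stand-in for the attachment, and the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

PKG_NS = "http://schemas.microsoft.com/office/2006/xmlPackage"


def list_parts(xml_text: str) -> list[tuple[str, str]]:
    """Return (pkg:name, pkg:contentType) for every pkg:part in a Flat OPC file."""
    root = ET.fromstring(xml_text)
    parts = []
    for part in root.findall(f"{{{PKG_NS}}}part"):
        # Namespaced attributes are keyed as '{namespace-uri}localname' too.
        name = part.get(f"{{{PKG_NS}}}name", "")
        content_type = part.get(f"{{{PKG_NS}}}contentType", "")
        parts.append((name, content_type))
    return parts


sample = f"""<pkg:package xmlns:pkg="{PKG_NS}">
  <pkg:part pkg:name="/_rels/.rels"
            pkg:contentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <pkg:part pkg:name="/word/document.xml"
            pkg:contentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</pkg:package>"""

for name, content_type in list_parts(sample):
    print(name, content_type)
```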
Comment 2 QA Administrators 2018-03-12 03:35:50 UTC Comment hidden (obsolete)
Comment 3 sam tygier 2018-03-17 22:21:58 UTC
Still an issue in current master:
Version: 6.1.0.0.alpha0+
Build ID: 5833734027f9194e3433d82a6e8848b64e2ae3b1
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3; 
Locale: en-GB (en_GB.utf8); Calc: group
Comment 4 V Stuart Foote 2018-03-18 22:17:23 UTC
Opens using the "Microsoft Word 2003 XML (*.xml, *.doc)" file type filter. You'll need to set "UseSystemFileDialog" to false to conveniently select the filter.

But the import filter does not seem to handle all content tags of the source XML correctly: the "<w:r><w:t>Test file</w:t></w:r>" run is not being picked up.
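To confirm that the run mentioned in comment 4 really is present in the input, a small Python sketch can collect every w:t element from the /word/document.xml part. It assumes that, as in files saved by Word 2010, the w: prefix is bound to the OOXML WordprocessingML namespace. The Word 2003 XML format binds w: to a different namespace (http://schemas.microsoft.com/office/word/2003/wordml), which may be one reason that filter skips the run, though this report does not confirm it. The sample and the helper name are hypothetical.

```python
import xml.etree.ElementTree as ET

PKG_NS = "http://schemas.microsoft.com/office/2006/xmlPackage"
# Assumed: the OOXML WordprocessingML namespace used by Word 2007 and later.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def document_text_runs(xml_text: str, part_name: str = "/word/document.xml") -> list[str]:
    """Return the text of every w:t element inside the named pkg:part."""
    root = ET.fromstring(xml_text)
    runs = []
    for part in root.findall(f"{{{PKG_NS}}}part"):
        if part.get(f"{{{PKG_NS}}}name") != part_name:
            continue
        # iter() searches all descendants, so the pkg:xmlData wrapper needs no special case.
        for text in part.iter(f"{{{W_NS}}}t"):
            runs.append(text.text or "")
    return runs


sample = f"""<pkg:package xmlns:pkg="{PKG_NS}">
  <pkg:part pkg:name="/word/document.xml">
    <pkg:xmlData>
      <w:document xmlns:w="{W_NS}">
        <w:body><w:p><w:r><w:t>Test file</w:t></w:r></w:p></w:body>
      </w:document>
    </pkg:xmlData>
  </pkg:part>
</pkg:package>"""

print(document_text_runs(sample))
```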
Comment 5 QA Administrators 2019-03-19 03:49:37 UTC Comment hidden (obsolete)
Comment 6 V Stuart Foote 2019-03-19 04:09:48 UTC
Remains an issue: you have to force use of the "Word 2003 XML (.xml, .doc)" filter to open it as a document in Writer. Otherwise it opens as XML text.

Version: 6.3.0.0.alpha0+
Build ID: ce01727e4d6779ea128aa1be09f4af8cad4e1854
CPU threads: 8; OS: Windows 10.0; UI render: GL; VCL: win; 
Locale: en-US (en_US); UI-Language: en-US
Calc: CL
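Instead of picking the filter in the file dialog, it should also be possible to force it from the command line with soffice's --infilter option. A Python sketch that builds such a command follows. The internal filter name "MS Word 2003 XML" is an assumption (the UI label is "Word 2003 XML"), so check it against the filter list of your installation; the helper names are hypothetical.

```python
import shlex
import subprocess


def soffice_open_command(path: str, filter_name: str = "MS Word 2003 XML") -> list[str]:
    """Build an soffice command line that asks for one specific import filter."""
    # --infilter requests the named filter instead of automatic type detection.
    # The default filter name is an assumption; verify it on your installation.
    return ["soffice", f"--infilter={filter_name}", path]


def open_with_filter(path: str) -> None:
    # Requires LibreOffice's soffice on PATH; not called in the example below.
    subprocess.run(soffice_open_command(path), check=True)


cmd = soffice_open_command("test_file_word2010.xml")
print(shlex.join(cmd))
```

Returning an argument list rather than a single string avoids shell quoting problems with the space in the filter name; shlex.join is only used to display it.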
Comment 7 Timur 2019-08-19 11:02:28 UTC
Repro 6.4+
Comment 8 Stéphane Guillou (stragu) 2021-06-29 07:06:11 UTC
Reproduced in:

Version: 7.3.0.0.alpha0+ / LibreOffice Community
Build ID: f446a203fa2897bab8ae7686c948a8bf060675c6
CPU threads: 8; OS: Linux 4.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
TinderBox: Linux-rpm_deb-x86_64@86-TDF, Branch:master, Time: 2021-06-24_15:16:38
Calc: threaded
Comment 9 Ken Parker 2023-10-20 22:46:40 UTC
We at OpenText have a product, WebReports, with thousands of users producing thousands of these documents per day. We would really appreciate anything that could be done to move this issue up the priority list.
Comment 10 Buovjaga 2023-10-21 04:01:04 UTC
(In reply to Ken Parker from comment #9)
> We at OpenText have a product, WebReports, with thousands of users producing
> thousands of these documents per day. We would really appreciate anything
> that could be done to move this issue up the priority list.

https://www.libreoffice.org/get-help/professional-support/
Comment 11 Justin L 2023-10-23 12:12:33 UTC
Following comment 4's instructions opens an empty page. I expect that it fails at everything EXCEPT accepting this as a legitimate document, so the styles, the settings, the header/footer, and the document itself are likely all ignored. Thus it is probably not much different from opening Writer and assigning a file name, which isn't very helpful. To test this theory, I'd suggest adding some SAL_DEBUG statements in writerfilter...DomainMapper.cxx for some properties that exist in styles, like
    <w:spacing w:after="200" w:line="276" w:lineRule="auto"/>
and see if ANYTHING from the XML is loading.

Since the XML is likely ignored entirely as unknown, the key to solving this should be to parse the added pkg:part XML elements and use them to direct the rest of the XML parsing into the correct "buckets": document, styles, settings, header/footer, etc.

That would involve diving into the nasty /writerfilter/source/ooxml/model.xml and related functions in OOXMLFastContext files.
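The routing idea from this comment can be illustrated with a short Python sketch: read each pkg:part, look at its pkg:contentType, and hand the part's root element to a matching bucket. This only shows the idea; an actual fix would live in writerfilter's C++ code, and the bucket names, the helper name and the choice to key on content type rather than pkg:name are assumptions.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

PKG_NS = "http://schemas.microsoft.com/office/2006/xmlPackage"
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
WML_CT = "application/vnd.openxmlformats-officedocument.wordprocessingml."

# Hypothetical bucket names, keyed on the OOXML content type of each part.
BUCKETS = {
    WML_CT + "document.main+xml": "document",
    WML_CT + "styles+xml": "styles",
    WML_CT + "settings+xml": "settings",
    WML_CT + "header+xml": "header/footer",
    WML_CT + "footer+xml": "header/footer",
}


def route_parts(xml_text: str) -> dict[str, list[ET.Element]]:
    """Group each XML part's root element under the bucket its content type maps to."""
    root = ET.fromstring(xml_text)
    routed = defaultdict(list)
    for part in root.findall(f"{{{PKG_NS}}}part"):
        content_type = part.get(f"{{{PKG_NS}}}contentType", "")
        bucket = BUCKETS.get(content_type, "unhandled")  # e.g. relationship parts
        payload = part.find(f"{{{PKG_NS}}}xmlData")
        # Parts stored as pkg:binaryData (images etc.) have no xmlData and are skipped.
        if payload is not None and len(payload) > 0:
            # The part's own root element (w:document, w:styles, ...) is the first child.
            routed[bucket].append(payload[0])
    return dict(routed)


sample = f"""<pkg:package xmlns:pkg="{PKG_NS}" xmlns:w="{W_NS}">
  <pkg:part pkg:name="/word/document.xml" pkg:contentType="{WML_CT}document.main+xml">
    <pkg:xmlData><w:document/></pkg:xmlData>
  </pkg:part>
  <pkg:part pkg:name="/word/styles.xml" pkg:contentType="{WML_CT}styles+xml">
    <pkg:xmlData><w:styles/></pkg:xmlData>
  </pkg:part>
</pkg:package>"""

for bucket, elements in route_parts(sample).items():
    print(bucket, [element.tag for element in elements])
```

Keying on content type keeps the sketch short; a fuller implementation would presumably follow the relationship parts (/_rels/.rels and word/_rels/document.xml.rels), as OPC consumers normally do.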