Bug 140018 - Export: Input list with empty item makes DOCX that can't be opened in Word - Generates XML parsing error
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 3.3.0 release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: filter:docx
Depends on:
Blocks: DOCX-Corrupted
Reported: 2021-01-30 11:34 UTC by Mike Kaganski
Modified: 2022-09-06 17:35 UTC

See Also:
Crash report or crash signature:


Attachments
A simple list with an empty element (8.20 KB, application/vnd.oasis.opendocument.text)
2021-01-30 11:34 UTC, Mike Kaganski
Image of window error message pop-up when opening .DOCX file (4.74 KB, image/png)
2021-01-30 13:41 UTC, Dave Potter

Description Mike Kaganski 2021-01-30 11:34:45 UTC
Created attachment 169296 [details]
A simple list with an empty element

The attachment contains an input list field with two elements, the second of which is empty. Saving it as DOCX produces a file that shows an error when opened in Word 2016:

> Word experienced an error trying to open the file.
> Please try the following suggestions.

Reproducible with current master, and also with LO 3.3 (OOo can't export to DOCX).
Comment 1 Dave Potter 2021-01-30 13:41:37 UTC
Created attachment 169297 [details]
Image of window error message pop-up when opening .DOCX file
Comment 2 Dave Potter 2021-01-30 13:45:05 UTC
Reproduced by saving the file attached in the original bug report as a .DOCX file in Ubuntu 20.04.1
using Writer
Version: 6.4.6.2
Build ID: 1:6.4.6-0ubuntu0.20.04.1

then opening the saved .DOCX file in Windows 10
using Writer
Version 7.0.4.2
Build: dcf040e67528d9187c66b2379df5ea4407429775
Comment 3 Dave Potter 2021-01-30 13:54:11 UTC
Could you provide the exact steps you used to create the document please?
Comment 4 Mike Kaganski 2021-01-30 14:37:22 UTC
(In reply to Dave Potter from comment #3)
> Could you provide the exact steps you used to create the document please?

Definitely.
1. Receive a broken DOCX document from a user privately.
2. Start cleaning its XML to nail down the problem.
3. Find out that the field is the culprit, and that removing the empty item fixes the problem.
4. Save the original broken document to ODF, removing everything except the problematic field ;-P

So no, I don't know how it was created initially. Maybe it was created elsewhere, e.g. in Word, and then re-saved in LibreOffice. The problem is that LibreOffice saves invalid XML (or at least XML that Word can't read); LibreOffice could of course consider the empty entry invalid (and drop it on import), or do something else :-D
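
For anyone repeating the XML inspection described in steps 2 and 3, here is a minimal Python sketch that lists drop-down entries with an empty value in an exported DOCX. It is only an illustration: the function name `find_empty_list_entries` is hypothetical, and it assumes the entries are written either as `w:listEntry` (legacy form field) or as `w:listItem` (content control) in `word/document.xml`. Which of these LibreOffice actually emits for an input list field was not checked here.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (prefix "w" in document.xml).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Assumption: the list entries of interest are one of these two elements.
# Each maps to the attribute that carries the entry's value.
CANDIDATES = {
    f"{{{W_NS}}}listEntry": f"{{{W_NS}}}val",    # legacy drop-down form field
    f"{{{W_NS}}}listItem": f"{{{W_NS}}}value",   # drop-down content control
}


def find_empty_list_entries(docx, part="word/document.xml"):
    """Return the local tag name of every list entry whose value is missing or empty.

    `docx` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read(part))
    hits = []
    for elem in root.iter():
        attr = CANDIDATES.get(elem.tag)
        # elem.get() returns None for a missing attribute and "" for an empty one.
        if attr is not None and not elem.get(attr):
            hits.append(elem.tag.split("}")[1])
    return hits
```

Calling `find_empty_list_entries("exported.docx")` on a file saved from the attached ODT (the file name is a placeholder) would return one entry per empty item found, or an empty list if the assumption about element names does not hold for that file.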
Comment 5 Dave Potter 2021-01-31 14:02:59 UTC
Hi Mike
I can reproduce the symptoms of the bug report by saving the attached ODT file as a DOCX and trying to open it in Microsoft Word.
What I can't do is reproduce the file that is attached.
Writer will not allow me to add a completely empty input field in a list, and if I add a field that has a space in it, the problem does not recur.
Have you any idea how this file was originally produced?
If not, then I suggest that the bug should be closed as the underlying scenario cannot be reproduced.
Comment 6 Mike Kaganski 2021-01-31 15:03:12 UTC
(In reply to Dave Potter from comment #5)

Unfortunately, this is not that simple. The problem arose in real life. I can't find the steps to reproduce the problematic document from scratch, but that doesn't mean the bug should be closed. There might be some sequence in LO, or there might be some way to create it in other applications (that we don't control). The bugdoc file is valid ODF, and thus if some (external?) entity is able to produce it, LO is expected not to misbehave when processing such files. If it has such input and exports it to DOCX, it must produce valid output.
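
To make the options mentioned in this thread concrete (dropping the empty entry, or writing something non-empty such as the single space that comment 5 found harmless), here is a small Python sketch. It is not LibreOffice code, which is C++; `sanitize_list_items` and its `placeholder` parameter are hypothetical names used only to illustrate the idea.

```python
def sanitize_list_items(items, placeholder=" "):
    """Return a copy of `items` with empty strings made safe for export.

    With a string `placeholder`, each empty entry is replaced by it.
    With `placeholder=None`, empty entries are dropped instead.
    """
    cleaned = []
    for item in items:
        if item == "":
            if placeholder is None:
                continue  # drop the empty entry entirely
            item = placeholder  # keep the entry, but with non-empty text
        cleaned.append(item)
    return cleaned
```

Replacing keeps the number and order of entries intact, which matters if a selected index is stored alongside the list; dropping changes both. Whether either behaviour is acceptable for the real export filter is a separate design decision.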
Comment 7 Dave Potter 2021-01-31 17:01:06 UTC
Understood, Mike, and in principle I agree. However, what if the ODT was generated in error by an old version of LO that had a bug that has since been corrected, so the scenario will not recur? Or does the bug need to be addressed from a legacy compatibility perspective regardless?