Bug 46060 - Writer corrupts DOCX file containing TOC, recovery not possible with Office 2010
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 3.5.5.3 release
Hardware: x86-64 (AMD64) All
Importance: high critical
Assignee: Miklos Vajna
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: mab3.6
Reported: 2012-02-14 12:51 UTC by Philip Gillißen
Modified: 2013-08-27 14:17 UTC

See Also:
Crash report or crash signature:


Attachments
The clean document (12.34 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2012-02-14 12:51 UTC, Philip Gillißen
The corrupt document (9.33 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2012-02-14 12:51 UTC, Philip Gillißen
Clean document with TOC (16.02 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2012-02-15 10:34 UTC, Philip Gillißen
Corrupt document with TOC (4.84 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2012-02-15 10:35 UTC, Philip Gillißen
Original clean document (862.26 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2012-09-01 15:09 UTC, yves
Corrupted version (16.83 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2012-09-01 15:10 UTC, yves
New test with Yves' document and master sources updated today (24.89 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2013-04-03 20:50 UTC, Julien Nabet
console logs with master sources (41.28 KB, text/plain)
2013-04-03 20:51 UTC, Julien Nabet

Description Philip Gillißen 2012-02-14 12:51:00 UTC
Created attachment 57053 [details]
The clean document

LibreOffice 3.5 corrupts my DOCX files completely, so that Microsoft Word 2010 cannot open them anymore!

I tried it with simple documents and it failed completely. Word doesn't even offer to recover the document; it is simply unable to open it. I think this is a huge issue for most users of LibreOffice.

On 64-bit Ubuntu with LibreOffice 3.4, the documents were corrupted too, but Word could still open them after recovery (with a lot of lost formatting).

I experienced similar behaviour with XLSX files, too.

I don't know why the importer for docx/xlsx is getting worse. I hope you guys can use this information to improve LO.

I attached one clean document and the result of saving it with LO 3.5 on Windows. The corrupt one cannot be recovered in MS Office 2010.
Comment 1 Philip Gillißen 2012-02-14 12:51:27 UTC
Created attachment 57054 [details]
The corrupt document
Comment 2 Pedro 2012-02-14 15:06:24 UTC
The document you attached is a (perfectly good) ODT file with a wrong .DOCX extension.

Can you describe the exact steps that led to the creation of this file?

Thanks!
Comment 3 Philip Gillißen 2012-02-15 02:32:33 UTC
I created the clean document on 64-bit Windows with Microsoft Office 2010. I saved it and then opened it in LibreOffice 3.5 (also on 64-bit Windows, but with the 32-bit version of LO).
Then I removed a space, re-added it, and saved the changed document under a new file name, again with the .docx suffix.

Then I tried to open this new DOCX file with Word. It says it is trying to recover the file, but the recovery fails. The document is not readable with Word.

I hope this is enough information. If you need more information, just ask.
Comment 4 Nino 2012-02-15 05:04:18 UTC
Though I have no Windows machine for testing, I seem to be able to produce a file in DOCX format (i.e. exactly like the clean document) when saving the clean document as described in the report:

1. open clean.docx
2. modify it (add a word)
3. save by pressing save button
4. confirm fileformat dialog

So everything works as expected on Linux rpm-32 LibO 3.5.0rc3.

@ Philip: 
It seems that "saving" the document did not produce a file of type "MS XML" but one of type "ODT". Adding the file extension ".docx" does not automatically change the document type. To tell whether this was caused by a software bug or by an unintended user action, we need to know exactly how the "corrupted document" was created and whether the behavior can be reproduced (and under which conditions, i.e. OS, platform, etc.). Please give exact step-by-step instructions for what you did to save the document (which buttons you pressed, and so on). Then somebody, ideally on the same platform, should try to reproduce the behavior.
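
The point about file extension versus actual document type can be checked without opening either office suite. The following is a minimal sketch, not part of the original report: it assumes that an ODF package contains a `mimetype` entry and that an OOXML Word package contains `[Content_Types].xml` and `word/document.xml`. The function name `sniff_package_type` is made up for illustration.

```python
import zipfile


def sniff_package_type(path):
    """Guess whether a file is an ODF or OOXML package from its ZIP contents.

    The file extension is deliberately ignored.
    """
    if not zipfile.is_zipfile(path):
        return "not a zip package"
    with zipfile.ZipFile(path) as zf:
        names = set(zf.namelist())
        # ODF packages carry a 'mimetype' entry that names the document type.
        if "mimetype" in names:
            mimetype = zf.read("mimetype").decode("ascii", "replace").strip()
            if mimetype.startswith("application/vnd.oasis.opendocument"):
                return "ODF (" + mimetype + ")"
        # OOXML packages carry '[Content_Types].xml'; Word files also
        # contain the main part 'word/document.xml'.
        if "[Content_Types].xml" in names:
            if "word/document.xml" in names:
                return "OOXML (WordprocessingML)"
            return "OOXML (other)"
    return "unknown zip package"
```

Calling `sniff_package_type("corrupt.docx")` (file name hypothetical) on an ODT file that was merely renamed to .docx would report an ODF package, which is the situation described in comment 2.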
Comment 5 Philip Gillißen 2012-02-15 06:41:16 UTC
Hi Nino!

(In reply to comment #4)
> @ Philip: 
> Seems as if "saving" the document did not produce a document of type "MS XML" but rather of type "ODT". Adding the file extension ".docx" does not
> automatically change the document type. To be able to tell, if this was caused by a software misbehavior (bug) or by a somewhat "not-as-designed" user action,
> the question is, how the "corrupted document" was created exactly and if this behavior can be reproduced (and under which conditions, i.e. OS/platform/etc). So
> you should give exact step-by-step instructions, what you did (which button you pressed, and so on) to save the document. Then somebody (using the same
> Platform if available) should try to reproduce the behavior.

Thank you for the test and the clarification! I will check whether I used the wrong format and report back (probably this evening).
Comment 6 Philip Gillißen 2012-02-15 10:34:37 UTC
Created attachment 57110 [details]
Clean document with TOC
Comment 7 Philip Gillißen 2012-02-15 10:35:38 UTC
Created attachment 57111 [details]
Corrupt document with TOC
Comment 8 Philip Gillißen 2012-02-15 10:40:45 UTC
I reanalyzed my problem and found the real cause of the file corruption.

LO 3.5 corrupts the document beyond recovery, so that MS Word 2010 is not able to open it.

The cause is the TOC inserted by Word. I attached two new documents (one clean, one corrupt).
Here is the procedure I used to create them:

1. Create new document in MS Word 2010
2. Insert TOC (standard template)
3. Add some headlines
4. Refresh TOC
5. Save document as docx file.
6. Copy document with Windows Explorer.
7. Open copy with LibreOffice Writer 3.5.
8. Add a line break
9. Save it in the current format.
10. Open it in Word. Not recoverable (problem in Line 2, document.xml).

I compared the contents with WinMerge[1] and the docx add-on[2], and found the differences in the TOC area.

I really hope this information helps. I will edit the bug title to reflect the new information.

[1]: http://winmerge.org/
[2]: http://freemind.s57.xrea.com/xdocdiffPlugin/en/index.html
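
For readers without WinMerge, a comparable comparison can be sketched with the Python standard library. This is an illustration under stated assumptions, not the method used in this report: it assumes the interesting differences are in the `word/document.xml` part, and the helper names `part_lines` and `diff_parts` are made up. The text is split at tag boundaries with a regular expression rather than an XML parser, so it also works when the saved file is not well-formed.

```python
import difflib
import re
import zipfile


def part_lines(docx_path, part="word/document.xml"):
    """Read one part of a DOCX package and split it into one tag per line."""
    with zipfile.ZipFile(docx_path) as zf:
        text = zf.read(part).decode("utf-8", "replace")
    # Naive split between adjacent tags so the diff is not one giant line.
    # Whitespace-only text between two tags is not preserved.
    return re.sub(r">\s*<", ">\n<", text).splitlines()


def diff_parts(clean_path, saved_path, part="word/document.xml"):
    """Return a unified diff of the same part taken from two DOCX files."""
    return list(
        difflib.unified_diff(
            part_lines(clean_path, part),
            part_lines(saved_path, part),
            fromfile=clean_path,
            tofile=saved_path,
            lineterm="",
        )
    )
```

Printing `"\n".join(diff_parts("clean.docx", "corrupt.docx"))` (file names hypothetical) gives a line-by-line view of what changed between the two saves.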
Comment 9 sasha.libreoffice 2012-02-22 06:25:35 UTC
Reproduced in 3.6.0 master 97fdf02-9eed775-f061262 on Fedora 64 bit.
The problem is with import/export of the table of contents, as described here:
Bug 46025 - Writer FILESAVE, FILEOPEN: docx files with "Content" field processed worng

@ Philip Gillißen
Thanks for the bug report.
Comment 10 Simon-Shlomo Poil 2012-07-17 15:16:26 UTC
I can confirm this bug also appears on Windows XP 64-bit with the latest version of LibreOffice, 3.5.5.3.

The file does not open in Word, and when it is opened in LibreOffice, half of the original document is lost!

This is a very serious bug that leads to data loss! I changed the importance level accordingly.
Comment 11 dE 2012-08-28 14:33:57 UTC
I confirm this bug.

Serious indeed.
Comment 12 yves 2012-09-01 15:08:15 UTC
I'm having a very similar problem, but without any TOC.

Here is a clean .docx file and the corrupt version that has only been opened and "saved as" in LibreOffice.

This bug does not happen with every document (I have other examples of .docx files that work and that don't), and saving as .doc has "resolved" the problem both times it appeared.

LO 3.5.6.2
Mac OS X 10.8.1
Comment 13 yves 2012-09-01 15:09:44 UTC
Created attachment 66448 [details]
Original clean document
Comment 14 yves 2012-09-01 15:10:08 UTC
Created attachment 66449 [details]
Corrupted version
Comment 15 yves 2012-09-05 15:31:50 UTC
Same results with LO 3.6.1 en-US (previous was fr)
Comment 16 Philip Gillißen 2012-10-06 13:27:03 UTC
Is there an update on this issue?
I think it's quite critical.
Comment 17 Pierre 2012-11-02 00:23:41 UTC
This also happens for LibreOffice 3.5.4.2 (build ID 350m1(Build:2) ) when I just add a table. If there's only a regular line of text, the saving to .docx works ok, but as soon as I add a 2-row, 2-column table (empty), the file becomes unreadable in Office 2007.
Comment 18 Mike Sapsard 2013-03-25 07:33:39 UTC
This is still happening in Version 4.0.1.2 with Windows 8.
Comment 19 Julien Nabet 2013-04-03 20:50:48 UTC
Created attachment 77391 [details]
New test with Yves' document and master sources updated today

On a Debian x86-64 PC with master sources updated today, I retrieved Yves' clean file and did a "save as" under a new name. I attached the resulting file.
I don't have MS Office 2010 to test it, but I will attach console logs which show some problems.
Comment 20 Julien Nabet 2013-04-03 20:51:51 UTC
Created attachment 77392 [details]
console logs with master sources

Following my previous comment, I attached console logs showing the messages during original file opening and during save as.
Comment 21 Julien Nabet 2013-04-03 20:53:32 UTC
Could someone tell me whether https://bugs.freedesktop.org/attachment.cgi?id=77391 can be opened with MS Office 2010?
Comment 22 sasha.libreoffice 2013-04-04 09:14:58 UTC
Opening 7503_testwithmaster.docx with MS Word 2007 and 2010:
error opening /word/document.xml line 2 column 104255
Comment 23 Philip Gillißen 2013-04-04 16:07:27 UTC
(In reply to comment #21)
> could someone tell if https://bugs.freedesktop.org/attachment.cgi?id=77391
> can be opened with MsOffice 2010?

Yes, it still causes the error in Office 2010 on Windows 7 (32-bit). I get the same error message as Sarah.
Comment 24 Philip Gillißen 2013-04-04 16:08:01 UTC
Sorry! I meant Sasha.
Comment 25 Mike Sapsard 2013-04-06 09:20:52 UTC
I had this error from MS Word2010/Windows 7 Home Premium - 64 bit.
------------------------------------
The file 7503_testwithmaster.docx cannot be opened because there are problems with the contents.

Details
A document must contain exactly one root element.
Location Part: /word/document.xml, line: 2, Column: 104255
------------------------------------

The file was not displayed.

LO 4.0.1 also failed to open it.
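
The error above points at `/word/document.xml` not being well-formed XML. A rough way to check that without Word is sketched below, using Python's standard `zipfile` and `xml.parsers.expat` modules. The function name `check_document_xml` is hypothetical, and expat's error wording differs from Word's: for example, a second root element is typically reported by expat as junk after the document element.

```python
import zipfile
from xml.parsers import expat


def check_document_xml(docx_path, part="word/document.xml"):
    """Return None if the part is well-formed XML, else (message, line, column)."""
    with zipfile.ZipFile(docx_path) as zf:
        data = zf.read(part)
    parser = expat.ParserCreate()
    try:
        # Second argument True marks this as the final chunk of input.
        parser.Parse(data, True)
    except expat.ExpatError as err:
        # expat reports a 1-based line number and a 0-based column offset.
        return (expat.ErrorString(err.code), err.lineno, err.offset)
    return None
```

Running `check_document_xml("7503_testwithmaster.docx")` on the attachment should then report a position inside line 2 if the part is malformed in the way Word describes, though the exact column may be counted differently.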
Comment 26 sasha.libreoffice 2013-04-06 09:31:06 UTC
And LO 4.0.2 on Fedora 64 bit crashes while opening this file.
Comment 27 Jose Marcado 2013-04-26 11:37:58 UTC
When nothing else helps: Try to download SoftMaker FreeOffice (57MB, free office suite), it has fantastic import and export filters for all Microsoft Office formats. Never had conversion problems with FreeOffice, but regularly with LibreOffice. freeoffice.com
Comment 28 Michael Meeks 2013-08-26 15:12:59 UTC
Let me add Miklos to see what happens ;-) it sounds bad.
Comment 29 Miklos Vajna 2013-08-27 11:05:20 UTC
When saving attachment 66448 to DOCX, document.xml indeed isn't well-formed on -4-1. It's fine on master; I'll check what we should backport here.
Comment 30 Miklos Vajna 2013-08-27 12:58:46 UTC
http://cgit.freedesktop.org/libreoffice/core/commit/?h=libreoffice-4-1&id=59d8dde3fc9a4dc653e43efb8552efc4ab3efc92 should take care of the well-formed output problem on -4-1
Comment 31 Philip Gillißen 2013-08-27 14:17:34 UTC
Hi Miklos!

(In reply to comment #30)
> http://cgit.freedesktop.org/libreoffice/core/commit/?h=libreoffice-4-1&id=59d8dde3fc9a4dc653e43efb8552efc4ab3efc92
> should take care of the well-formed output problem on -4-1

Thank you for the fix!
I tested your patch with the current daily build (http://dev-builds.libreoffice.org/daily/libreoffice-4-1/Win-x86@6-debug/2013-08-27_09.28.10/) and the edited document is not corrupt and can be opened by Word without any complaint.
Thank you very much for your work, this is great!