Bug 97412 - Writer should repair some corrupt docx files (as MSO)
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version: 3.5.0 release (earliest affected)
Hardware: All All
Importance: low enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: filter:docx
Depends on:
Blocks: DOCX
Reported: 2016-01-28 12:40 UTC by Guillermo Reisch
Modified: 2021-05-28 11:01 UTC

See Also:
Crash report or crash signature:


Attachments
the file is corrupted but reparable (1.31 MB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2016-09-14 00:00 UTC, Guillermo Reisch

Description Guillermo Reisch 2016-01-28 12:40:44 UTC
If part of a docx file is corrupted (altered or truncated), Writer cannot open the file and cannot repair it.

If the file is fixed manually:
# mkdir tmp
# cd tmp
# unzip ../file_corrupt.docx
# zip -9r ../file_fixed.docx *

Then Writer opens file_fixed.docx (this works most of the time).
Why? Because many docx files contain images or other non-crucial data, so the damaged part can be "fixed" (a glitch is inevitable...) and the file works again!
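
The unzip/zip round trip above can be sketched in Python. This is only an illustration of the idea, not how Writer or MSO actually repairs files. The function name `repack_docx` is hypothetical, and the sketch assumes the archive's central directory is still readable (Python's `zipfile` needs it). Unlike the manual method, which keeps the glitched data, a member that cannot be read is replaced here by an empty placeholder.

```python
import zipfile
import zlib


def repack_docx(src_path, dst_path):
    """Copy every readable member of src_path into a fresh archive at dst_path.

    Returns the names of the members that could not be read intact.
    (Hypothetical helper for illustration only.)
    """
    damaged = []
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            try:
                # Raises on a bad CRC, a bad local header or a broken deflate stream.
                data = src.read(info)
            except (zipfile.BadZipFile, zlib.error, EOFError):
                damaged.append(info.filename)
                data = b""  # empty placeholder so the part still exists in the package
            dst.writestr(info.filename, data)
    return damaged
```

A caller could use the returned list to tell the user which parts were lost, for example `print(repack_docx("file_corrupt.docx", "file_fixed.docx"))`.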

Somebody gave me a corrupted docx and I restored it using this method. It would be great if LibreOffice checked the file for corruption and then tried to restore it (even if some part ends up glitched). Showing a "Warning! Some information will inevitably be lost" message would rock!

For the problem file, the command "unzip -t file.docx" gives me this error:
....
    testing: word/embeddings/oleObject1.bin  
  error:  invalid compressed data to inflate
file #9:  bad zipfile offset (local header sig):  746694
  (attempting to re-compensate)
......
And it continues correctly with all the other files in file.docx.

Steps to reproduce:

 * Create a docx with some text and an image
 * Using a hex editor, change some part of the file (a run of 20 bytes or more)
 * Open it in LibreOffice Writer: it crashes!

 * Restore the file using the above method
 * Open it in LibreOffice Writer: it works (the image is glitched, but you get all the information back; replace the image and go!)
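
For testers who prefer not to use a hex editor, the corruption step above could be scripted roughly as follows. The helper `corrupt_copy` and its parameters `offset` and `length` are made up for this sketch. Which offset lands inside the image data depends on the file, so that value has to be chosen by hand.

```python
def corrupt_copy(src_path, dst_path, offset, length=20):
    """Write a copy of src_path with up to `length` bytes inverted from `offset`.

    Returns the number of bytes that were actually changed.
    (Hypothetical helper for producing test files.)
    """
    with open(src_path, "rb") as f:
        raw = bytearray(f.read())
    end = min(offset + length, len(raw))
    for i in range(offset, end):
        raw[i] ^= 0xFF  # inverting guarantees that every touched byte differs
    with open(dst_path, "wb") as f:
        f.write(raw)
    return max(0, end - offset)
```

For example, `corrupt_copy("file.docx", "file_corrupt.docx", offset=5000)` alters a run of 20 bytes, and `length=1` gives a single-byte change.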


I found another related error! If you change only 1 byte in the docx file using a hex editor, the zip file gets "slightly" corrupted, and the unzip test reports:
# unzip -t file.docx
      ....
      testing: word/media/image1.jpeg   bad CRC f13f46e1  (should be b4564b78)
      ....
If you open it with Writer, NO MESSAGE IS DISPLAYED, NO WARNING! NOTHING IS SHOWN TO THE USER; NO REPAIR PROCESS IS STARTED!
Instead it opens the file as if everything were "OK", but you can see the image is glitched (so not everything is "OK").
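
To show that this kind of damage is cheap to detect before loading, here is a rough Python sketch of the same per-member check that "unzip -t" performs. The function name `find_damaged_parts` is hypothetical, and this says nothing about how Writer's importer is structured. It only illustrates that a CRC mismatch can be noticed and turned into a warning.

```python
import zipfile
import zlib


def find_damaged_parts(path):
    """Return (name, reason) for each member that fails to decompress or fails its CRC.

    (Hypothetical helper for illustration only.)
    """
    problems = []
    with zipfile.ZipFile(path) as zf:
        for info in zf.infolist():
            try:
                with zf.open(info) as member:
                    # Reading to the end of the member triggers the CRC comparison.
                    while member.read(65536):
                        pass
            except (zipfile.BadZipFile, zlib.error, EOFError) as exc:
                problems.append((info.filename, str(exc)))
    return problems
```

A caller could then warn the user instead of loading silently, for example:

```python
for name, reason in find_damaged_parts("file.docx"):
    print(f"Warning: {name} is damaged ({reason}); some information may be lost")
```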

I am using:
 * Linux, Debian (unstable)
 * libreoffice-writer      1:5.0.5~rc1-1
 * libreoffice             1:5.0.5~rc1-1

Sorry for my bad English :-P
Comment 1 Buovjaga 2016-01-28 18:33:55 UTC
It would be great if you attached such an example here so testers can quickly test.

Set to NEEDINFO.
Change back to UNCONFIRMED after you have provided the document.
Comment 2 Xisco Faulí 2016-09-11 21:52:52 UTC Comment hidden (obsolete)
Comment 3 Guillermo Reisch 2016-09-14 00:00:36 UTC
Created attachment 127320
the file is corrupted but reparable

This took some time; sorry for the delay.
This attached file is the one I filed the bug for.
The file can be repaired!
unzip => zip => fixed!

The docx file is corrupt in the picture (upper left corner; you can see a little "line").

I wanted to create an example, but it is difficult, so I am sending the original....

Update: the problem is still present in the new version 1:5.2.0~rc3-1 (Debian testing)
Comment 4 Buovjaga 2016-09-15 09:58:23 UTC
I confirm Writer cannot repair it.

Win 7 Pro 64-bit Version: 5.3.0.0.alpha0+
Build ID: ba269f7294e2416659011cbb498a2c6b5f9d5199
CPU Threads: 4; OS Version: Windows 6.1; UI Render: default; 
TinderBox: Win-x86@42, Branch:master, Time: 2016-09-12_02:36:16
Locale: fi-FI (fi_FI); Calc: CL

LibreOffice 3.5.0rc3 
Build ID: 7e68ba2-a744ebf-1f241b7-c506db1-7d53735
Comment 5 Telesto 2016-12-10 13:54:32 UTC
Repro with:
Version: 5.4.0.0.alpha0+
Build ID: b894104a0b02a9b074c76feb925389d7bee6a493
CPU Threads: 4; OS Version: Windows 6.19; UI Render: default; 
TinderBox: Win-x86@39, Branch:master, Time: 2016-12-10_01:00:52
Locale: nl-NL (nl_NL); Calc: CL
Comment 6 QA Administrators 2017-12-11 08:53:47 UTC Comment hidden (obsolete)
Comment 7 eamonfitzpatrick 2018-06-06 00:18:04 UTC
I would like to add that I have been getting a number of similar errors.

We are receiving a number of form-letter-style docx documents from a large government agency (Ofsted) that appear to have been generated by an earlier version of LibreOffice.

The app.xml shows the following:

<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties" xmlns:vt="http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes"><Template></Template><TotalTime>0</TotalTime><Application>LibreOffice/5.1.5.2$Windows_x86 LibreOffice_project/7a864d8825610a8c07cfc3bc01dd4fce6a9447e5</Application></Properties>

Unzipping and re-zipping the file fixes the problem; we normally just open it in another program.

The problem appears to affect OpenOffice and LibreOffice 3.1-6.1.

Word can open the files; Word Online can't. Pages can't; WPS and SoftMaker can.

I thought it would be worthwhile posting. Unfortunately I can't share the documents due to confidentiality, but this may be a warning that the issue is becoming more frequent, caused by some product on the market creating out-of-specification archives.
Comment 8 QA Administrators 2019-06-07 02:52:59 UTC Comment hidden (obsolete)
Comment 9 Guillermo Reisch 2019-06-18 17:40:33 UTC
I confirm Writer cannot repair it.

Linux goku 4.19.0-5-amd64 #1 SMP Debian 4.19.37-3 (2019-05-15) x86_64 GNU/Linux
Debian SID ; libreoffice* 1:6.1.5-3 ; ure 6.1.5-3 ; LANG=es_UY.UTF-8 ; Intel(R) Core(TM) i5-4440 CPU @ 3.10GHz

lowering Importance... (minor)