Bug 99686 - Implement a recovery utility to clear "<office:automatic-styles>" from corrupted content.xml in ODF archives
Status: RESOLVED WONTFIX
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version: Inherited From OOo (earliest affected)
Hardware: All All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: needsDevEval
Depends on:
Blocks:
 
Reported: 2016-05-04 22:40 UTC by Shunesburg69
Modified: 2018-01-31 09:47 UTC
CC List: 3 users

See Also:
Crash report or crash signature:


Attachments
Corrupted file (38.43 KB, application/vnd.oasis.opendocument.text)
2016-05-04 22:40 UTC, Shunesburg69
repaired with tidy (38.79 KB, application/vnd.oasis.opendocument.text)
2016-05-08 18:29 UTC, Maxim Monastirsky

Description Shunesburg69 2016-05-04 22:40:45 UTC
Created attachment 124848
Corrupted file

I sometimes help people on a French forum (I'm French), and I often see people report corrupted files. The content.xml gets corrupted for an unknown reason. The only fix I know of is the following:

First, let's call the ODT file that won't open "bad.odt".
1- make a backup FIRST -> "$ cp bad.odt bad_original.odt"
2- make a new directory -> "$ mkdir repair"
3- copy bad.odt to the repair directory -> "$ cp bad.odt repair"
4- change the current directory to repair -> "$ cd repair"
5- unzip bad.odt, then delete the copy so it is not packed into the repaired file -> "$ unzip bad.odt && rm bad.odt"
6- after unzipping you get a bunch of files and directories under repair; find content.xml and open it with your favorite text editor -> "$ kate content.xml"
7- use the "find" function to check whether you have the XML tag "<office:automatic-styles>" (somewhere at the beginning of the document) and the XML tag "</office:automatic-styles>" (somewhere in the middle of the document). If you do, delete them and all data between them. Be sure that you don't delete more or less!
8- save content.xml (keep the original name and place!)
9- zip the extracted data back into one ODT document -> "$ zip -r ./bad_repaired.odt ./*"
10- try to open the repaired document -> "$ ooffice ./bad_repaired.odt"

Solution found here: https://forum.openoffice.org/en/forum/viewtopic.php?t=1532
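
For illustration, the manual steps above can be sketched as a small Python script. This is only a sketch under assumptions, not something taken from the forum thread: the names `strip_automatic_styles` and `repair_odt` are hypothetical, and it assumes the `<office:automatic-styles>` element appears once and is not self-closing. It works on raw bytes with a regular expression because a corrupted content.xml cannot be loaded by an XML parser.

```python
import re
import zipfile

# Matches the whole automatic-styles element, including everything inside it.
# Assumption: the element appears once and has separate open and close tags.
AUTO_STYLES = re.compile(
    rb"<office:automatic-styles>.*?</office:automatic-styles>", re.DOTALL
)


def strip_automatic_styles(xml_bytes):
    """Return content.xml bytes with the automatic-styles element removed."""
    return AUTO_STYLES.sub(b"", xml_bytes, count=1)


def repair_odt(src, dst):
    """Copy the archive src to dst, stripping automatic styles on the way.

    src and dst are paths or binary file-like objects and must differ,
    so the original file is never modified.
    """
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():  # keeps the original entry order
            data = zin.read(info.filename)
            if info.filename == "content.xml":
                data = strip_automatic_styles(data)
            # ODF packages keep the "mimetype" entry uncompressed.
            if info.filename == "mimetype":
                method = zipfile.ZIP_STORED
            else:
                method = zipfile.ZIP_DEFLATED
            zout.writestr(info.filename, data, compress_type=method)
```

A call such as `repair_odt("bad.odt", "bad_repaired.odt")` would correspond to steps 5 to 9. Like the manual procedure, it discards every automatic style in the document.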

When the software detects an error in content.xml, would it be possible for it to offer to apply this fix itself?

I have attached a corrupted document.
Comment 1 V Stuart Foote 2016-05-05 01:46:33 UTC
Needs to be proven that this is an effective and useful utility, but if so it seems reasonable. I can imagine a number of "repair" utilities for correcting corrupt ODF that could be provided from the GUI:

Tools -> ODF -> Repairs
Tools -> ODF -> Validation
Tools -> ODF -> Conversion
Comment 2 Shunesburg69 2016-05-05 20:41:58 UTC
I'm not asking for utilities, only for a repair suggestion shown with the error message, because this message appears only for this problem, not for other kinds of corruption.
The error message is:
Read-Error.
Format error discovered in the file in sub-document content.xml at 2,78898(row,col).

The row and column in the last line differ from one file to another, but the error is always fixed by the solution above. All I propose is to offer the repair in the message box.
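
To see what sits at the reported position, the row and column from the message can be used to print the surrounding text of content.xml. The following is a minimal sketch, not part of LibreOffice: `parse_position` and `context_at` are hypothetical helper names, and it assumes that row and column are 1-based and that the column counts characters, which the message itself does not state.

```python
import re
from typing import Tuple

# Pattern for the position as it appears in the error text quoted above.
ERROR_POS = re.compile(r"at (\d+),(\d+)\(row,col\)")


def parse_position(message: str) -> Tuple[int, int]:
    """Extract (row, col) from the Read-Error message text."""
    match = ERROR_POS.search(message)
    if match is None:
        raise ValueError("no row,col position found in message")
    return int(match.group(1)), int(match.group(2))


def context_at(xml_text: str, row: int, col: int, width: int = 40) -> str:
    """Return up to `width` characters on each side of the position.

    Assumption: row and col are 1-based character positions.
    """
    line = xml_text.splitlines()[row - 1]
    center = col - 1
    return line[max(0, center - width):center + width]
```

For the message above, `parse_position` yields row 2 and column 78898, and `context_at` can then show the tag around that spot in the extracted content.xml.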
Comment 3 Maxim Monastirsky 2016-05-05 21:05:19 UTC
(In reply to shunesburg69 from comment #0)
> 7- use "find" function to find out, if you have XML tag
> "<office:automatic-styles>" (somewhere at the beginning of document) and XML
> tag "</office:automatic-styles>" (somewhere, middle of document). If you
> have, then delete them and all data between them. Be sure, that you don't
> delete more or less!

> There is a possibility, when the soft detect an error in the content.xml to
> propose to fix it like that by itself ?

But by removing the styles completely you will lose all formatting! Why not just remove the duplicate attributes? You can even do that automatically with tools like tidy. In addition, this only works if the problem is in the styles section, but what if there is some problem in the document contents? So this is a bad idea in general, and I don't think that we should suggest such wrong things to users.
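
As an illustration of removing only the duplicate attributes, here is a minimal Python sketch. It is an assumption-laden example, not how tidy or LibreOffice work: the name `drop_duplicate_attributes` is hypothetical, the scan is regex-based rather than a real XML parser (so tag-like text inside comments or CDATA would also be touched), and keeping the first occurrence of a repeated attribute is an arbitrary choice.

```python
import re

# A start tag or empty-element tag with quoted attribute values.
TAG = re.compile(
    r'''<([A-Za-z_][\w:.-]*)((?:\s+[\w:.-]+\s*=\s*(?:"[^"]*"|'[^']*'))*)\s*(/?)>'''
)
# A single name="value" or name='value' pair inside such a tag.
ATTR = re.compile(r'''([\w:.-]+)\s*=\s*("[^"]*"|'[^']*')''')


def drop_duplicate_attributes(xml_text: str) -> str:
    """Rewrite each start tag, keeping only the first of repeated attributes."""

    def fix_tag(match):
        name, attrs, slash = match.groups()
        seen = set()
        kept = []
        for attr_name, value in ATTR.findall(attrs):
            if attr_name not in seen:  # later repeats are dropped
                seen.add(attr_name)
                kept.append(" " + attr_name + "=" + value)
        return "<" + name + "".join(kept) + slash + ">"

    return TAG.sub(fix_tag, xml_text)
```

Unlike deleting the whole `<office:automatic-styles>` element, this keeps every style definition and changes only tags that repeat an attribute (whitespace inside tags is normalized as a side effect).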

> I put a document corrupted in attachment.
BTW the corruption in this document is the one that was fixed in Bug 96147.
Comment 4 Shunesburg69 2016-05-08 17:27:40 UTC
No, the formatting stays; only the automatic styles are removed, and most files don't change at all after the repair process.
Comment 5 Maxim Monastirsky 2016-05-08 18:29:23 UTC
Created attachment 124912
repaired with tidy

(In reply to shunesburg69 from comment #4)
> No, the formatting stay here just the automatic styles are removed,
But most of the formatting is stored in the automatic styles. I'm attaching a file repaired with tidy (by simply running "tidy -m -xml content.xml") - compare it with the same file repaired with your method. It's hard to not notice the difference...
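
The tidy invocation quoted above could be wrapped in a script roughly like this. It is a sketch under assumptions: `tidy_command` and `run_tidy` are hypothetical names, tidy must be installed and on the PATH, and content.xml must already have been extracted from the archive. Only the flags `-m` (modify the file in place) and `-xml` (treat the input as XML) come from the comment.

```python
import shutil
import subprocess
from typing import List


def tidy_command(xml_path: str) -> List[str]:
    """Build the command from the comment: tidy -m -xml <file>."""
    return ["tidy", "-m", "-xml", xml_path]


def run_tidy(xml_path: str) -> int:
    """Run tidy on the extracted file and return its exit status."""
    if shutil.which("tidy") is None:
        raise FileNotFoundError("tidy is not installed or not on PATH")
    # tidy exits with 1 when it only reported warnings, so the status
    # is returned to the caller instead of being treated as a failure.
    return subprocess.run(tidy_command(xml_path)).returncode
```

Because `-m` overwrites the file, it is safer to run this on a copy of content.xml, in line with the backup step of the original procedure.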

> but the
> major part of files don't change at all after the repair process.
And yet - why _remove_ valuable data, when it can be easily repaired?
Comment 6 Shunesburg69 2016-05-08 20:45:50 UTC
(In reply to Maxim Monastirsky from comment #5)
> And yet - why _remove_ valuable data, when it can be easily repaired?

I only proposed what I had tried, but if you have a better way, that's fine with me.
Comment 7 Heiko Tietze 2018-01-31 09:47:56 UTC
Maxim's proposal solves the issue more cautiously. But whether to tidy or to delete, the question boils down to an integrated repair function. We had better ensure that such trouble does not happen in the first place; until then, this is better realized as an extension.