Bug 104718 - SAXParseException: Allow to recover XML as much as possible from the document
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage
Hardware: All All
Importance: medium enhancement
Assignee: Mike Kaganski
Whiteboard: target:5.4.0
Depends on:
Reported: 2016-12-16 17:36 UTC by Mike Kaganski
Modified: 2017-01-22 05:04 UTC

Description Mike Kaganski 2016-12-16 17:36:05 UTC
Currently (since 5.0.4), when LO opens an XML-based file (ODF, OOXML) that contains errors (such as duplicated attributes or data past the body), the error "SAXParseException: ..." is shown to the user and the file is not opened.
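
For illustration only, the two error classes mentioned above can be reproduced with any strict SAX parser. The sketch below uses Python's standard xml.sax module rather than LO's own parser; the helper name try_parse and the sample byte strings are made up for this example.

```python
import xml.sax
from xml.sax.handler import ContentHandler


def try_parse(data: bytes):
    """Return None if the data is well-formed, else the parser's error message."""
    try:
        xml.sax.parseString(data, ContentHandler())
        return None
    except xml.sax.SAXParseException as exc:
        return exc.getMessage()


# Hypothetical samples of the two kinds of damage named in the report.
well_formed = b'<doc><p style="a"/></doc>'
duplicate_attr = b'<doc><p style="a" style="b"/></doc>'
past_body = b'<doc/><extra/>'

for sample in (well_formed, duplicate_attr, past_body):
    print(sample, "->", try_parse(sample))
```

A strict parser stops at the first such error, which is why the whole document fails to load.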

This is the result of the better error detection and handling introduced by commit ebf767eeb2a169ba533e1b2ffccf16f41d95df35, which has allowed us to detect and fix quite a number of errors. However, it is a real problem for end users, who can no longer open corrupted files that could be opened previously. This has led, for example, to HOWTOs on Ask LibreOffice that describe using AOO as the correct way to open such files.

I suggest changing the current logic as follows:
If a SAXParseException is raised during XML parsing, show the user something like this:
"This file is corrupted (<current SAXParseException message goes here>). LibreOffice can try to recover as much as possible, but be prepared that some information may be damaged or lost. Do you want to proceed?"

If the user answers "yes", parsing continues as it did before ebf767eeb2a169ba533e1b2ffccf16f41d95df35. Any subsequent SAXParseException is handled as if the user had answered "yes" each time.

If an unrecoverable exception is encountered, then, of course, show the message "File cannot be recovered".
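
The proposed flow could be sketched roughly as follows. This is only a Python outline of the decision logic, not LO code: RecoveryPolicy, LoadAborted, on_parse_error and the ask_user callback are hypothetical names, and how the parser actually resumes after an error is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, List


class LoadAborted(Exception):
    """Loading stops: the user declined, or the error is unrecoverable."""


@dataclass
class RecoveryPolicy:
    # Hypothetical callback: shows a yes/no prompt and returns True for "yes".
    ask_user: Callable[[str], bool]
    recovering: bool = False
    tolerated: List[str] = field(default_factory=list)

    def on_parse_error(self, message: str, recoverable: bool = True) -> None:
        """Return normally if parsing may continue; raise LoadAborted otherwise."""
        if not recoverable:
            raise LoadAborted("File cannot be recovered")
        if not self.recovering:
            # Only the first error prompts the user.
            prompt = (
                f"This file is corrupted ({message}). LibreOffice can try to "
                "recover as much as possible, but be prepared that some "
                "information may be damaged or lost. Do you want to proceed?"
            )
            if not self.ask_user(prompt):
                raise LoadAborted(message)
            self.recovering = True
        # Later errors are treated as if the user had answered "yes" again.
        self.tolerated.append(message)


# Usage: a stand-in prompt that records what was asked and always says "yes".
prompts: List[str] = []
policy = RecoveryPolicy(ask_user=lambda text: prompts.append(text) or True)
policy.on_parse_error("duplicate attribute")
policy.on_parse_error("junk after document element")
```

Keeping the "already answered yes" state in one object means the user is asked once per load, however many errors follow.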

I suppose the nature of the message is clear enough that users will continue to file reports about such problems (especially if the problematic file was previously generated by LO).
Comment 1 MM 2016-12-23 19:39:56 UTC
I've seen more reports like that, so it would be nice if LO at least opened the file and recovered as much as possible.
Comment 2 Mike Kaganski 2017-01-16 18:28:09 UTC
A patch has been submitted to Gerrit: https://gerrit.libreoffice.org/33181
Comment 3 Commit Notification 2017-01-19 07:02:41 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "master":


tdf#104718: Prompt user to continue on SAXException

It will be available in 5.4.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:

Affected users are encouraged to test the fix and report feedback.