Bug 131856 - Read Error. Format error discovered in the file in sub-document content.xml at 2,3810925(row,col).
Status: RESOLVED WORKSFORME
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version: 6.3.5.2 release (earliest affected)
Hardware: Other Linux (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-04-03 17:25 UTC by Hugh Hyatt
Modified: 2020-04-04 00:58 UTC

See Also:
Crash report or crash signature:


Description Hugh Hyatt 2020-04-03 17:25:24 UTC
Description:
File will not open. I un-archived the file and found content.xml, which consists of 2 lines and 15,675,122 bytes. Here's some of what surrounds the referenced position in the file (line 2, columns 3810868-3811023):

<table:table-cell table:style-name="ce529"/><table:table-cell table:number-columns-repeated="53"/></table:table-row><table:table-row table:style-name="ro2">

The 3,810,925th character is the "c" in the 2nd "table-cell" above.
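
For anyone who wants to inspect a reported (row,col) position in a file with very long lines, here is a minimal Python sketch. The function names `read_line` and `context_in_line` and the `radius` parameter are illustrative assumptions, and the sketch treats the column as a 1-based byte offset, which may not match how LibreOffice counts columns.

```python
def read_line(path: str, row: int) -> bytes:
    """Return the raw bytes of the 1-based line `row` of the file at `path`."""
    with open(path, "rb") as handle:
        for lineno, line in enumerate(handle, start=1):
            if lineno == row:
                return line
    raise ValueError(f"file has fewer than {row} lines")


def context_in_line(line: bytes, col: int, radius: int = 80) -> str:
    """Return up to `radius` bytes on each side of the 1-based column `col`.

    Assumption: `col` is a byte offset, not a character offset.
    """
    idx = col - 1
    start = max(idx - radius, 0)
    return line[start:idx + radius + 1].decode("utf-8", errors="replace")
```

With the position from the error message, this would be called as `context_in_line(read_line("content.xml", 2), 3810925)`.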

Steps to Reproduce:
Unknown. So far I've only ever seen anything like this problem with this particular version of this .ods file.

Actual Results:
Unable to load file.

Expected Results:
N/A


Reproducible: Couldn't Reproduce


User Profile Reset: Yes


OpenGL enabled: Yes

Additional Info:
Version: 6.3.5.2
Build ID: dd0751754f11728f69b42ee2af66670068624673
CPU threads: 4; OS: Linux 4.4; UI render: default; VCL: gtk2; 
Locale: en-US (en_US.UTF-8); UI-Language: en-US
Calc: threaded

Same with 
Version: 6.3.6.0.0+
Build ID: 6677c1e6aa3465bc4eb39897447391ac1ac0a0eb
CPU threads: 4; OS: Linux 4.4; UI render: default; VCL: gtk2; 
TinderBox: Linux-rpm_deb-x86_64@86-TDF, Branch:libreoffice-6-3, Time: 2020-04-01_10:00:01
Locale: en-US (en_US.UTF-8); UI-Language: en-US
Calc: threaded

I update this file every day. It contains critical financial information. The only .bak version of the file I could find is from almost 3 months ago.
Comment 1 Julien Nabet 2020-04-03 19:15:07 UTC
Before doing anything, make sure you have a backup of your initial ods file.

2 tests you can do:
1) you can remove table:style-name="ce529" from this tag and repackage the ods file then see if you have another error

2) To better see the problem, you can reformat the xml with a command like:
cat content.xml | tidy -utf8 -xml -w 255 -i -c -q -asxml > content_new.xml

then you can analyze content_new.xml more easily.

You can also replace content.xml with content_new.xml (rename content_new.xml to content.xml), rezip the whole structure, and rename the archive so that it has an .ods extension.
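
The rezip step can be sketched in Python as follows. This is only a sketch under assumptions: the function name `repack_ods` is hypothetical, and the output path is assumed to lie outside the unpacked directory so the archive does not include itself. It writes the `mimetype` entry first and uncompressed, as the OpenDocument packaging rules expect.

```python
import os
import zipfile


def repack_ods(src_dir: str, out_path: str) -> None:
    """Zip an unpacked .ods directory back into a single file.

    Assumption: out_path is not inside src_dir.
    """
    with zipfile.ZipFile(out_path, "w") as archive:
        mimetype = os.path.join(src_dir, "mimetype")
        if os.path.exists(mimetype):
            # The mimetype entry goes first and is stored without compression.
            archive.write(mimetype, "mimetype", compress_type=zipfile.ZIP_STORED)
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                arcname = os.path.relpath(full, src_dir).replace(os.sep, "/")
                if arcname == "mimetype":
                    continue  # already written above
                archive.write(full, arcname, compress_type=zipfile.ZIP_DEFLATED)
```

For example, `repack_ods("unpacked", "repaired.ods")` would rebuild the file from a directory named `unpacked`.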
Comment 2 Hugh Hyatt 2020-04-03 20:52:54 UTC
I tried suggestion #2 first and got the following results:

hugh@PBL20:~/tmp/LibreOffice Read Error$ cat content.xml | tidy -utf8 -xml -w 255 -i -c -q -asxml > content\ -\ formatted.xml
line 2 column 4160374 - Error: unexpected </table:tabne-cell> in <table:table-cell>
line 2 column 4161545 - Error: unexpected </table:table-row> in <table:table-cell>
line 2 column 4295367 - Error: unexpected </table:table> in <table:table-cell>
line 2 column 15666771 - Error: unexpected </office:spreadsheet> in <table:table-cell>
line 2 column 15666792 - Error: unexpected </office:body> in <table:table-cell>
line 2 column 15666806 - Error: unexpected </office:document-content> in <table:table-cell>
hugh@PBL20:~/tmp/LibreOffice Read Error$

I saw what seemed like an obvious error in the first message: "table" spelled as "tabne". I made the change to fix it, zipped all the component files, and all seems to be good now.

It looks like somewhere in the code that generates content.xml, there is a typo that needs to be fixed.
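
A mismatched end tag like the one tidy reported can also be located with a strict XML parser. The following is a minimal Python sketch using the standard library's expat bindings; the function name `first_xml_error` is an illustrative assumption, and the reported column may differ from the one tidy or LibreOffice prints.

```python
import xml.parsers.expat


def first_xml_error(data: bytes):
    """Return (line, column, message) for the first well-formedness error,
    or None if the document parses cleanly."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True marks this as the final chunk
    except xml.parsers.expat.ExpatError as err:
        # err.offset is 0-based, so add 1 for a 1-based column
        return err.lineno, err.offset + 1, str(err)
    return None
```

It could be run as `first_xml_error(open("content.xml", "rb").read())` on the extracted file.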
Comment 3 Julien Nabet 2020-04-03 21:41:45 UTC
I ran "git grep -n tabne"; it's nowhere to be seen. Perhaps it's an old bug that has since been fixed.

If this file is very important, I advise you to make a backup at least once per day.
Then you can keep, for example, the last 7, or the last 3 plus 1 per month for a year; you decide on your backup mechanism.
You can rename your file like:
<initial_file_name>_20200403.ods for example.
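
A dated-backup scheme along these lines can be sketched in Python. The function names `backup_name` and `make_backup`, the `backup_dir` argument, and the default of keeping 7 copies are illustrative assumptions, not a prescribed mechanism.

```python
import datetime
import pathlib
import shutil


def backup_name(original: str, day: datetime.date) -> str:
    """Build '<initial_file_name>_YYYYMMDD<extension>' for the given day."""
    path = pathlib.PurePath(original)
    return f"{path.stem}_{day:%Y%m%d}{path.suffix}"


def make_backup(original: str, backup_dir: str, keep: int = 7) -> pathlib.Path:
    """Copy `original` into `backup_dir` under a dated name and delete all
    but the newest `keep` dated copies (keep is assumed to be at least 1)."""
    src = pathlib.Path(original)
    dest_dir = pathlib.Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / backup_name(src.name, datetime.date.today())
    shutil.copy2(src, dest)
    # YYYYMMDD stamps sort chronologically when sorted as text.
    dated = sorted(dest_dir.glob(f"{src.stem}_????????{src.suffix}"))
    for stale in dated[:-keep]:
        stale.unlink()
    return dest
```

Running `make_backup("finances.ods", "backups")` once a day (for example from cron) would keep the last week of copies; `finances.ods` is a placeholder file name.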

Anyway, let's put this one as WFM now.
Comment 4 Hugh Hyatt 2020-04-04 00:58:28 UTC
Backups already instituted. I thought I had them, but apparently I never actually checked. Thanks for your prompt and very helpful responses!