Occasionally corrupted documents, missing text
Steps to reproduce:
1. Edit a document that includes html tags or bookmarks
2. Save and close the document
3. Return to the document
In one case, I added some hyperlinks to a .docx file. LibreOffice malformed one of the hyperlinks (it was correct when I entered it) and did not tell me there was a problem. I saved, closed, left the file alone, and came back. When I tried to move the file elsewhere (without opening it), OpenSuse told me that there was malformed html in the file and wouldn't do it. When LibreOffice opened the file, the text cut off in the middle of one of the new html tags. LibreOffice returns no errors, it just cuts the document off. It actually eliminated all of the tag except the final bookmark part of the html page, #Bookmark. MS Word refuses to open the file and returns the message "The name in the end tag of the element must match the element type in the start tag" "Location: Part: /word/document.xml, Line: 2, Column:7625". It's a small file now, so it looks like LibreOffice just chopped off the second half of it, which is very frustrating.
In the other case, in a complex .odt document using bookmarks, I pasted a block of encrypted text into the document and when I returned to the document after a save/reopen, the text was cut off halfway through (not good). The rest of the document afterward continued after a page break. So I did it again, pasted in a new block of encrypted text under the last and deleted the partial block of text and lo, the missing text reappeared after deleting the first half! It was in the document but LibreOffice was failing to display it (and I had tried to manually move the cursor through it earlier to see if it was hidden but no luck). Very bad.
In complex documents, I now see that I can never tell whether LibreOffice saved the document correctly. When I reopen one, there is a decent chance that something is corrupted or missing.
When I save a document, and the document does not return errors during the save, all of the features and text should be saved and not corrupted. Even if they are not later available in Word (which I understand has some compatibility issues) the document should be readable in LibreOffice.
Browser: Mozilla/5.0 (Windows NT 5.1; rv:15.0) Gecko/20100101 Firefox/15.0.1
Created attachment 67436
This is a re-enactment of the document contents that resulted in corruption of .docx.
This file was created on Windows/LibreOffice 22.214.171.124. The original text was:
This is the beginning of the document. Rule 8.4 of the Rules of Professional Conduct. This is the rest of the document.
The "Rule 8.4 of the Rules of Professional Conduct" has the following link:
You can see that when the document is saved as Office Open XML (.docx), LibreOffice breaks the file without warning, rendering it unusable and corrupting the subsequent text. The same sequence saved correctly in .doc and .odt.
Here is the link with proper html formatting
The bug is also present on 126.96.36.199.
The second half of the file is not missing, just malformed. It was accessible by changing the extension to .zip and opening document.xml in a text editor.
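For anyone who would rather check this without renaming the file, here is a minimal Python sketch (standard library only) that reads document.xml straight out of the .docx and reports where an XML parser gives up. The function name check_document_xml is my own invention, and I am assuming the main part lives at word/document.xml, as it does in the files discussed here.

```python
import zipfile
import xml.etree.ElementTree as ET


def check_document_xml(docx):
    """Check whether word/document.xml inside a .docx is well-formed.

    `docx` may be a file path or a binary file-like object.
    Returns (True, None) if the XML parses, otherwise
    (False, (line, column)) for the first parse error.
    """
    # A .docx is an ordinary zip archive, so no renaming is needed.
    with zipfile.ZipFile(docx) as archive:
        # Assumption: the main document part is at word/document.xml.
        data = archive.read("word/document.xml")
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        # err.position is a (line, column) tuple.
        return False, err.position
    return True, None
```

The reported position should land near the location Word complains about, although the two parsers may not count lines and columns in exactly the same way.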
The problem here is that LibreOffice saves html tags in an inartful way in .docx files. LibreOffice tries to do everything in the document.xml file. In the example I posted, the link was represented in the document as (forgive me if I mis-crop leading or trailing instructions):
HYPERLINK "http://www.mass.gov/obcbbo/rpc8.htm" \l "Rule 8.4"</w:instrText></w:r><w:r><w:fldChar w:fldCharType="separate"/></w:r><w:r><w:rPr><w:rStyle w:val="style15"/></w:rPr><w:t>Rule 8.4 of the Rules of Professional Conduct</w:t></w:r><w:r><w:fldChar w:fldCharType="end"/></w:r></w:hyperlink>
Something in that could not be parsed either by Word or LibreOffice on reopen.
Word itself does not try to do this in the document.xml file. Instead, it inserts a hyperlink element whose r:id refers to an entry in a different file in the compressed .docx structure:
Word /document.xml :
><w:hyperlink r:id="rId4" w:anchor="Rule 8.4" w:history="1"><w:r><w:rPr><w:rStyle w:val="Hyperlink"/></w:rPr><w:t>Rule 8.4 of the Rules of Professional Conduct</w:t></w:r></w:hyperlink>
Word /_rels/document.xml.rels :
Target="fontTable.xml"/><Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink" Target="http://www.mass.gov/obcbbo/rpc8.htm" TargetMode="External"/></Relationships>
LibreOffice does not attempt to use the "rels" folder/functionality in the .docx structure in connection with the hyperlinks. As a result, using html links and bookmarks in LibreOffice with .docx files is a problem waiting to happen.
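To see which links a given file stores as relationships, a sketch like the following can list them. It takes the text of the rels part as input. hyperlink_targets is a hypothetical helper name; the relationship type URI is the one visible in the Word excerpt above, and the package namespace URI is the standard one for rels parts as far as I know.

```python
import xml.etree.ElementTree as ET

# Namespace of the <Relationships>/<Relationship> elements in a .rels part.
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
# Relationship type used for external hyperlinks (see the Word excerpt).
HYPERLINK_TYPE = (
    "http://schemas.openxmlformats.org/officeDocument/2006/"
    "relationships/hyperlink"
)


def hyperlink_targets(rels_xml):
    """Map relationship Ids (e.g. 'rId4') to hyperlink target URLs."""
    root = ET.fromstring(rels_xml)
    return {
        rel.get("Id"): rel.get("Target")
        for rel in root.iter("{%s}Relationship" % REL_NS)
        if rel.get("Type") == HYPERLINK_TYPE
    }
```

An r:id used by a w:hyperlink element in document.xml that has no entry in this mapping would be a sign that the link was not written the way Word writes it.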
Created attachment 67463
This is the same file, but saved by MS Word into .docx format
Compare the treatment of the html tags in this file with the malformed file above. Word put the html tag into a separate "rels" file inside the .docx structure, which avoids whatever problem LibreOffice encountered by putting the entire tag directly into the document.xml.
It happened to me on libreoffice-188.8.131.52-3.fc19
Created attachment 89499
Steps to reproduce
I think I have the same issue. My steps are:
1. Create several lines of text
2. In one of the lines, add a hyperlink, e.g. “www.link.com ” (with a trailing space so that the text becomes a hyperlink)
3. Save the document with the .docx extension
4. Reopen the document
Result: all lines after the hyperlink disappear.
Video with steps attached.
LibreOffice Writer Version: 184.108.40.206 Build ID: 410m0(Build:3)
OS: Ubuntu 13.10
I just encountered this serious bug when re-opening a docx document I was working on. All text after the hyperlinks was mysteriously deleted. I noticed that the file size was still very large despite most of the text missing, and then I tried to figure out a way to decode the docx format and recover the data inside the file somehow, and then I learned that docx is just a renamed zip archive in openxml format. After renaming the .docx to be .zip, I was able to open word/document.xml file and see that all the missing text was still there as a plain xml document. I deleted the xml tags related to the hyperlinks, then I zipped the files again and renamed the .zip to .docx. It worked! I was able to restore the hours of lost work. Hope this helps someone fix the bug and prevent losing their work! Perhaps the software is writing an invalid OpenXML syntax or when it reads it back out, it fails to read it correctly.
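The unzip, edit, re-zip steps described above can also be scripted. This is only a sketch under assumptions: replace_document_xml is a made-up name, the main part is assumed to be word/document.xml, and you still have to produce the repaired XML yourself. Work on a copy of the file.

```python
import zipfile


def replace_document_xml(src_docx, dst_docx, new_xml):
    """Copy a .docx archive, swapping in a repaired word/document.xml.

    `src_docx` and `dst_docx` may be paths or binary file-like objects;
    `new_xml` is the repaired XML as bytes (or str, encoded as UTF-8).
    Every other member of the archive is copied unchanged.
    """
    with zipfile.ZipFile(src_docx) as src, \
            zipfile.ZipFile(dst_docx, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            if item.filename == "word/document.xml":
                data = new_xml
            else:
                data = src.read(item.filename)
            # Reusing the ZipInfo keeps the member name and metadata.
            dst.writestr(item, data)
```

Writing to a new file instead of modifying the original in place means the damaged document is still there if the repair attempt goes wrong.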
Just fixed a similarly corrupted .docx file saved by LibreOffice (Version: 220.127.116.11 Build ID: 410m0(Build:2)).
I'm not sure what happened during editing (I wasn't there at the time), but somehow the hyperlinking added extraneous <w:hyperlink ...> tags just after a </w:hyperlink>, without an actual URL or the corresponding closing tag; see the quoted excerpt:
<w:hyperlink r:id="rId4"><w:r><w:rPr><w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman"/><w:sz w:val="22"/><w:szCs w:val="22"/></w:rPr><w:t xml:space="preserve"> Käyttäjälähtöiset innovaatiot toimivat arjessa</w:t></w:r></w:p><w:p><w:pPr><w:pStyle w:val="style31"/><w:tabs><w:tab w:leader="none" w:pos="0" w:val="left"/></w:tabs><w:ind w:hanging="0" w:left="0" w:right="0"/><w:rPr><w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman"/><w:sz w:val="22"/><w:szCs w:val="22"/></w:rPr></w:pPr><w:hyperlink r:id="rId5">
Removing the orphan tags seems to make everything visible again in libreoffice.
I'll attach the fixed file and the original faulty document.xml, for diffing.
Created attachment 95316
Faulty document.xml with orphaned w:hyperlink tags
Created attachment 95317
Repackaged with orphaned w:hyperlink tags removed (see above), works.
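Finding the orphaned tags by eye in a multi-megabyte single-line file is painful, so here is a rough Python sketch that reports their character offsets. It works on the raw text with a regular expression, so it does not need the XML to be well-formed. The name find_orphan_hyperlinks is hypothetical, and the rule it applies (a w:hyperlink must be closed before its paragraph's closing w:p tag) is my reading of the faulty excerpt, not something taken from the LibreOffice code.

```python
import re

# Matches an opening (or self-closing) w:hyperlink tag, a closing
# w:hyperlink tag, or a closing w:p tag. Group 1 is "/" only for a
# self-closing <w:hyperlink .../>.
TOKEN = re.compile(r"<w:hyperlink\b[^>]*?(/?)>|</w:hyperlink>|</w:p>")


def find_orphan_hyperlinks(xml_text):
    """Return offsets of <w:hyperlink> tags not closed within their paragraph."""
    orphans = []
    open_offsets = []  # hyperlinks opened but not yet closed
    for match in TOKEN.finditer(xml_text):
        tag = match.group(0)
        if tag.startswith("<w:hyperlink"):
            if match.group(1) != "/":  # self-closing tags need no partner
                open_offsets.append(match.start())
        elif tag == "</w:hyperlink>":
            if open_offsets:
                open_offsets.pop()
        else:
            # </w:p>: anything still open at the end of the paragraph is orphaned.
            orphans.extend(open_offsets)
            open_offsets.clear()
    orphans.extend(open_offsets)  # still open at end of input
    return orphans
```

Each offset can then be looked up in a text editor, or used to slice out the surrounding text for inspection before deciding whether to delete the tag or add the missing closing tag.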
(In reply to comment #11)
> Created attachment 95316 [details]
> Faulty document.xml with orphaned w:hyperlink tags
I've had this problem only for the last few months. I tried the solution as you have it and found that it worked. That is an amazing piece of detective work; I knew about the structure of docx and the existence of document.xml, but I don't think I would ever have been able to figure out what the issue was. Fantastic work; well done.
This bug is still there in LibreOffice 18.104.22.168. It was a huge shock to find that all my text had vanished. The workaround posted by Bruce Kirkbatrick and Elmo worked (thanks a lot for that!), but most non-technical users would not be able to follow the steps required to recover their data. Any chance that this severe bug will be fixed soon? If not, could LibreOffice at least issue a warning when the user adds hyperlinks to a docx file?
Tested again in LO Version: 22.214.171.124 Build ID: 40m0(Build:2) (OpenSuse 13.2) with html links in original report, now it WORKSFORME. I do not know the commit, but I'll say FIXED for now.
I'm seeing the same symptoms, can someone help me verify if it's the same cause?
xmlstarlet val document.xml says it's well-formed, which would not be the case if there were orphan tags.
I'm having trouble verifying the cause, since the document.xml is > 3 MB in one line, and no editor seems to work on this.
When opening the file, it ends in the middle of a sentence, at a point where the XML doesn't even open a tag or anything.
I'm using LibreOffice Build-ID: 126.96.36.199-8.fc21.
I'm experiencing this bug in 188.8.131.52 on Ubuntu 16.04
(In reply to ELind77 from comment #18)
> I'm experiencing this bug in 184.108.40.206 on Ubuntu 16.04
I tested the link from comment 2, resaved attachment 67463 from comment 5, and repeated the video steps from comment 7, and saw no issues with LibreOffice 220.127.116.11, so you likely have a document that was saved while this bug was still present and that needs fixing. Try using the steps in comment 9 and comment 10 to fix the issue. If you are unable to do so yourself, you can email me your document and I'll fix it for you.
(In reply to Andy Pillip from comment #16)
> I'm having trouble verifying the cause, since the document.xml is > 3 MB in
> one line, and no editor seems to work on this.
I use tidy < http://tidy.sourceforge.net/ > to convert the single line into multiple indented lines with this command: '$ tidy -i -xml -raw -w 0 document.xml > document1.xml'. If document1.xml is blank, then the xml isn't well-formed.
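If tidy produces nothing because the xml is broken, a cruder alternative is to split the single line at tag boundaries so that an ordinary editor or diff tool can cope with it. This Python sketch is my own suggestion, not part of tidy; split_tags_onto_lines is a made-up name. It adds line breaks between adjacent tags, so use the output for reading only and make the actual repair in the original document.xml.

```python
def split_tags_onto_lines(xml_text):
    """Insert a newline between every pair of directly adjacent tags.

    Works on plain text, so the input does not have to be well-formed.
    Text content such as <w:t>...</w:t> stays on one line with its tags.
    """
    return xml_text.replace("><", ">\n<")
```

For example, split_tags_onto_lines("<a><b>x</b></a>") gives three lines: <a>, <b>x</b> and </a>.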
> When opening the file, it ends in the mid of a sentence, where the XML
> doesn't even open a tag or anything.
Hopefully document.xml hasn't sustained any data loss; if so, fixing the issue shouldn't be too difficult.
So for anyone who has a corrupted file, attach it to the bug report and I'll attempt to fix it for you.