Bug 52477 - Editing link with "track changes" enabled causes problem in XML of .docx format
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 3.5.4 release (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
QA Contact:
URL:
Whiteboard: target:4.0 bibisected40 target:3.6.5
Keywords:
Depends on:
Blocks:
 
Reported: 2012-07-25 10:44 UTC by Danna Gifford
Modified: 2012-12-17 17:19 UTC

See Also:
Crash report or crash signature:


Attachments
Test document containing links (10.66 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2012-07-25 10:44 UTC, Danna Gifford
Truncated file after editing link with track changes enabled (10.77 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2012-07-25 10:46 UTC, Danna Gifford
reverse bibisect to find fix (1.34 KB, text/plain)
2012-12-17 16:55 UTC, Björn Michaelsen

Description Danna Gifford 2012-07-25 10:44:41 UTC
Created attachment 64661
Test document containing links

LibreOffice version: LibreOffice 3.5.4.2 Build ID: 350m1(Build:2)
OS: Ubuntu 12.04 64 bit

Editing the text of a link while "track changes" is enabled produces malformed XML in the document.xml of a saved .docx file. The opening and closing tags for the tracked edit end up straddling the closing tag of the link, for example:

<w:hyperlink w:anchor="name">
<<<other tags>>>
<w:del w:author="my name">
</w:hyperlink>
<<<more tags>>>
</w:del>

In LibreOffice, this causes the document to be truncated at the point where the track-changes-edit was made.  The document fails to open in Microsoft Office.

If your document gets truncated in this manner, it is possible to recover the text by taking the document.xml file out of the .docx archive and editing the XML file so that the edit tags no longer enclose the link end tag.

<w:hyperlink w:anchor="name">
<<<other tags>>>
**</w:hyperlink>
**<w:del w:author="my name">
<<<more tags>>>
</w:del>

However, if you re-save the file as a .docx, the problem emerges again.
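
For anyone attempting the recovery described above, here is a minimal Python sketch of getting document.xml out of the .docx archive and writing a corrected copy back. It assumes the main document part lives at word/document.xml inside the archive; the function names are made up for illustration and are not part of LibreOffice.

```python
import zipfile

# Assumed location of the main document part inside a .docx (ZIP) archive.
DOCUMENT_XML = "word/document.xml"


def read_document_xml(docx_path):
    """Return the raw bytes of word/document.xml from a .docx archive."""
    with zipfile.ZipFile(docx_path) as archive:
        return archive.read(DOCUMENT_XML)


def write_document_xml(src_path, dst_path, new_xml):
    """Copy src_path to dst_path, replacing word/document.xml with new_xml.

    All other archive members are copied unchanged.
    """
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            if item.filename == DOCUMENT_XML:
                data = new_xml
            else:
                data = src.read(item.filename)
            dst.writestr(item, data)
```

Writing to a new file rather than overwriting the original keeps the damaged document around in case the manual edit goes wrong.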

Steps to reproduce:
1. Open attached file bug_test.docx
2. Turn on "track changes"
3. Delete all the text between two links, e.g.
") and harmful as (food-borne) pathogens ("
4. Save the document as bug_test2.docx
5. Close the document
6. Open bug_test2.docx; file is truncated at point where edit was made.

Work-arounds:
1. Save as another format
2. Don't use track changes (However, this is indispensable for collaborations)
3. Don't use links within your document
Comment 1 Danna Gifford 2012-07-25 10:46:18 UTC
Created attachment 64664
Truncated file after editing link with track changes enabled
Comment 2 mycae 2012-08-27 12:18:04 UTC
"Me too" : Windows 7, LibreOffice 3.5.5.3 
Build ID: 7122e39-92ed229-498d286-15e43b4-d70da21

Attempting to open this with Word 2010 results in the following message:
"the name in the end tag of the element must match the element type in the start tag error", as a result of the misordered XML tags. 

xmllint can be used to find the offending XML tags within document.xml, even without using a schema.
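
If xmllint is not at hand, the same well-formedness check can be sketched with Python's standard library. This is only an illustration: find_xml_error is a hypothetical helper name, and it reports just the first error the parser hits, without any schema validation.

```python
import xml.etree.ElementTree as ET


def find_xml_error(xml_bytes):
    """Return None if xml_bytes is well-formed XML.

    Otherwise return (line, column, message) for the first parse error.
    """
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None
```

Run against the contents of document.xml, the misordered tags from this bug should show up as a "mismatched tag" error together with its position.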

This bug is quite problematic, as it effectively destroys document content for users.
Comment 3 gabzana 2012-12-14 22:41:45 UTC
LibreOffice 3.5.4.2 Build ID: 350m
Ubuntu 12.04 32bit

I can confirm this bug. It is fatal to .docx files being edited with LibreOffice, which after being saved will reopen neither in LibreOffice nor Microsoft Word.

In my case I made a tracked change, an insertion, directly after hyperlinked text (not within the hyperlinked text). It resulted in the following jumbled invalid xml when saved as .docx:

<w:hyperlink r:id="rId11">
<w:r>...SOME TEXT IN HERE...</w:r>
<w:ins w:author="myname" w:date="2012-12-14T11:30:00Z" w:id="1177">
</w:hyperlink>
<w:r>...TEXT INSERTED WITH TRACK CHANGES...</w:r>
</w:ins>

The workaround is to edit the document.xml file inside the .docx archive and correct the xml, but very few users will have the skills to do this. Even for those who know what to do, it can be very difficult to find and display the problematic code (I had to use a hex editor as both gedit and bluefish choked on the 600KB document.xml file).

After manually correcting the xml to the following I was able to open the document in Microsoft Word.

<w:hyperlink r:id="rId11">
<w:r>...SOME TEXT IN HERE...</w:r>
</w:hyperlink>
<w:ins w:author="myname" w:date="2012-12-14T11:30:00Z" w:id="1177">
<w:r>...TEXT INSERTED WITH TRACK CHANGES...</w:r>
</w:ins>
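
The manual correction above could in principle be automated. The following is a rough sketch, not a general repair tool: it assumes the damage always has the exact shape shown in this report (a `<w:ins ...>` or `<w:del ...>` opening tag immediately followed by `</w:hyperlink>`) and simply swaps the two. The name repair_hyperlink_nesting is hypothetical.

```python
import re

# A tracked-change opening tag (<w:ins ...> or <w:del ...>, not self-closing)
# immediately followed by the hyperlink's closing tag.
MISNESTED = re.compile(r"(<w:(?:ins|del)\b[^>]*(?<!/)>)(\s*)(</w:hyperlink>)")


def repair_hyperlink_nesting(xml_text):
    """Move </w:hyperlink> in front of a tracked-change tag opened just before it.

    Returns (repaired_text, number_of_places_changed).
    """
    return MISNESTED.subn(r"\3\2\1", xml_text)
```

The result should still be checked for well-formedness before it goes back into the archive, since other kinds of breakage would be left untouched.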
Comment 4 Björn Michaelsen 2012-12-17 15:26:55 UTC
I can reproduce this with the steps given in bibisect40 oldest (source-hash d6cde02dbce8c28c6af836e2dc1120f8a6ef9932 -- pre 3.5.0)
I can NOT reproduce this in bibisect40 latest (source-hash 8450a99c744e9005f19173e4df35d65640bcf5c4 -- around 4.0 beta1), so it seems to be fixed in the meantime. I did not bother to check when exactly the fix happened. Please verify with the 4.0 beta1.
Comment 5 Björn Michaelsen 2012-12-17 15:27:57 UTC
Has been reported on other platforms too.
Comment 6 Björn Michaelsen 2012-12-17 16:55:03 UTC
Created attachment 71671
reverse bibisect to find fix

Reverse bibisect run to find the fix. Since we are searching for a fix instead of a regression, the meanings of good/bad are inverted (bad = is fixed, good = isn't yet fixed). The result suggests a43a76cd5aa2f145f2cb43fcdbc8f21fb6c89af0..9210b95bcfd65ae558f445666d9b880e794d4c7 contains the fix. Thus most likely:

commit eac3e6e746300df379226941ba75c4e0ce1feb7a
Author: Miklos Vajna <vmiklos@suse.cz>
Date:   Wed Nov 14 19:03:05 2012 +0100

    n#789482 DOCX: export track change data after w:hyperlink
    
    Change-Id: If204523d7da544b11b2d809993ada180476104ef

which isn't yet backported to 3.6.
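
For readers unfamiliar with the inverted good/bad trick, the idea can be sketched as a plain binary search. This is an illustration only, not how the bibisect tooling is implemented; first_fixed and is_fixed are hypothetical names, and the sketch assumes the oldest commit shows the bug, the newest does not, and the fix stays in once it lands.

```python
def first_fixed(commits, is_fixed):
    """Narrow an ordered commit list down to the pair surrounding the fix.

    Returns (last commit that still shows the bug, first commit that is fixed).
    Assumes commits[0] is not fixed and commits[-1] is fixed.
    """
    lo, hi = 0, len(commits) - 1  # lo: "good" (bug present), hi: "bad" (fixed)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_fixed(commits[mid]):
            hi = mid  # fix is at mid or earlier
        else:
            lo = mid  # fix is after mid
    return commits[lo], commits[hi]
```

The returned pair plays the same role as the commit range quoted above: the fix lies between the two.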

@Miklos: Can we pull that one back?
Comment 7 Björn Michaelsen 2012-12-17 16:55:37 UTC
^^ Miklos: Can you comment?
Comment 8 Not Assigned 2012-12-17 17:18:47 UTC
Miklos Vajna committed a patch related to this issue.
It has been pushed to "libreoffice-3-6":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=1a776160ba9abde53bbebce6fdd2cb264a800c2f&g=libreoffice-3-6

fdo#52477 n#789482 DOCX: export track change data after w:hyperlink


It will be available in LibreOffice 3.6.5.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.