Bug Hunting Session
Bug 92731 - Copy-pasting text from document body into comment breaks all following comments when saving to docx. Leads to duplicate attributes of comment, which breaks the parsing of all following comments.
Status: RESOLVED DUPLICATE of bug 113790
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 4.2.8.2 release
Hardware: Other All
Importance: high major
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Paste-From-MSO DOCX-SAXParse DOCX-Comments
 
Reported: 2015-07-14 14:04 UTC by Shane Caldwell
Modified: 2018-01-09 17:10 UTC
CC List: 5 users

See Also:
Crash report or crash signature:


Attachments
zip archive w 3 files: test file .odt which is fine, contains three comments. test file in .docx format which shows that second format corrupts comments following. xml comment file extracted from docx (4.71 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2015-07-14 14:04 UTC, Shane Caldwell
File containing test comments with different formatting types, appears to be handled well. (6.17 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2015-08-05 19:59 UTC, Shane Caldwell

Description Shane Caldwell 2015-07-14 14:04:40 UTC
Created attachment 117231
zip archive w 3 files: test file .odt which is fine, contains three comments. test file in .docx format which shows that second format corrupts comments following. xml comment file extracted from docx
Comment 1 Buovjaga 2015-07-29 17:45:07 UTC
attachment 117231 is actually only comments.docx.
Before attaching the .zip, could you try with an up-to-date version (4.4 or 5.0 RC4)?

If using an older Ubuntu, you can get the latest stable from a PPA maintained by the Ubuntu team:

sudo add-apt-repository -y ppa:libreoffice/ppa

sudo apt-get update

sudo apt-get dist-upgrade 

Set to NEEDINFO.
Change back to UNCONFIRMED, if the problem persists. Change to RESOLVED WORKSFORME, if the problem went away.
Comment 2 Shane Caldwell 2015-08-05 19:59:13 UTC
Created attachment 117685
File containing test comments with different formatting types, appears to be handled well.
Comment 3 Shane Caldwell 2015-08-05 20:11:37 UTC
Sorry about the zip, my bad. The docx was the important one anyway. 

I've updated to 5.0.0.5, and it does seem to fix the comment creation problem.

However, the file generated using 4.2.8.2 that led me to make this report can't be opened with LibreOffice 5.0.0.5.

4.2.8.2 would open the file, but corrupt all of the comments following the problematic one. Now, upon opening, I instead get an error message:


LibreOffice 5.0.0.5
 
File format error found at 
SAXParseException: '[word/comments.xml line 2]: Attribute w:hAnsi redefined
', Stream 'word/comments.xml', Line 2, Column 1215
SAXParseException: '[word/document.xml line 2]: unknown error', Stream 'word/document.xml', Line 2, Column 20452(row,col).

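The error above points at a line and column inside word/comments.xml. A minimal sketch for reproducing that check outside LibreOffice, assuming only Python 3's standard zipfile and xml.etree modules: it parses one part of the .docx and prints the text around the position where a strict XML parser gives up. The function name find_parse_error is hypothetical, and Python's parser (expat) will not necessarily report exactly the same column as LibreOffice's.

```python
import zipfile
import xml.etree.ElementTree as ET


def find_parse_error(docx_path, member="word/comments.xml", context=60):
    """Parse one XML part of a .docx and report where parsing fails.

    Returns None if the part is well-formed, otherwise a tuple
    (line, column, snippet) with the text surrounding the error.
    """
    with zipfile.ZipFile(docx_path) as zf:
        data = zf.read(member)
    try:
        ET.fromstring(data)
        return None
    except ET.ParseError as err:
        # expat reports a 1-based line number and a 0-based column offset
        line, column = err.position
        text = data.decode("utf-8").splitlines()[line - 1]
        start = max(0, column - context)
        return line, column, text[start:column + context]
```

For a file like the one described here, the snippet should contain the duplicated attribute (for example w:hAnsi appearing twice in the same tag).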

I can provide the file (and/or extract the xml comments) but they contain sensitive information so I would need to know they'd be held in confidence.
Comment 4 Simon Dedman 2015-09-21 12:14:35 UTC
Similar story for me. It was a nightmarish experience trying to move images and comments; I definitely pasted text from a comment into the body and the other way round. I also pasted text from one image caption into another. I ended up having to strip out all the XML on the command line to get my work out as a text file.
Comment 5 Danilo 2016-01-17 08:45:11 UTC
Hello,

this is also observed here in LibreOffice 5.0.4.2, on Windows.

After some cutting and pasting, the w:cstheme attribute is duplicated in the w:rFonts tag. In this case I can find the duplicate in vi using the search pattern:

rFonts[a-zA-Z:" =]*cstheme[a-zA-Z:" =]*cstheme

I imagine something similar can be used to identify breaks caused by other duplicate tags.
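The vi pattern above only catches a repeated cstheme inside rFonts. A minimal sketch of the same idea generalized in Python: scan every start tag and report any attribute name that occurs more than once. It deliberately uses regular expressions instead of an XML parser, because a strict parser rejects the whole file at the first duplicate. The regexes are simplifications (assumption: attribute values are quoted and contain no copy of their own quote character), and the function name find_duplicate_attributes is hypothetical.

```python
import re
from collections import Counter

# A start tag: element name followed by name="value" or name='value' pairs.
START_TAG = re.compile(
    r"""<([A-Za-z_][\w:.-]*)((?:\s+[^\s=/>]+\s*=\s*(?:"[^"]*"|'[^']*'))*)\s*/?>"""
)
# One attribute inside a start tag; group 1 is the attribute name.
ATTRIBUTE = re.compile(r"""([^\s=/>]+)\s*=\s*(?:"[^"]*"|'[^']*')""")


def find_duplicate_attributes(xml_text):
    """Return (offset, element, [duplicated names]) for each start tag
    that repeats an attribute name."""
    hits = []
    for tag in START_TAG.finditer(xml_text):
        counts = Counter(ATTRIBUTE.findall(tag.group(2)))
        dupes = sorted(name for name, n in counts.items() if n > 1)
        if dupes:
            hits.append((tag.start(), tag.group(1), dupes))
    return hits
```

Because comments.xml is a single line, the character offset returned here plays the role of the column number in the SAXParseException message.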

Nevertheless, this doesn't explain why the problem happened and how it can be fixed.

Cheers,
Danilo
Comment 6 Buovjaga 2016-01-18 12:32:32 UTC
Ok, setting to NEW.
Comment 7 Ertxiem 2016-02-13 20:21:29 UTC
I have the same problem with version 4.2.8.2.
Copying text that is linked to a comment into a comment creates a malformed comment, at least in docx documents.

I can manually correct the file corruption (see below), but of course it would be better if it did not occur.

To reproduce the problem:
1. Create a new (empty) document.
2. Write some text in it.
3. In a part of the text add a comment (I selected some of the text and used the shortcut ctrl+alt+c).
4. Type some text in the comment.
5. Type more text in the (main) document (otherwise the comment may not be updated when you save, but that's a different bug).
6. Save the document in a docx format.
7. Close LibreOffice and open the document again (everything should be as it was).
8. Copy some of the text including all the text that is linked to the comment (I used ctrl+c).
9. Paste it in the comment (I used ctrl+v).
10. Write some more text in the main document (again to be sure that the comment will be updated when you save it).
11. Save and close the document.
12. Open it again.
The comment loses some information that was there before, namely the author and date. If you open the comments.xml file inside the docx document, you'll see repeated attributes inside the last comment, which LibreOffice can't process correctly.

To solve it:
On Linux, this is how I've been able to work around the problem and repair corrupted docx files: make a copy of the docx file (make a backup!), open the file with Archive Manager, extract comments.xml, open it with a text editor, delete the duplicate tags and put the corrected version of comments.xml back inside the original docx file.
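The manual workaround above can be scripted. A minimal sketch in Python, assuming the damage is limited to attributes repeated inside a single start tag (the w:hAnsi and w:cstheme cases reported in this bug). It keeps the first occurrence of each attribute, which is itself an assumption: nothing in this report says which of the duplicated values is the intended one. The result is written to a new file, so the original .docx stays untouched as the backup. The names repair_docx and dedupe_attributes are hypothetical.

```python
import re
import zipfile

# Start tag split into name, attribute run, and closing ">" or "/>".
START_TAG = re.compile(
    r"""<([A-Za-z_][\w:.-]*)((?:\s+[^\s=/>]+\s*=\s*(?:"[^"]*"|'[^']*'))*)(\s*/?>)"""
)
# One attribute including its leading whitespace; group 1 is the name.
ATTRIBUTE = re.compile(r"""\s+([^\s=/>]+)\s*=\s*(?:"[^"]*"|'[^']*')""")


def _dedupe_tag(match):
    """Rebuild one start tag, keeping only the first occurrence of each attribute."""
    seen = set()
    kept = []
    for attr in ATTRIBUTE.finditer(match.group(2)):
        name = attr.group(1)
        if name not in seen:
            seen.add(name)
            kept.append(attr.group(0))
    return "<" + match.group(1) + "".join(kept) + match.group(3)


def dedupe_attributes(xml_text):
    return START_TAG.sub(_dedupe_tag, xml_text)


def repair_docx(src, dst, member="word/comments.xml"):
    """Copy every part of src into dst, removing duplicate attributes in member."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == member:
                data = dedupe_attributes(data.decode("utf-8")).encode("utf-8")
            # Reusing the ZipInfo keeps each part's name and compression type.
            zout.writestr(item, data)
```

Usage would be along the lines of repair_docx("report.docx", "report-fixed.docx"), followed by opening the new file in LibreOffice and checking that the comments after the damaged one are intact.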


By the way, comments.xml is horribly formatted: everything is on one line. I suggest that LibreOffice add a newline after each tag. Bonus points if the XML is properly indented.
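Until the export changes, a reader-side workaround is possible. A minimal sketch in Python, assuming the standard xml.dom.minidom module: a well-formed part is re-indented with toprettyxml, and a malformed one (the case in this bug, which no strict parser will load) falls back to inserting a line break between adjacent tags, which is enough for vi or grep to report useful line numbers. The output is meant for reading only; writing it back into the .docx is not assumed to be safe. The function name readable_xml is hypothetical.

```python
import re
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def readable_xml(xml_bytes):
    """Return a multi-line version of a one-line OOXML part, for reading only."""
    try:
        # Well-formed input: full re-indentation, namespace prefixes preserved.
        return minidom.parseString(xml_bytes).toprettyxml(indent="  ")
    except ExpatError:
        # Malformed input (e.g. duplicate attributes): only break between
        # adjacent tags, leaving text and whitespace content untouched.
        return re.sub(r"><", ">\n<", xml_bytes.decode("utf-8"))
```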
Comment 8 Shane Caldwell 2016-02-15 17:04:19 UTC
(In reply to Ertxiem from comment #7)
> 
> To solve it:
> In Linux, the way I've been able to workaround and correct corrupted docx
> files. I make a copy of the docx file (make a backup!) and open the file
> with Archive Manager, extract comments.xml, open it with a text editor,
> delete the duplicate tags and replace the corrected version of comments.xml
> inside the original docx file.
> 
> 
> By the way, the comments.xml is horribly formatted: everything in one line.
> I suggest that LibreOffice adds some newlines after each tag. Bonus points
> if indentation of the xml is properly done.

This method also allowed me to fix the corrupted file. I also second all of these points about the xml formatting, it's a mess to wade through and find the errors.
Comment 9 Julien Nabet 2016-10-27 20:49:41 UTC
With the latest stable LO version (5.2.2), could someone give a step-by-step process to produce a corrupted file from a clean one?

Indeed, looking at the comments, the versions quoted are EOL, and it seems the corrupted files have been fixed.
Comment 10 Aron Budea 2018-01-09 17:10:59 UTC
Let's assume it's a duplicate of bug 113790.

*** This bug has been marked as a duplicate of bug 113790 ***