Bug 85769 - Writer saves malformed comments.xml, breaking comments in .docx output
Summary: Writer saves malformed comments.xml, breaking comments in .docx output
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 4.2.4.2 release (earliest affected)
Hardware: Other All
Importance: highest major
Assignee: Not Assigned
URL:
Whiteboard: target:5.0.0 target:4.4.4
Keywords: bibisected, bisected, regression
Duplicates: 90569
Depends on:
Blocks: mab4.3
Reported: 2014-11-02 18:48 UTC by Brad Smith
Modified: 2015-12-17 08:38 UTC (History)
CC List: 8 users

See Also:
Crash report or crash signature:


Attachments
test case (4.79 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2015-03-23 18:20 UTC, Jozef Vesely
repaired test case (4.68 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2015-03-23 18:27 UTC, Jozef Vesely

Description Brad Smith 2014-11-02 18:48:39 UTC
When using LibreOffice to edit a .docx document and add comments, occasionally some (but not all) of the comment text will just disappear when the document is closed and re-opened. The comments themselves don't disappear, they just become empty boxes with no text. 

I unzipped a .docx that was having this problem and found the following:

```
$ xmllint word/comments.xml
word/comments.xml:2: parser error : Attribute w:ascii redefined
i="" w:hAnsi="" w:eastAsia="SimSun" w:cs="Times New Roman" w:ascii="" w:hAnsi=""
                                                                               ^
word/comments.xml:2: parser error : Attribute w:hAnsi redefined
i="" w:hAnsi="" w:eastAsia="SimSun" w:cs="Times New Roman" w:ascii="" w:hAnsi=""
                                                                               ^
```

Opening up comments.xml, indeed there were two <w:comment> blocks that had the w:hAnsi and w:ascii attributes defined twice, first to the intended value, and then to an empty string. 
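
For anyone who wants to check a file without unzipping it by hand, here is a minimal Python sketch of the same test. It assumes the comments live in `word/comments.xml` as in this report; the helper name `check_comments_xml` is made up for illustration. It relies only on the standard-library parser rejecting a redefined attribute, much as xmllint does above.

```python
import zipfile
import xml.etree.ElementTree as ET


def check_comments_xml(docx_path, member="word/comments.xml"):
    """Return None if the member parses cleanly, else the parser's error text."""
    with zipfile.ZipFile(docx_path) as archive:
        data = archive.read(member)
    try:
        ET.fromstring(data)
    except ET.ParseError as exc:
        # A redefined attribute (e.g. w:ascii given twice) ends up here.
        return str(exc)
    return None
```

Calling `check_comments_xml("paper.docx")` (hypothetical file name) returns an error string for a damaged file and `None` for a healthy one.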

I removed the latter by hand, re-zipped the file, and now the comments appear, but I have no idea what caused the problem in the first place. 

Unfortunately the document in question was a student's paper from a college course, so I can't provide it as an example.
Comment 1 Buovjaga 2014-11-15 16:36:59 UTC
(In reply to Brad Smith from comment #0)
> Unfortunately the document in question was a student's paper from a college
> course, so I can't provide it as an example.

Could you try with this: https://wiki.documentfoundation.org/QA/BugReport/Attachments#Confidential_Attachments
Comment 2 Matt Price 2014-12-10 02:04:02 UTC
wondering if this is the same as https://bugs.freedesktop.org/show_bug.cgi?id=73221
Comment 3 Matt Price 2014-12-10 15:02:06 UTC
wondering if this is the same as https://bugs.freedesktop.org/show_bug.cgi?id=73221
Comment 4 Robinson Tryon (qubit) 2014-12-27 04:31:39 UTC
(In reply to Brad Smith from comment #0)
> When using LibreOffice to edit a .docx document and add comments,
> occasionally some (but not all) of the comment text will just disappear when
> the document is closed and re-opened.
> ...
> Unfortunately the document in question was a student's paper from a college
> course, so I can't provide it as an example.

As beluga mentions, we really need a sample document to be able to identify and reproduce this particular bug.

(In reply to Matt Price from comment #2)
> wondering if this is the same as
> https://bugs.freedesktop.org/show_bug.cgi?id=73221

This could be related to bug 73221, but one of the devs would be more able to make that determination. Right now we can either
1) Resolve this bug as a duplicate of 73221, or
2) Try to reproduce the problem (for which we'd need a sanitized example file from Brad)

Status -> NEEDINFO
Comment 5 Jozef Vesely 2015-03-23 18:20:26 UTC
Created attachment 114282 [details]
test case
Comment 6 Jozef Vesely 2015-03-23 18:27:36 UTC
Created attachment 114283 [details]
repaired test case

The damaged file had superfluous w:ascii="" w:hAnsi="" attributes in comments.xml.

Steps to reproduce:
1. create a new document
2. paste Lorem ipsum from Firefox
3. select a word
4. copy it (Ctrl-C)
5. insert comment
6. paste copied word into comment
7. save as *.docx

Version: 4.2.7.2
Build ID: 420m0(Build:2)

I had another document in which an xml:space="preserve" attribute was present that broke the comments that followed. This probably resulted from copy-pasting formatted/highlighted text into the comment.
Comment 7 Jozef Vesely 2015-03-23 19:04:56 UTC
Never mind the xml:space="preserve" stuff I mentioned before.
A simple s/w:ascii="" w:hAnsi=""// "repairs" all my documents.
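
The substitution above can also be applied to the .docx directly, without unzipping and re-zipping by hand. This is only a rough sketch: the function name `repair_docx` is invented here, the pattern assumes the empty pair appears exactly as `w:ascii="" w:hAnsi=""` (as in the files from this report), and it removes every such pair, including one that is not a duplicate. Work on a copy.

```python
import re
import zipfile

# The redundant pair reported in this bug; other files may differ in
# attribute order or spacing, in which case this pattern will not match.
EMPTY_FONT_ATTRS = re.compile(rb'\s+w:ascii=""\s+w:hAnsi=""')


def repair_docx(src_path, dst_path, member="word/comments.xml"):
    """Copy src to dst, dropping the empty w:ascii/w:hAnsi pair from one member."""
    with zipfile.ZipFile(src_path) as src, \
         zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == member:
                data = EMPTY_FONT_ATTRS.sub(b"", data)
            # All other archive members are copied through unchanged.
            dst.writestr(info.filename, data)
```

Usage would be something like `repair_docx("broken.docx", "fixed.docx")` (both file names hypothetical), writing to a new file so the original stays untouched.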
Comment 8 raal 2015-03-23 22:02:28 UTC
I can confirm with Version: 4.5.0.0.alpha0+
Build ID: c3087d969671e62182eb049850479e77190ccff4
TinderBox: Linux-rpm_deb-x86_64@46-TDF, Branch:master, Time: 2015-03-22_04:37:00

After reopening, the text after the comment is lost: data loss.


Works with 4.3.3.2, Build ID: 430m0(Build:2), Linux -> regression
Comment 9 Matthew Francis 2015-03-28 07:56:29 UTC
Bibisect results from 43all:

There are only 'skip'ped commits left to test.
The first bad commit could be any of:
7476fec6f7f3a17fd1e72ced6686f8e99e998b3d
047d10291c5cd0615c992821829c9153e2f06b13

Not immediately obvious what it could be in that range. Several Word filter related commits are possibilities
Comment 10 Robinson Tryon (qubit) 2015-03-31 16:43:02 UTC
(In reply to raal from comment #8)
> I can confirm with Version: 4.5.0.0.alpha0+
> Build ID: c3087d969671e62182eb049850479e77190ccff4
> TinderBox: Linux-rpm_deb-x86_64@46-TDF, Branch:master, Time:
> 2015-03-22_04:37:00
> 
> After reopening, the text after the comment is lost: data loss.

Raal: you marked this bug as priority 'highest' -- please also add it to a MAB, or just mark the priority as 'high' :-)
Comment 11 Julien Nabet 2015-04-11 20:02:40 UTC
Brad: about what could cause the problem, see tdf#90569 which is in See Also
Comment 12 Matthew Francis 2015-04-12 04:47:14 UTC
This seems to have begun at the below commit.
Adding Cc: to matus@libreoffice.org; Could you possibly take a look at this one? Thanks

commit 9ae701509add0f0192b02fab787c6acbc64be349
Author: Matúš Kukan <matus.kukan@gmail.com>
Date:   Tue Oct 15 09:29:27 2013 +0200

    FastAttributeList: use vectors instead of map; the size is small
    
    This is also preparation to avoid OString internal usage.
    
    Change-Id: If0ea36155d8ab3f5c91c2aafd6932fabeadadd41
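
To illustrate why a map-to-vector change could surface this, here is a small Python sketch. It is not the FastAttributeList code. The font name "Arial" is a made-up value, and whether the real map-based code kept the first or the last value is not stated in this report, so keeping the first is purely an assumption.

```python
def render_from_map(pairs):
    """Map-style storage: one slot per attribute name, so duplicates cannot occur."""
    slots = {}
    for name, value in pairs:
        slots.setdefault(name, value)  # assumption: the first value is kept
    return " ".join(f'{name}="{value}"' for name, value in slots.items())


def render_from_list(pairs):
    """Vector-style storage: every add is appended, duplicates included."""
    return " ".join(f'{name}="{value}"' for name, value in pairs)


# The same attribute names added twice, second time with empty values.
pairs = [("w:ascii", "Arial"), ("w:hAnsi", "Arial"),
         ("w:ascii", ""), ("w:hAnsi", "")]
print(render_from_map(pairs))   # w:ascii="Arial" w:hAnsi="Arial"
print(render_from_list(pairs))  # w:ascii="Arial" w:hAnsi="Arial" w:ascii="" w:hAnsi=""
```

With list-style storage, any caller that adds the same attribute twice now produces the redefined attributes that xmllint complains about.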
Comment 13 Matúš Kukan 2015-04-17 18:57:22 UTC
Ah, that's unfortunate; there will be more similar problems with this attribute list change.
Anyway, I submitted https://gerrit.libreoffice.org/#/c/15367/ and am waiting for some comments there.

Bug 90569 is really the same problem as this bug, so no point in having both I guess.
Comment 14 Commit Notification 2015-04-22 17:04:50 UTC
Matúš Kukan committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=89964955e535f7343cccf1399312f0e8ac76323d

tdf#85769 Avoid writing font name attribute twice, by ignoring empty value

It will be available in 5.0.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
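
As a rough illustration of the idea in the commit subject ("ignoring empty value"), and not the actual patch, a guard like the following keeps an empty font name from being appended after a real one. The helper name `add_font_attrs` and the font name "Arial" are hypothetical.

```python
def add_font_attrs(attrs, ascii_name, hansi_name):
    """Append font-name attributes, skipping any whose value is empty."""
    for name, value in (("w:ascii", ascii_name), ("w:hAnsi", hansi_name)):
        if value:  # ignore empty values so they cannot duplicate a real one
            attrs.append((name, value))
    return attrs


attrs = []
add_font_attrs(attrs, "Arial", "Arial")  # first call: real font name
add_font_attrs(attrs, "", "")            # second call: empty, so nothing is added
print(attrs)  # [('w:ascii', 'Arial'), ('w:hAnsi', 'Arial')]
```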
Comment 15 Jorendc 2015-04-22 17:31:39 UTC
*** Bug 90569 has been marked as a duplicate of this bug. ***
Comment 16 Commit Notification 2015-04-30 08:37:33 UTC
Caolán McNamara committed a patch related to this issue.
It has been pushed to "libreoffice-4-4":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=4a4dadc12777db78de60f64773f4737dd604419a&h=libreoffice-4-4

Resolves: tdf#85769 fix duplicate attribute export to docx...

It will be available in 4.4.4.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 17 Robinson Tryon (qubit) 2015-12-17 08:38:26 UTC
Migrating Whiteboard tags to Keywords: (bibisected)
[NinjaEdit]