Bug 113483 - FILESAVE DOCX Cross reference corrupted if the target is a non-ASCII bookmark (steps comment 9)
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 3.3.0 release (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:6.3.0
Keywords: filter:docx
Depends on:
Blocks: Fields-Cross-Reference DOCX-Character DOCX-Fields
Reported: 2017-10-27 13:17 UTC by Gabor Kelemen (allotropia)
Modified: 2019-05-28 13:53 UTC (History)

See Also:
Crash report or crash signature:


Attachments
Sample file (11.17 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2017-10-27 13:17 UTC, Gabor Kelemen (allotropia)
Screenshot of the document in LO 5.4 (55.01 KB, image/png)
2017-10-27 13:18 UTC, Gabor Kelemen (allotropia)
Example file in odt (16.99 KB, application/vnd.oasis.opendocument.text)
2017-10-27 21:40 UTC, Gabor Kelemen (allotropia)
Example file saved as docx (4.90 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2017-10-27 21:41 UTC, Gabor Kelemen (allotropia)
Example file saved as docx then resaved again (4.99 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2017-10-27 21:45 UTC, Gabor Kelemen (allotropia)

Description Gabor Kelemen (allotropia) 2017-10-27 13:17:15 UTC
Created attachment 137322 [details]
Sample file

Attached file contains a bookmark named "Első" and a cross reference to it.

After saving to DOCX format and reloading, the cross reference becomes "Els%C5%91" while the bookmark still says "Első", so the reference breaks.

After hitting space in the paragraph with the broken reference the referenced text is replaced with "Error: Reference source not found" in the document.
Comment 1 Gabor Kelemen (allotropia) 2017-10-27 13:18:17 UTC
Created attachment 137323 [details]
Screenshot of the document in LO 5.4
Comment 2 Xisco Faulí 2017-10-27 14:34:00 UTC
I can't reproduce it in

Version: 6.0.0.0.alpha1+
Build ID: 0c46b3a9a384d5b70a708c3e9459a790dd815c63
CPU threads: 1; OS: Windows 6.1; UI render: default; 
Locale: fr-BE (es_ES); Calc: group

Could you please try to reproduce it with a master build from http://dev-builds.libreoffice.org/daily/master/ ?
You can install it alongside the standard version.
I have set the bug's status to 'NEEDINFO'. Please change it back to 'UNCONFIRMED' if the bug is still present in the master build
Comment 3 Gabor Kelemen (allotropia) 2017-10-27 21:40:37 UTC
Created attachment 137329 [details]
Example file in odt
Comment 4 Gabor Kelemen (allotropia) 2017-10-27 21:41:04 UTC
Created attachment 137330 [details]
Example file saved as docx
Comment 5 Gabor Kelemen (allotropia) 2017-10-27 21:45:49 UTC
Created attachment 137331 [details]
Example file saved as docx then resaved again

In 6.0 alpha it is not broken the same way. 
Now the bookmark name changes as well: while it becomes an unreadable "Els%C5%91" it is consistent with the reference name, so they work.
After one more resave to docx the names become: "Els%25C5%2591" - even harder to read, but still consistent.
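
The two names above are what repeated percent-encoding of the UTF-8 bytes produces. A minimal sketch using Python's standard library as a stand-in (this is not LibreOffice code, and where exactly the export filter applies the encoding is not shown here):

```python
from urllib.parse import quote

name = "Első"

# First pass: "ő" is U+0151, whose UTF-8 bytes 0xC5 0x91 become %C5%91.
once = quote(name)

# Second pass: the "%" characters themselves are escaped as %25.
twice = quote(once)

print(once)
print(twice)
```

Each further save without a matching decode step would add another layer of "%25" escapes, which is why the name keeps getting longer.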

Version: 6.0.0.0.alpha1+
Build ID: 93947341acb91c7ad508d1de72f5705f730d8e93
CPU threads: 4; OS: Linux 4.4; UI render: default; VCL: gtk2; 
Locale: en-US (hu_HU.UTF-8); Calc: group
Comment 6 Dieter 2017-10-28 09:25:18 UTC
I could reproduce it:

1. Open attachment from comment 3
3. Save as docx
4. Open "fields" in the context menu of the cross-reference => Name of the bookmark is Első
5. Save as docx
6. Close and reopen
7. Open "fields" in the context menu of the cross-reference => Name of the bookmark is Els%C5%91

I couldn't reproduce the following behaviour: "After hitting space in the paragraph with the broken reference the referenced text is replaced with "Error: Reference source not found" in the document."

Version: 6.0.0.0.alpha1 (x64)
Build ID: c1d1f859b268f650143d48f294999cda0fa57350
CPU threads: 4; OS: Windows 10.0; UI render: default; 
Locale: de-DE (de_DE); Calc: group
Comment 7 QA Administrators 2018-10-29 03:57:49 UTC Comment hidden (obsolete)
Comment 8 Gabor Kelemen (allotropia) 2018-10-29 07:45:04 UTC
Still happens with 
Version: 6.2.0.0.alpha1+
Build ID: b6b31bbb1a9e2272ac77de127825c4ee9f71effa
CPU threads: 4; OS: Windows 6.3; UI render: GL; VCL: win; 
Locale: hu-HU (hu_HU); Calc: CL
Comment 9 Adam Kovacs 2019-03-18 11:10:35 UTC
How to reproduce (the example document with the bug) in Writer (6.3):
1. create some text (for example write "lorem" and hit f3)
2. select some section of the text (for example the first sentence)
3. click on insert menu, select bookmark
4. give it a name which contains non-ASCII characters (for example Első)
5. go to somewhere else in the document, for example to the end in a new line
6. click on insert menu, select cross-reference
7. select the value "Bookmark" in "Type" listbox, then select the value "Reference" in the "Insert reference to..." listbox
8. in the "Selection" listbox, double click on the previously named bookmark
9. save the file as docx and reload it

The bug occurs because the bookmark name is converted to Els%25C5%2591, while the cross-reference still points at Első.
Comment 10 Adam Kovacs 2019-03-18 12:27:22 UTC Comment hidden (obsolete)
Comment 11 Adam Kovacs 2019-03-18 12:28:33 UTC
(In reply to Adam Kovacs from comment #10)
> Is it possible that somehow this is solved by the version 6.1.0.3? I can
> make an export to docx from this odt example file with the working
> cross-reference.

Comment fix: until the version 6.1.0.3.
Comment 12 Adam Kovacs 2019-03-18 12:35:04 UTC
So the reference is not broken in version 6.1.0.3, but the non-ASCII characters are still converted, for example ő to %C5%91.
Comment 13 Adam Kovacs 2019-05-02 09:26:08 UTC
bookmarkName and sToken need to be decoded
https://opengrok.libreoffice.org/xref/core/sw/source/filter/ww8/docxattributeoutput.cxx?r=1fe24bb1#1614
https://opengrok.libreoffice.org/xref/core/sw/source/filter/ww8/docxattributeoutput.cxx?r=1fe24bb1#1985

sToken = INetURLObject::decode(sToken, INetURLObject::DecodeMechanism::Unambiguous, RTL_TEXTENCODING_UTF8);
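
As a rough illustration of what such a decode step does, here is a Python analogue. It is a sketch only: `decode_name` is a hypothetical helper, and `urllib.parse.unquote` merely stands in for `INetURLObject::decode`, whose exact semantics may differ.

```python
from urllib.parse import unquote

def decode_name(token: str) -> str:
    """Hypothetical helper: undo one layer of percent-encoding, reading bytes as UTF-8."""
    return unquote(token, encoding="utf-8")

# A singly encoded name is restored to its readable form.
restored = decode_name("Els%C5%91")

# One decode pass removes only one encoding layer from a doubly encoded name.
one_layer = decode_name("Els%25C5%2591")

print(restored)
print(one_layer)
```

Applying the same decode to both the bookmark name and the field token is what keeps the two identical, which is the property the cross-reference depends on.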
Comment 14 Adam Kovacs 2019-05-02 10:42:10 UTC
In document.xml, these are the related XML tags:
<w:bookmarkStart w:name="Els%C5%91" w:id="0"/>
<w:instrText> REF Els%C5%91 \h </w:instrText>
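
A cross-reference resolves only when the REF target matches a bookmark name character for character. The sketch below checks that on a fragment like the one above; `find_dangling_refs` is a hypothetical helper, and the regular expressions are a simplification that assumes each REF field sits in a single `w:instrText` element (real documents can split field instructions across runs).

```python
import re

def find_dangling_refs(xml: str) -> list[str]:
    """Return REF targets that have no bookmark with exactly the same name."""
    bookmarks = set(re.findall(r'<w:bookmarkStart\b[^>]*\bw:name="([^"]*)"', xml))
    targets = re.findall(r'<w:instrText>\s*REF\s+(\S+)', xml)
    return [t for t in targets if t not in bookmarks]

# Fragment from the comment above: both names are encoded the same way.
consistent = r'''<w:bookmarkStart w:name="Els%C5%91" w:id="0"/>
<w:instrText> REF Els%C5%91 \h </w:instrText>'''

# The broken state described in comment 9: only the bookmark name was re-encoded.
broken = r'''<w:bookmarkStart w:name="Els%25C5%2591" w:id="0"/>
<w:instrText> REF Első \h </w:instrText>'''

print(find_dangling_refs(consistent))
print(find_dangling_refs(broken))
```

The first fragment yields no dangling targets, while the second reports the REF target that no longer has a matching bookmark.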
Comment 15 Commit Notification 2019-05-06 08:30:51 UTC
Adam Kovacs committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/+/b9afb9959c31c3c57d0f2fe91107a92abfd82cdb%5E%21

tdf#113483: DOCX: fix encoding of bookmarks with non-ASCII letters

It will be available in 6.3.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 16 Timur 2019-05-16 13:26:29 UTC
Verified for DOCX. Backport to 6.2 wouldn't hurt, I guess. 
DOC is still wrong. 
Adam, could you handle DOC here, or should we open a new one?
Comment 17 László Németh 2019-05-28 06:19:08 UTC
Timur: it's better to open a new bug report for the obsolete DOC format, if it's really needed. We'll check the backport, too. Thanks for the verification!