Bug 125298 - FILESAVE DOCX: Bookmark names and field references are shortened when they are up to 40 characters long and contain non-ASCII characters
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 6.3.0.0.alpha0+
Hardware: All, OS: All
Importance: low (priority), normal (severity)
Assignee: Not Assigned
URL:
Whiteboard: target:6.3.0 target:6.4.0
Keywords:
Depends on:
Blocks:
 
Reported: 2019-05-15 09:02 UTC by NISZ LibreOffice Team
Modified: 2019-08-21 09:51 UTC
CC List: 2 users

See Also:
Crash report or crash signature:


Attachments
Example file with a truncated bookmark name. (18.69 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2019-05-15 09:03 UTC, NISZ LibreOffice Team
Screenshot of a truncated bookmark name before save and after save and reload (194.39 KB, image/png)
2019-05-15 09:03 UTC, NISZ LibreOffice Team
The same as example1, but with a different value. (221.06 KB, image/png)
2019-05-15 09:03 UTC, NISZ LibreOffice Team
Screenshot about the unzipped document.xml file with the truncated bookmark name and field reference. (198.04 KB, image/png)
2019-05-15 09:04 UTC, NISZ LibreOffice Team

Description NISZ LibreOffice Team 2019-05-15 09:02:27 UTC
Description:
In the OOXML standard, bookmark names and field references (the value of <w:instrText> tags) are limited to a maximum of 40 characters.
LibreOffice encodes non-ASCII characters in bookmark names and field references and decodes them again, and the encoded form is longer than the original character: for example, ő becomes %C5%91, which is six characters.
When the truncation happens before the decoding, each non-ASCII character is counted as several characters, so bookmark names or field references that contain non-ASCII characters can be truncated even though they are not longer than 40 characters.
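To make the ordering problem concrete, here is a minimal Python sketch that encodes a name, truncates the encoded form to 40 characters, and then decodes it, next to the expected behavior of truncating the real characters. The encoding (spaces replaced by underscores, non-ASCII characters written as percent-escaped UTF-8 bytes) and the lenient decoder are assumptions modeled on the output shown in this report, not LibreOffice's actual export code; `encode_name`, `lenient_decode`, `truncate_buggy` and `truncate_fixed` are hypothetical helpers.

```python
import re
from urllib.parse import quote

MAX_NAME_LEN = 40  # bookmark name limit described in this report

# One or more consecutive %XX escapes.
_ESCAPE_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def encode_name(name: str) -> str:
    """Hypothetical stand-in for the export encoding: spaces become
    underscores, non-ASCII characters become %XX escapes of their UTF-8 bytes."""
    return quote(name.replace(" ", "_"), safe="")


def lenient_decode(encoded: str) -> str:
    """Decode %XX runs as UTF-8; a trailing incomplete sequence stays escaped."""
    def decode_run(match):
        raw = bytes.fromhex(match.group(0).replace("%", ""))
        # Use the longest prefix that is valid UTF-8, re-escape the rest.
        for cut in range(len(raw), 0, -1):
            try:
                head = raw[:cut].decode("utf-8")
            except UnicodeDecodeError:
                continue
            return head + "".join(f"%{b:02X}" for b in raw[cut:])
        return match.group(0)  # no valid prefix: keep the whole run escaped

    return _ESCAPE_RUN.sub(decode_run, encoded)


def truncate_buggy(name: str) -> str:
    # Truncating the encoded form: a two-byte UTF-8 character uses 6 of the 40 slots.
    return lenient_decode(encode_name(name)[:MAX_NAME_LEN])


def truncate_fixed(name: str) -> str:
    # Truncating the real characters, which is the behavior the report expects.
    return name.replace(" ", "_")[:MAX_NAME_LEN]


for name in ["1é2á3ű4ő5ú6ö7ü8ó9í", "árvíztűrő tükörfúrógép"]:
    print(len(name), len(encode_name(name)), truncate_buggy(name), truncate_fixed(name))
```

Under these assumptions, the name 1é2á3ű4ő5ú6ö7ü8ó9í has 18 characters but its encoded form has 63, so the buggy path cuts it in the middle of the escape for ö and yields 1é2á3ű4ő5ú6%C3%, the same string reported under Actual Results.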

Steps to Reproduce:
    1. Type some text.
    2. Select part of the text.
    3. Open the Insert menu and select Bookmark.
    4. Give the bookmark a name that contains non-ASCII characters and is long enough (for example árvíztűrő tükörfúrógép, or 1é2á3ű4ő5ú6ö7ü8ó9í).
    5. Go somewhere else in the document, for example to the end of the document, and create a new paragraph.
    6. Open the Insert menu and select Cross-reference.
    7. Select "Bookmark" in the "Type" list box, then select "Reference" in the "Insert reference to" list box.
    8. In the "Selection" list box, double-click the bookmark created earlier.
    9. Save the file as DOCX and reload it.
    10. Rename the file from .docx to .zip, unzip it, and open document.xml in the word folder.
    11. Look at these tags:
<w:bookmarkStart w:name="something" w:id="0"/>
<w:instrText> REF something \h </w:instrText>
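Steps 10 and 11 can also be done without renaming or unzipping by hand, because a .docx file is a ZIP archive. The following Python sketch uses only the standard library (`zipfile`, `xml.etree.ElementTree`) to list the bookmark names and REF targets in word/document.xml. It assumes that each REF instruction sits in a single <w:instrText> element, which need not hold in general (a field instruction can be split across several runs); `parse_document_xml` and `inspect_docx` are hypothetical helper names.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, bound to the w: prefix in document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def parse_document_xml(xml_bytes: bytes):
    """Return (bookmark names, REF field targets) found in document.xml."""
    root = ET.fromstring(xml_bytes)
    names = [el.get(W + "name") for el in root.iter(W + "bookmarkStart")]
    refs = []
    for el in root.iter(W + "instrText"):
        # An instruction such as " REF name \h " splits into ["REF", "name", "\h"].
        parts = (el.text or "").split()
        if len(parts) >= 2 and parts[0] == "REF":
            refs.append(parts[1])
    return names, refs


def inspect_docx(path: str):
    # Read word/document.xml straight out of the .docx (ZIP) archive.
    with zipfile.ZipFile(path) as archive:
        return parse_document_xml(archive.read("word/document.xml"))


if __name__ == "__main__" and len(sys.argv) > 1:
    bookmark_names, ref_targets = inspect_docx(sys.argv[1])
    print("bookmarks:", bookmark_names)
    print("REF targets:", ref_targets)
```

Run it as `python inspect_docx.py saved.docx` (the script and file names are placeholders); for a file affected by this bug, the truncated name appears both in the bookmark list and in the REF targets.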

Actual Results:
Bookmark names that are not longer than 40 characters are truncated if they contain non-ASCII characters.
For example, the bookmark name 1é2á3ű4ő5ú6ö7ü8ó9í is truncated to 1é2á3ű4ő5ú6%C3%,
and the bookmark name árvíztűrő tükörfúrógép is truncated to árvíztűrő_tük%C.

The cross-references still work despite the truncation, so this is only a cosmetic problem.

Expected Results:
In MS Word, a bookmark name that contains non-ASCII characters and is at most 40 characters long is not truncated. LibreOffice should emulate this behavior.


Reproducible: Always


User Profile Reset: No



Additional Info:
See also: bug 113483
Comment 1 NISZ LibreOffice Team 2019-05-15 09:03:02 UTC
Created attachment 151419
Example file with a truncated bookmark name.
Comment 2 NISZ LibreOffice Team 2019-05-15 09:03:24 UTC
Created attachment 151420
Screenshot of a truncated bookmark name before save and after save and reload
Comment 3 NISZ LibreOffice Team 2019-05-15 09:03:48 UTC
Created attachment 151421
The same as example1, but with a different value.
Comment 4 NISZ LibreOffice Team 2019-05-15 09:04:06 UTC
Created attachment 151422
Screenshot about the unzipped document.xml file with the truncated bookmark name and field reference.
Comment 5 Timur 2019-05-16 13:40:56 UTC
I confirm it's truncated. In my case, kghčdžšđćškghčdžšđćškghčdžšđćš is truncated to kghčdžšđćš.
Comment 6 Commit Notification 2019-05-17 22:08:55 UTC
Adam Kovacs committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/+/1cbf0ee54519bf81d934609352e8a1a641d8a534%5E%21

tdf#125298 DOCX export: fix bookmark name truncation

It will be available in 6.3.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 7 Commit Notification 2019-08-21 09:51:26 UTC
Tünde Tóth committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/+/d137a6944e42f5a59d6c318999edbf97d05cb9fd%5E%21

clean up "tdf#125298 DOCX export: fix bookmark name truncation"

It will be available in 6.4.0.
