Bug 125279 - Custom properties with "_x005F_" keep growing escapements when roundtripping as OOXML formats
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage
Version: 5.4.0.3 release (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Eike Rathke
URL:
Whiteboard: target:6.3.0 target:6.2.5
Keywords: bibisected, bisected, regression
Depends on:
Blocks:
 
Reported: 2019-05-14 11:04 UTC by Aron Budea
Modified: 2019-06-07 09:49 UTC
CC List: 2 users

See Also:
Crash report or crash signature:


Attachments
Sample ODT (7.71 KB, application/vnd.oasis.opendocument.text)
2019-05-14 11:04 UTC, Aron Budea
Sample ODT (8.57 KB, application/vnd.oasis.opendocument.text)
2019-05-14 13:25 UTC, Aron Budea
How it looks in LibreOffice 6.3 master (28.05 KB, image/png)
2019-06-05 15:18 UTC, Xisco Faulí

Description Aron Budea 2019-05-14 11:04:49 UTC
Created attachment 151398
Sample ODT

The attached ODT has a custom property set: "Company" is "Test_x005f_" (without quotes).

- Save ODT as DOCX.
- Reload DOCX.
- Check the property.

=> It becomes "Test_x005F_x005f_".
Upon further reload-save cycles, it keeps growing (Test_x005F_x005F_x005f_, Test_x005F_x005F_x005F_x005F_x005f_, Test_x005F_x005F_x005F_x005F_x005F_x005F_x005F_x005f_ etc.)
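
The growth pattern above can be reproduced with a small model. This is a minimal sketch and not LibreOffice's exporter code: naive_escape and roundtrip_without_unescape are hypothetical helpers. The sketch assumes that each save prefixes every non-overlapping "_xHHHH_" sequence with "_x005F" and that loading returns the stored text unchanged.

```python
import re

# An OOXML-style escape sequence such as "_x005f_" (assumed pattern).
ESCAPE_SEQ = re.compile(r"_x[0-9A-Fa-f]{4}_")


def naive_escape(value):
    """Hypothetical save step: guard each "_xHHHH_" sequence with "_x005F"."""
    return ESCAPE_SEQ.sub(lambda m: "_x005F" + m.group(0), value)


def roundtrip_without_unescape(value, cycles):
    """Simulate save/reload cycles in which loading never unescapes."""
    history = []
    for _ in range(cycles):
        value = naive_escape(value)  # save; reload hands back the same text
        history.append(value)
    return history


for step in roundtrip_without_unescape("Test_x005f_", 3):
    print(step)
# Test_x005F_x005f_
# Test_x005F_x005F_x005f_
# Test_x005F_x005F_x005F_x005F_x005f_
```

Under these assumptions the model yields the same strings as listed in the description, which suggests the value is escaped again on every save without ever being unescaped.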

Observed using LO 6.3.0.0.alpha0+ (0c0e73584f125025fb17d6be8f8050f3b7649c7d), 5.4.0.3 / Ubuntu 18.04.
No issue with LO 5.3.0.3.
=> regression

Bibisected to the following commit using repo bibisect-linux-64-5.4. Adding Cc: to Eike Rathke, please take a look.
https://cgit.freedesktop.org/libreoffice/core/commit/?id=8b25b67d5268abbb260da968cc23b6f6c8dd31af
author		Eike Rathke <erack@redhat.com>	2017-03-02 17:06:54 +0100
committer	Eike Rathke <erack@redhat.com>	2017-03-03 16:27:21 +0000
Comment 1 Eike Rathke 2019-05-14 12:44:30 UTC
It seems we fail to unescape the content when loading.
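
To illustrate what unescaping on load would change, here is a minimal sketch. The helpers escape_on_save and unescape_on_load are hypothetical and not the LibreOffice filter code. It assumes that saving guards each "_xHHHH_" sequence with "_x005F" and that loading strips exactly that guard.

```python
import re

# Any "_xHHHH_" sequence, and the same sequence preceded by a "_x005F" guard
# (both patterns are assumptions about the escape format).
PLAIN = re.compile(r"_x[0-9A-Fa-f]{4}_")
GUARDED = re.compile(r"_x005F(_x[0-9A-Fa-f]{4}_)")


def escape_on_save(value):
    """Hypothetical save step: put a "_x005F" guard before each sequence."""
    return PLAIN.sub(lambda m: "_x005F" + m.group(0), value)


def unescape_on_load(value):
    """Hypothetical load step: drop the guard, keep the guarded sequence."""
    return GUARDED.sub(r"\1", value)


value = "Test_x005f_"
for _ in range(3):
    stored = escape_on_save(value)    # what ends up in the .docx
    value = unescape_on_load(stored)  # what the user sees after reload
    print(stored, "->", value)
# Test_x005F_x005f_ -> Test_x005f_   (printed three times)
```

With a matching unescape step the stored form stays the same across cycles instead of growing.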

With MS-Word:

1. loading the .docx, what is the custom property then?
2. saving again, what is its content in the .docx file?

3. in a new document creating such property and saving to .docx, what is its content in file?
Comment 2 Aron Budea 2019-05-14 13:25:37 UTC
Created attachment 151405
Sample ODT

Apparently I uploaded an earlier iteration of the test file with an extra "x" at the end of the property value ("Test_x005f_x"). It makes no difference to the behavior, but I'm updating the sample to avoid confusion.

> With MS-Word:
> 
> 1. loading the .docx, what is the custom property then?
> 2. saving again, what is its content in the .docx file?
> 
> 3. in a new document creating such property and saving to .docx, what is its
> content in file?
In Word 2013 the answer to all of these is:
Test_x005f_x005f_

I.e., when the ODT is saved to DOCX, the property is saved as "Test_x005F_x005f_". The only thing Word does is make the "F" lowercase. Btw, I used this already longer string for the 3rd test as well.

But... if I use the original string for 3. ("Test_x005f_"), it becomes "Test_" during save. And if I manually change the XML to have that starting string, it also becomes "Test_" when opened.
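
The "Test_" result is consistent with "_x005f_" being read as an escape for the character U+005F, which is the underscore itself. A minimal sketch of such decoding follows. The helper decode_xstring is hypothetical, it assumes the "_xHHHH_" convention with four hex digits, and it is not Word's actual implementation. It only accounts for the "Test_x005f_" case, not for the other observations in this report.

```python
import re

# "_xHHHH_" with four hex digits, capturing the code point (assumed format).
HEX_ESCAPE = re.compile(r"_x([0-9A-Fa-f]{4})_")


def decode_xstring(value):
    """Hypothetical decoder: replace each "_xHHHH_" with the character U+HHHH."""
    return HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), value)


print(decode_xstring("Test_x005f_"))  # Test_
```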
Comment 3 Eike Rathke 2019-05-14 14:43:00 UTC
So after saving "Test_x005f_" to .docx in LibO it is "Test_x005F_x005f_" in file, and after
> 1. loading the .docx, what is the custom property then?
viewing the property in the MS-Word UI it is still "Test_x005F_x005f_"?
And then after
> 2. saving again, what is its content in the .docx file?
it is still "Test_x005F_x005f_"?

For 3., you entered "Test_x005f_" in the MS-Word UI and, once saved to file, it becomes "Test_"?

That makes no sense..
Comment 4 Aron Budea 2019-05-14 15:38:58 UTC
(In reply to Eike Rathke from comment #3)
> it is still "Test_x005F_x005f_"?
Technically it's "Test_x005f_x005f_" (the first F isn't capital).

Similarly, if I enter each property in Word, then reload the file, the following happens:
- "Test_x005f_" becomes "Test_"
- "Test_x005F_x005f_" remains "Test_x005f_x005f_"

> That makes no sense..
Are we making assumptions about consistency in another software? :)
Comment 5 Aron Budea 2019-05-14 15:55:13 UTC
Disregarding the "nuances" in what Word does, I think discarding the repeating "x005f_" sequence, and keeping a single "_" at the end (which doesn't even need encoding) would be reasonable.
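
One possible reading of this suggestion, as a minimal sketch: collapse_underscore_escapes is a hypothetical helper that replaces a run of "_x005f" / "_x005F" groups plus the closing underscore with a single "_". It is only an illustration of the idea, not a proposed patch.

```python
import re

# One or more "_x005f"/"_x005F" groups followed by the closing underscore.
UNDERSCORE_RUN = re.compile(r"(?:_x005[Ff])+_")


def collapse_underscore_escapes(value):
    """Hypothetical cleanup: collapse repeated underscore escapes into "_"."""
    return UNDERSCORE_RUN.sub("_", value)


print(collapse_underscore_escapes("Test_x005F_x005F_x005f_"))  # Test_
```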
Comment 6 Commit Notification 2019-05-30 17:23:06 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/+/f677885fec59f252f36673ee4d8c0b4863625a4d%5E%21

Resolves: tdf#125279 do not double _x005F_ escapement

It will be available in 6.3.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 7 Eike Rathke 2019-05-30 17:24:01 UTC
Pending review https://gerrit.libreoffice.org/73219 for 6-2
Comment 8 Xisco Faulí 2019-06-05 15:18:50 UTC
Created attachment 151943
How it looks in LibreOffice 6.3 master

Maybe I'm missing something but I can still reproduce the problem in

Version: 6.4.0.0.alpha0+
Build ID: 0d6ec494f83fb26524bf3a5fc7af27c225293e87
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3; 
Locale: ca-ES (ca_ES.UTF-8); UI-Language: en-US
Calc: threaded

after saving the ODT to DOCX...
Comment 9 Eike Rathke 2019-06-05 15:36:38 UTC
@Xisco:
One "Test_x005f_" has to become "Test_x005F_x005f_"; the fix is that upon the next save it does not become "Test_x005F_x005F_x005f_".
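
The behavior described here can be modeled as an escape step that is stable when applied a second time. A minimal sketch follows, with a hypothetical helper escape_once that skips sequences already guarded by "_x005F". It is an illustration only and not the code from the committed patch.

```python
import re

# An optional existing "_x005F" guard, then an "_xHHHH_" sequence
# (assumed escape format).
SEQ = re.compile(r"(_x005F)?(_x[0-9A-Fa-f]{4}_)")


def escape_once(value):
    """Hypothetical save step: guard "_xHHHH_" unless it is already guarded."""
    def guard(match):
        if match.group(1):  # guard already present: leave the text untouched
            return match.group(0)
        return "_x005F" + match.group(2)
    return SEQ.sub(guard, value)


first = escape_once("Test_x005f_")
second = escape_once(first)
print(first)   # Test_x005F_x005f_
print(second)  # Test_x005F_x005f_
```

The trade-off in this sketch is that a value which already looks guarded is written unchanged, so it cannot be told apart from a value that was escaped on an earlier save.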
Comment 10 Xisco Faulí 2019-06-05 15:41:59 UTC
(In reply to Eike Rathke from comment #9)
> @Xisco:
> One "Test_x005f_" has to become "Test_x005F_x005f_"; the fix is that upon
> the next save it does not become "Test_x005F_x005F_x005f_".

Oh I see, then I misunderstood the problem, my bad. Setting to VERIFIED.
@Eike, thanks for fixing the issue ;-)
Comment 11 Commit Notification 2019-06-07 09:49:08 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "libreoffice-6-2":

https://git.libreoffice.org/core/+/d93ebeb0f785489050dcbe55c5111d639b4b4c1e%5E%21

Resolves: tdf#125279 do not double _x005F_ escapement

It will be available in 6.2.5.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.