Bug 147810 - FILESAVE DOCX: File corruption in a document with hyperlink and shape
Summary: FILESAVE DOCX: File corruption in a document with hyperlink and shape
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 4.4.0.3 release
Hardware: All All
Importance: medium normal
Assignee: Tünde Tóth
URL:
Whiteboard: target:7.6.0
Keywords: bibisected, bisected, dataLoss, regression
Depends on:
Blocks: DOCX-SAXParse DOCX-Textbox DOCX-Hyperlink
Reported: 2022-03-06 17:51 UTC by Tal Tamir
Modified: 2023-04-19 06:19 UTC

Attachments
test case proving corruption (26.86 KB, application/vnd.oasis.opendocument.text)
2022-03-06 17:54 UTC, Tal Tamir
simplified document (16.46 KB, application/vnd.oasis.opendocument.text)
2022-03-24 06:23 UTC, Dieter

Description Tal Tamir 2022-03-06 17:51:15 UTC
Description:
I have attached a file, "corruption.odt", that reproducibly results in a corrupt file when saved as .docx.
I have tested it on:
7.2.5.2
7.3.0.3
7.3.1.3
Windows 10
Windows 11
2 different computers

Step 1: Open corruption.odt
Step 2: Save as .docx

The resulting .docx is corrupt. I have deleted as much as I could from the file without losing the reproducible corruption. However, if I trim more or move things around, at some point the problem goes away and the file can be saved as .docx without being corrupted.

Steps to Reproduce:
1. open corruption.odt
2. save as .docx

Actual Results:
The .docx file is corrupt. Choosing "open anyways" drops all pages after the first one, so in a 50-page document 49 pages were lost. I trimmed the file down to just 2 pages for this demonstration.

Expected Results:
a working .docx that is not corrupted


Reproducible: Always


User Profile Reset: Yes



Additional Info:
The bugged out file is here
https://mega.nz/file/3QtlQapI#3MeecIzYkgIF1wX6qCPKHbHWM86FVGnWQFOFEpQ4Xn4
Comment 1 Tal Tamir 2022-03-06 17:54:27 UTC
Created attachment 178683 [details]
test case proving corruption

Attached is the trimmed-down file that is confirmed to cause the corruption.

I should also note that the output .docx has been tested in Google Docs, Microsoft Word, and LibreOffice, and all 3 report that it is corrupted.

This incidentally shows a concerning problem: when LibreOffice performs a save operation, it does not check that the resulting file is valid. That check should happen on the spot.
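
A rough sketch of the kind of post-save check described above, for anyone who wants to test an exported file outside LibreOffice. It is not LibreOffice's code and only checks that every XML part inside the .docx package is well-formed, on the assumption (suggested by the DOCX-SAXParse meta bug under Blocks, but not confirmed here) that the corruption is an XML parse error. The function name `find_malformed_parts` is hypothetical.

```python
import io
import xml.sax
import zipfile


def find_malformed_parts(docx_bytes: bytes) -> list[tuple[str, str]]:
    """Return (part name, parser message) for each XML part that is not well-formed.

    A .docx file is a ZIP package of XML parts; this checks well-formedness
    only, not validity against the OOXML schemas.
    """
    problems = []
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        for name in package.namelist():
            if not name.endswith((".xml", ".rels")):
                continue  # skip images and other binary parts
            try:
                # The default SAX error handler raises on fatal parse errors.
                xml.sax.parseString(package.read(name), xml.sax.ContentHandler())
            except xml.sax.SAXParseException as exc:
                problems.append((name, f"line {exc.getLineNumber()}: {exc.getMessage()}"))
    return problems
```

Usage: read the exported file with `open(path, "rb").read()` and pass the bytes in; an empty list means every XML part parsed, which does not by itself guarantee that Word will accept the file.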
Comment 2 Tal Tamir 2022-03-06 17:56:44 UTC Comment hidden (obsolete)
Comment 3 Dieter 2022-03-21 07:15:20 UTC Comment hidden (obsolete)
Comment 4 Tal Tamir 2022-03-21 16:55:22 UTC
@Dieter
I tested it with the latest version before filing the bug report. As I stated in the description, I have tested it on:
7.2.5.2
7.3.0.3
7.3.1.3
windows 10
windows 11
2 different computers

I just re-downloaded the latest version, v7.3.1, from the linked location, and the file I downloaded has a hash identical to the v7.3.1 file I had previously downloaded and tested with.
Comment 5 Dieter 2022-03-24 06:23:00 UTC
Created attachment 179063 [details]
simplified document

I confirm it with

Version: 7.3.1.3 (x64) / LibreOffice Community
Build ID: a69ca51ded25f3eefd52d7bf9a5fad8c90b87951
CPU threads: 4; OS: Windows 10.0 Build 19044; UI render: Skia/Raster; VCL: win
Locale: de-DE (de_DE); UI: en-GB
Calc: CL

Steps:
1. Open attached file
2. Save as docx
3. Reload

Actual result
A warning message appears. If you continue, the document opens, but everything below the hyperlink is lost.
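
The steps above can be scripted for repeated testing. This is a minimal sketch, assuming `soffice` is on the PATH and that headless conversion goes through the same DOCX export as Save As in the UI (not confirmed in this report). The helper names `build_convert_command` and `convert_to_docx` are hypothetical.

```python
import subprocess
from pathlib import Path


def build_convert_command(source: Path, out_dir: Path, soffice: str = "soffice") -> list[str]:
    """Build the headless command line that converts `source` to DOCX in `out_dir`."""
    return [
        soffice,
        "--headless",
        "--convert-to", "docx",
        "--outdir", str(out_dir),
        str(source),
    ]


def convert_to_docx(source: Path, out_dir: Path, soffice: str = "soffice") -> Path:
    """Run the conversion and return the expected output path (requires LibreOffice)."""
    subprocess.run(build_convert_command(source, out_dir, soffice), check=True)
    return out_dir / (source.stem + ".docx")
```

The exported file can then be reopened in Writer, or inspected directly, to see whether the warning still appears.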
Comment 6 Gabor Kelemen (allotropia) 2022-03-24 09:25:32 UTC
Seems to have started back in 4.4
Export was correct in 4.3

Bibisected with windows-4.4 with the simplified document to the range: 
https://cgit.freedesktop.org/libreoffice/core/log/?qt=range&q=67be577f163831e460e19aee958bdcf7187b8a56..22ca3bf1b9dedec65a57ac70a124bf69cd242e0e

of which, this one seems to be related:

https://cgit.freedesktop.org/libreoffice/core/commit/?id=9835a5823e0f559aabbc0e15ea126c82229c4bc7

author	Miklos Vajna <vmiklos@collabora.co.uk>	2014-10-04 19:37:55 +0200
committer	Miklos Vajna <vmiklos@collabora.co.uk>	2014-10-04 20:18:01 +0200

sw textboxes: reimplement ODF import/export

Also, removing the arrow shape from the original document before saving avoids the DOCX corruption, so I am changing the meta bug.
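
For readers unfamiliar with bibisecting: it is a binary search over an ordered series of prebuilt binaries, normally driven by `git bisect` in a bibisect repository. The sketch below only illustrates the principle and is not the tooling used here; `first_bad_build` and the `is_bad` callback (for example "export the simplified document and check whether the result is corrupt") are hypothetical.

```python
from typing import Callable, Sequence


def first_bad_build(builds: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest bad build.

    Assumes `builds` is ordered oldest to newest, builds[0] is known good,
    builds[-1] is known bad, and every build after the first bad one is bad.
    """
    good, bad = 0, len(builds) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(builds[mid]):
            bad = mid   # the regression is at mid or earlier
        else:
            good = mid  # the regression is after mid
    return builds[bad]
```

Each step halves the candidate range, so the number of builds to test grows only logarithmically with the length of the range.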
Comment 7 Gabor Kelemen (allotropia) 2022-03-24 09:26:17 UTC Comment hidden (obsolete)
Comment 8 Dieter 2022-03-24 09:39:30 UTC
(In reply to Gabor Kelemen (allotropia) from comment #6)
> Also removing the arrow shape from the original document before saving
> solves the docx corruption issue: changing meta bug.

Yes, but removing the hyperlink before saving solves the corruption issue too. So it is the combination of the hyperlink and the anchor of the shape that causes the problem. If you move the anchor to the second paragraph, it works.
Comment 9 Miklos Vajna 2022-03-24 11:08:54 UTC
Given the bugdoc uses the ODF markup implemented in the above commit, if you use an older LO version, you won't be able to import the problematic content from the bugdoc. So the problem was there before the commit as well, just more hidden. I bet you could construct a DOCX input that builds the same document model, and then reproduce this bug with an older version as well.

But still, thanks for the bisect, at least we know there is no obvious info we miss from that source.
Comment 10 Gabor Kelemen (allotropia) 2023-03-10 15:21:53 UTC
Still a problem in a fresh master from today, after recent similar fixes:

Version: 7.6.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 1b8a9af14d652115e1b17ecca79b647e94a63ef5
CPU threads: 14; OS: Windows 10.0 Build 19045; UI render: Skia/Raster; VCL: win
Locale: hu-HU (hu_HU); UI: en-US
Calc: threaded

@Tünde maybe one more for you?
Comment 11 Commit Notification 2023-04-18 09:05:03 UTC
Tünde Tóth committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/23cb5a95e057060a47facad19ad150134aa0692b

tdf#147810 DOCX export: fix corrupt file with hyperlink and text box

It will be available in 7.6.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 12 NISZ LibreOffice Team 2023-04-19 06:19:06 UTC
VERIFIED IN:
Version: 7.6.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 61b41646c5a93ca24f2c9f143cdb0da2c9258989
CPU threads: 8; OS: Windows 10.0 Build 19044; UI render: Skia/Vulkan; VCL: win
Locale: hu-HU (hu_HU); UI: hu-HU
Calc: CL threaded