Created attachment 178770 [details] diff -u output on document.xml from correct to corrupt DOCX Two days ago I saved a document containing many tracked changes, including a new table. When I tried to reopen it, LibreOffice complained that it was corrupt, and was unable to fix it: > An error occurred during opening the file. This may be caused by incorrect file contents. The error details are: > SAXException: [word/document.xml line 2]: Opening and ending tag mismatch: rPr line 2 and ins > /buildir/build/BUILD/libreoffice-7.2.5.2/sax/source/fastparser/fastparser.cxx:619 > Proceeding with import may cause data loss or corruption, and application may become unstable or crash. > Do you want to ignore the error and attempt to continue loading the file? If I choose "Yes" then the file opens, but contents are truncated in the middle the table I added (with tracked changes) just before corruption happened. Of course I don't know how to reproduce the bug, but I had a backup of the previous version of the file, and the diff between the XML code of that document (which is not corrupt) and the new corrupt one is very limited (apart from trivial one-line changes), so maybe it can be useful? I'm attaching it. I can send you both the correct and the corrupt document privately if you're interested. Version: 7.2.5.2.0+ Build ID: 20(Build:2) CPU threads: 8; OS: Linux 5.16; UI render: default; VCL: gtk3 Locale: fr-FR (fr_FR.UTF-8); UI: fr-FR Calc: threaded
Hi Milan I'm interested in this one. Please send me the two files in private, I'll figure out what operation went wrong.
In addition to sending to a volunteer, you can also try to replace all chars with x in the backup file and then try to reproduce a problem with changing it.
Thanks Gabor, I've just sent you the files. I'll also try reproducing the bug, but since that involved creating a table and making various modifications to it in a relatively irregular order, I don't have high hopes to suceed.
[Automated Action] NeedInfo-To-Unconfirmed
This happened to me again. I can send you the new corrupt file if you're interested. Fixing it manually turned out to be relatively easy: a </w:ins> markup was placed before rather than after the <w:ins ...> block it was supposed to close.
I wonder if Gabor managed to find any specific steps after the previous sample, ultimately being able to reproduce the bug would be the most helpful. You could send me the new file, and any recollection of what you did, in case Gabor is busy with other stuff.
I can send you the file, but there's no chance for me to remember what I did. The bug only appears when doing many modifications with lots of tracked changes, and I only realize it's there after closing and reloading the file -- i.e. after hours of work on the document.
Ah, now I have an ODT document which is reliably corrupt simply by saving it to DOCX. I'm sending it to you privately.
I was able to reproduce _a_ bug using the privately sent sample. It's a regression that could be bibisected to the 7.3 backport of the following commit using repo bibisect-linux-64-7.3. Adding CC: to Miklos Vajna. It may not be the originally reported problem, as that was reported for 7.2.5.2.0+, and this change didn't get into 7.2. The description also mentions a table, and the sample doesn't have one (though the table might've been unrelated to the corruption). But it's definitely a bug. https://cgit.freedesktop.org/libreoffice/core/commit/?id=71019ec15bd3fe15385443b68614fd2402e0040f author Miklos Vajna <vmiklos@collabora.com> 2022-05-06 15:21:45 +0200 committer Miklos Vajna <vmiklos@collabora.com> 2022-05-06 16:38:53 +0200 sw: don't copy useless char escapement to next node on split
Some ideas: - The output in document.xml looks not well-formed. This must be because we screw up the start/end calls in sw/source/filter/ww8/docxattributeoutput.cxx, but hard to guess, given that also marking / merging is involved as a fast serializer level. It feels like this is the root cause. - You can hide the previously hidden problem again but limiting the effect of the above commit. It was meant to modify how editing works, so perhaps adding a condition to not do anything in the pDoc->IsInReading() case (where pDoc is an SwDoc*) would work around the immediate problem.
FWIW the bug happens really often now. I had to fix three documents manually in the last month. I'm surprised it has not been reported by others. I'm going to go back to LO 7.2 for now.
Created attachment 184934 [details] Minimal reproducer Saving this document as DOCX produces not-well-formed XML. But if I revert the above commit, the problem doesn't go away for me.
I've bisected the problem that is reproducible with the attachment, the first bad commit is: eeee19b3fcf8b0374c361c7f6c285fd5c89b5a2f (tdf#142387 DOCX track changes: export w:del in w:ins, 2021-06-11) Adding Cc: to László Németh László: please take a look. Thanks!
I now realize that previously these nested changes were simply lost, adjusting keywords accordingly.
Description of the proposed fix in https://gerrit.libreoffice.org/c/core/+/146782: tdf#147892 DOCX: fix broken export at para marker change tracking history OOXML export of tracked deletion of tracked paragraph insertion resulted invalid document.xml, because change tracking history of paragraph markers isn't supported by OOXML. Export the tracked deletion of the last run of paragraphs without history (tracked insertion). This way w:p/w:pPr/w:rPr contains only w:del or w:ins, not both of them (with broken tag order). Note: it's possible to optimize the fix to keep the change tracking history of the characters of the last run of the paragraph, except the paragraph marker. Regression from commit eeee19b3fcf8b0374c361c7f6c285fd5c89b5a2f "tdf#142387 DOCX track changes: export w:del in w:ins". Minimal unit test document was created by Miklós Vajna.
László Németh committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/commit/382892341a63e400f221e1a533fd5a5d6b4d9d70 tdf#147892 DOCX: fix corrupt export at para marker revision history It will be available in 7.6.0. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Fixed in master, backport is in progress. @Milan: thanks for the bug report! @Miklós: thanks for bibisecting & analysis! @Gábor, Áron, Timur: thanks for handling the issue!
Thanks!
László Németh committed a patch related to this issue. It has been pushed to "libreoffice-7-5": https://git.libreoffice.org/core/commit/4eb34d281260d7c9b6f2268ca1fa439655238a55 tdf#147892 DOCX: fix corrupt export at para marker revision history It will be available in 7.5.2. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
László, could the fix be backported to 7.4 as well?
*** Bug 154070 has been marked as a duplicate of this bug. ***
Следующие пакеты имеют неудовлетворенные зависимости: libobasis7.5-core: Требует: libreoffice7.5-ure (<= 7.5.1.2-2) Требует: libobasis7.5-ooofonts (<= 7.5.1.2-2) libreoffice7.5: Требует: libreoffice7.5-ure (<= 7.5.1.2-2) libreoffice7.5-dict-en: Требует: libreoffice7.5-ure (<= 7.5.1.2-2) libreoffice7.5-dict-es: Требует: libreoffice7.5-ure (<= 7.5.1.2-2) libreoffice7.5-dict-fr: Требует: libreoffice7.5-ure (<= 7.5.1.2-2) libreoffice7.5-dict-ru: Требует: libreoffice7.5-ure (<= 7.5.1.2-2) I wanted to test 7.5.1.2 on Linux, but he does not want to. Хотел протестить 7.5.1.2 на линукс, но он не хочет.
Version: 7.5.2.1 (X86_64) / LibreOffice Community Build ID: e8bf3b441b8370f8440b0339fd9490765a8d57ca CPU threads: 4; OS: Linux 5.15; UI render: default; VCL: x11 Locale: ru-RU (ru_RU.UTF-8); UI: ru-RU Calc: threaded 7.5.2.1 No problem. But did you fix it or did you disable the check? Design from 7.4 is not complete 7.5 7.5.2.1 Проблемы нет. Но вы починили или проверку отключили? Дизайн от 7.4 не полный 7.5
László Németh committed a patch related to this issue. It has been pushed to "libreoffice-7-4": https://git.libreoffice.org/core/commit/4c05f5b45ce8449adee81aeac355fae652b97786 tdf#147892 DOCX: fix corrupt export at para marker revision history It will be available in 7.4.7. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
7.5.2.2 на Linux x86_64 did not deliver the fix, did not catch it on Windows 7.5.2.2 на Linux x86_64 не довезли исправление, на виндовс не поймал
I beg your pardon, they brought it in, confusion with 2 versions came out. прошу прощения завезли, путаница с 2 версиями вышла.
the gnome does kde not apply to ours. гном к нашей kde не относится.
since 7.6. the breaking tags have been delivered again. This time there are more of them с 7.6. снова завезли ломающиеся теги. На сей раз стало их больше
stdContent line 2 and ins(del) stdContent line 2 and p hyperlink