Bug 137357 - FILEOPEN DOCX Extraneous tracked changes appear in specific .docx generated in SDL Trados
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 7.0.1.2 release
Hardware: All All
Importance: low normal
Assignee: Not Assigned
URL:
Whiteboard: target:7.3.0
Keywords:
Depends on:
Blocks: DOCX-Tables DOCX-Track-Changes MSO-External-Producers
Reported: 2020-10-09 04:48 UTC by Johannes Wülk
Modified: 2023-09-07 20:23 UTC (History)

See Also:
Crash report or crash signature:


Attachments
Docx file (17.16 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2020-10-09 14:36 UTC, Johannes Wülk
The document in Word and current Writer (215.68 KB, image/png)
2021-07-21 14:45 UTC, NISZ LibreOffice Team
The example file in Word 2016 and Writer (153.56 KB, image/png)
2023-03-29 09:16 UTC, Gabor Kelemen (allotropia)

Description Johannes Wülk 2020-10-09 04:48:57 UTC
Description:
When I open .docx files created by the CAT tool "SDL Trados Studio", I get the error messages quoted below. These .docx files contain only bilingual tables. The problem occurs only with .docx files, never with .doc files. Unfortunately, SDL Trados does not export .doc files. Since I would like to avoid MS Office entirely, I would be glad if these .docx files could be read correctly without first converting them to .doc or .odt.


Steps to Reproduce:
1. Export a bilingual file (.docx) from SDL Trados Studio.
2. Open the exported .docx with LibreOffice Writer.

Actual Results:
An error occurred during opening the file. This may be caused by incorrect file contents.
The error details are:
SAXException: [word/document.xml line 1]: unknown error /build/libreoffice-fresh/src/libreoffice-7.0.1.2/sax/source/fastparser/fastparser.cxx:588
Proceed with import may cause data loss or corruption, and application may become unstable or crash.

Do you want to ignore the error and attempt to continue loading the file?
If yes: The file opens with less than half of its content showing.
If no: File format error found at C++ code threw N4o3tl14divide_by_zeroE: divide by zero /build/libreoffice-fresh/src/libreoffice-7.0.1.2/bridges/source/cpp_uno/gcc3_linux_x86-64/uno2cpp.cxx:243
SAXParseException: '[word/document.xml line 1]: unknown error /build/libreoffice-fresh/src/libreoffice-7.0.1.2/sax/source/fastparser/fastparser.cxx:588', Stream 'word/document.xml', Line 1, Column 28832 /build/libreoffice-fresh/src/libreoffice-7.0.1.2/writerfilter/source/filter/WriterFilter.cxx:213(row,col).

Expected Results:
Opening the docx file and showing full content correctly.


Reproducible: Always


User Profile Reset: No



Additional Info:
Version: 7.0.1.2
Build ID: 00(Build:2)
CPU threads: 4; OS: Linux 5.8; UI render: default; VCL: gtk3
Locale: de-DE (de_DE.UTF-8); UI: en-US
=7.0.1-1
Calc: threaded
Comment 1 Timur 2020-10-09 08:49:45 UTC
We cannot follow those steps; you need to attach the .docx.

Note that there are problems with opening generated files, as you may see in other bugs, so this may be a bug, a duplicate or not a bug.
Comment 2 Henry Joshua 2020-10-09 11:41:09 UTC Comment hidden (obsolete)
Comment 3 Johannes Wülk 2020-10-09 14:36:29 UTC
Created attachment 166243 [details]
Docx file
Comment 4 Johannes Wülk 2020-10-09 14:37:52 UTC Comment hidden (obsolete)
Comment 5 Timur 2020-10-12 14:05:15 UTC
I confirm the issue with the attached DOCX in LO 7.1+.
There is no issue if the DOCX is first resaved in MSO, which suggests it is probably not a valid DOCX.
This may still be NotOurBug or WontFix.
Comment 6 Timur 2020-10-12 14:08:31 UTC
DOCX:

<?xml version="1.0" encoding="utf-8"?>
<w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 wp14">
	<w:body>

Resaved in MSO: 

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex" xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex" xmlns:cx2="http://schemas.microsoft.com/office/drawing/2015/10/21/chartex" xmlns:cx3="http://schemas.microsoft.com/office/drawing/2016/5/9/chartex" xmlns:cx4="http://schemas.microsoft.com/office/drawing/2016/5/10/chartex" xmlns:cx5="http://schemas.microsoft.com/office/drawing/2016/5/11/chartex" xmlns:cx6="http://schemas.microsoft.com/office/drawing/2016/5/12/chartex" xmlns:cx7="http://schemas.microsoft.com/office/drawing/2016/5/13/chartex" xmlns:cx8="http://schemas.microsoft.com/office/drawing/2016/5/14/chartex" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:aink="http://schemas.microsoft.com/office/drawing/2016/ink" xmlns:am3d="http://schemas.microsoft.com/office/drawing/2017/model3d" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:w16cid="http://schemas.microsoft.com/office/word/2016/wordml/cid" xmlns:w16se="http://schemas.microsoft.com/office/word/2015/wordml/symex" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" 
xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 w15 w16se w16cid wp14">
	<w:body>
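
The two root elements above differ mainly in the namespaces they declare. Below is a minimal Python sketch for listing that difference from two copies of word/document.xml. The helper names (declared_namespaces, namespace_diff, document_xml) are made up for illustration and belong to no tool mentioned in this report. A difference in declarations does not by itself show that the original file is invalid.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def document_xml(docx_path):
    """Read word/document.xml out of a .docx (which is a ZIP archive)."""
    with zipfile.ZipFile(docx_path) as zf:
        return zf.read("word/document.xml")


def declared_namespaces(xml_bytes):
    """Return {prefix: uri} for every xmlns declaration found in the XML."""
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start-ns",))
    return {prefix: uri for _, (prefix, uri) in events}


def namespace_diff(original_xml, resaved_xml):
    """Return (prefixes only in the resaved file, prefixes only in the original)."""
    original = declared_namespaces(original_xml)
    resaved = declared_namespaces(resaved_xml)
    return (sorted(set(resaved) - set(original)),
            sorted(set(original) - set(resaved)))
```

Possible usage, with placeholder file names: namespace_diff(document_xml("original.docx"), document_xml("resaved.docx")).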
Comment 7 Johannes Wülk 2020-10-20 14:08:43 UTC
I confirm that re-saving the docx in MSO fixes the issue for now.
Comment 8 NISZ LibreOffice Team 2021-07-21 14:42:34 UTC
There is no longer a SAXException since:

https://cgit.freedesktop.org/libreoffice/core/commit/?id=67d41607ad3b97abbb939a989e491af932e985a7


author	Aron Budea <aron.budea@collabora.com>	2021-02-28 22:04:24 +0100
committer	Caolán McNamara <caolanm@redhat.com>	2021-03-01 10:18:06 +0100

tdf#140137 Don't throw exception when w:gridCol is missing "w" attr

However, the import is still wrong: upon opening, every paragraph gets a tracked formatting change that is not present at all in Word.

Let's refocus this bug for that.
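
The condition named in the commit subject above (a w:gridCol element without a "w" attribute) can be checked in a given word/document.xml with a short Python sketch like the one below. The function name grid_cols_missing_width is hypothetical, and this only inspects the XML; it says nothing about how Writer's import code handles the case.

```python
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace, as declared on w:document.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def grid_cols_missing_width(xml_bytes):
    """Count w:gridCol elements that have no w:w (width) attribute."""
    root = ET.fromstring(xml_bytes)
    grid_cols = root.iter(f"{{{W_NS}}}gridCol")
    return sum(1 for col in grid_cols if col.get(f"{{{W_NS}}}w") is None)
```

The bytes can be read from a .docx with zipfile.ZipFile(path).read("word/document.xml").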
Comment 9 NISZ LibreOffice Team 2021-07-21 14:45:39 UTC
Created attachment 173743 [details]
The document in Word and current Writer

Version: 7.3.0.0.alpha0+ (x64) / LibreOffice Community
Build ID: 0cda081c9aa3b3dcb363f97bac60c845ce9a13e0
CPU threads: 4; OS: Windows 10.0 Build 18363; UI render: Skia/Raster; VCL: win
Locale: hu-HU (hu_HU); UI: en-US
Calc: CL
Comment 10 Commit Notification 2021-08-26 09:14:06 UTC
Caolán McNamara committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/2be207ed8969a96da8bdc0ffd7f2a2215233ee4a

crashtesting: crash on re-export of tdf137357-1.docx to docx

It will be available in 7.3.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 11 Gabor Kelemen (allotropia) 2023-03-29 09:16:01 UTC
Created attachment 186278 [details]
The example file in Word 2016 and Writer

How it looks in 7.5.

It turns out the original does contain change-tracking information, and these changes span multiple rows of the table.
In Writer each cell gets its own change, so the number of entries grows from 14 in Word to 86*3 in Writer, which is quite annoying to approve/reject :).
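
To see which tracked-change markup the original file actually carries, the change elements in word/document.xml can be tallied with a Python sketch like the following. The function name count_tracked_changes and the chosen set of element names are assumptions for illustration. The totals are raw element counts and need not match the number of entries that Word or Writer list.

```python
from collections import Counter
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# WordprocessingML elements that record tracked changes (not exhaustive).
TRACKED = {
    "ins", "del", "moveFrom", "moveTo",
    "rPrChange", "pPrChange", "tblPrChange",
    "trPrChange", "tcPrChange", "sectPrChange",
}


def count_tracked_changes(xml_bytes):
    """Return a Counter of tracked-change element names in the document XML."""
    prefix = f"{{{W_NS}}}"
    counts = Counter()
    for element in ET.fromstring(xml_bytes).iter():
        if element.tag.startswith(prefix):
            local_name = element.tag[len(prefix):]
            if local_name in TRACKED:
                counts[local_name] += 1
    return counts
```

Running this on the attached file and on a copy resaved from Writer would give a rough before/after comparison per element type.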