Bug 118691 - FILEOPEN DOCX Extra CR tag in table causes it to appear incorrectly
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: Inherited From OOo (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: László Németh
URL: http://officeopenxml.com/WPtextSpecia...
Whiteboard: target:6.2.0 target:6.1.2
Keywords: filter:docx
Depends on:
Blocks: DOCX-Tables
Reported: 2018-07-11 12:32 UTC by Gabor Kelemen
Modified: 2018-09-18 11:49 UTC

See Also:
Crash report or crash signature:


Attachments
Example document, reduced from a user doc (26.46 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2018-07-11 12:32 UTC, Gabor Kelemen
Screenshot of the document in Word (67.97 KB, image/png)
2018-07-11 12:33 UTC, Gabor Kelemen
The document in Writer (80.94 KB, image/png)
2018-07-11 12:34 UTC, Gabor Kelemen
Another example version of the reduced user doc (19.13 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2018-07-11 12:40 UTC, Gabor Kelemen
The other example in LO 6.2alpha and Word 2013 (34.68 KB, image/png)
2018-07-11 12:50 UTC, Gabor Kelemen

Description Gabor Kelemen 2018-07-11 12:32:51 UTC
Created attachment 143454
Example document, reduced from a user doc

The attached simplified user document contains a simple 1x1 table. The cell contains some text and a <w:cr/> tag.
When the document is opened in Writer, the content before the <w:cr/> tag appears above the table, outside the cell.

Actual results: 
The text before the <w:cr/> tag appears outside the table in the LibreOffice view.

Expected results: 
The whole text appears inside the cell, and the <w:cr/> tag is removed.

LibreOffice details: 
Version: 6.2.0.0.alpha0+
Build ID: bb1d5780226bb1b9156580972eea9aa849178742
CPU threads: 1; OS: Windows 6.1; UI render: default; 
TinderBox: Win-x86@42, Branch:master, Time: 2018-07-03_05:56:48
Locale: hu-HU (hu_HU); Calc: group threaded
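
To check whether another DOCX file is affected by the same pattern, the word/document.xml part can be scanned for <w:cr/> elements that sit inside a table cell (<w:tc>). The following is a minimal sketch only: the SAMPLE fragment is hand-written for illustration and is not extracted from the attached document, and the function name count_cr_in_table_cells is hypothetical.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical, hand-written fragment (NOT taken from the attached file):
# a 1x1 table whose only cell holds text, a <w:cr/>, and more text.
SAMPLE = f"""<w:document xmlns:w="{W_NS}">
  <w:body>
    <w:tbl>
      <w:tr>
        <w:tc>
          <w:p>
            <w:r><w:t>Text before</w:t><w:cr/><w:t>Text after</w:t></w:r>
          </w:p>
        </w:tc>
      </w:tr>
    </w:tbl>
  </w:body>
</w:document>"""


def count_cr_in_table_cells(xml_text: str) -> int:
    """Count <w:cr> elements that have a <w:tc> ancestor."""
    tc_tag = f"{{{W_NS}}}tc"
    cr_tag = f"{{{W_NS}}}cr"

    def walk(elem, in_cell):
        # Once we enter a table cell, everything below it is "in a cell".
        in_cell = in_cell or elem.tag == tc_tag
        hits = 1 if (in_cell and elem.tag == cr_tag) else 0
        return hits + sum(walk(child, in_cell) for child in elem)

    return walk(ET.fromstring(xml_text), False)


if __name__ == "__main__":
    print(count_cr_in_table_cells(SAMPLE))
```

For a real file, the XML text would first have to be read from the word/document.xml entry of the DOCX zip archive.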
Comment 1 Gabor Kelemen 2018-07-11 12:33:46 UTC
Created attachment 143455
Screenshot of the document in Word
Comment 2 Gabor Kelemen 2018-07-11 12:34:07 UTC
Created attachment 143456
The document in Writer
Comment 3 Gabor Kelemen 2018-07-11 12:40:03 UTC
Created attachment 143459
Another example version of the reduced user doc
Comment 4 Gabor Kelemen 2018-07-11 12:50:01 UTC
Created attachment 143461
The other example in LO 6.2alpha and Word 2013

In a more complicated table the entire table structure disappears, leaving only the cell contents behind.

We have no idea how the users managed to create the original document in Word - it contained change tracking entries and comments from multiple organizations as well.
Comment 5 Xisco Faulí 2018-07-11 13:27:43 UTC
Reproduced in

Version: 6.2.0.0.alpha0+
Build ID: c290f692dd28094d41dff686f3faa1c4e14b556e
CPU threads: 4; OS: Linux 4.13; UI render: default; VCL: gtk3; 
Locale: ca-ES (ca_ES.UTF-8); Calc: group threaded

Version: 5.2.0.0.alpha0+
Build ID: 3ca42d8d51174010d5e8a32b96e9b4c0b3730a53
Threads 4; Ver: 4.10; Render: default; 

Version: 4.3.0.0.alpha1+
Build ID: c15927f20d4727c3b8de68497b6949e72f9e6e9e



LibreOffice 3.3.0 
OOO330m19 (Build:6)
tag libreoffice-3.3.0.4
Comment 6 Gabor Kelemen 2018-09-11 08:42:48 UTC
@Laszlo, I think you should be interested in this one.
Comment 7 László Németh 2018-09-17 13:01:21 UTC
Proposed fix: https://gerrit.libreoffice.org/#/c/60585/

tdf#118691 DOCX import: fix table loss caused by <w:cr>

According to the OOXML standard, <w:cr> (carriage return – Unicode character 000D) is equivalent to a break with null type and clear attributes, so we now handle it as a <w:br/> instead of as endOfParagraph. This fixes the loss of table paragraphs and of tables containing <w:cr/>.

Note: It seems that MSO cannot handle carriage return characters in table cells correctly: it shows squares (unknown characters) there, without a line break. Copying this text to a non-table paragraph in MSO gives the correct layout with line breaks; copying the text with its carriage return characters back into a table cell gives squares again.

With this LO fix, bad tables edited by MS Word can be repaired with LO, because LibreOffice import/export converts all <w:cr>s to <w:br>s (as before, but now without destroying the structure of the tables).
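
The <w:cr>-to-<w:br> equivalence described above can be illustrated at the XML level. The sketch below is not the LibreOffice patch (which changes the DOCX import code itself); it is a standalone Python illustration that renames every <w:cr> element to <w:br> in a WordprocessingML fragment. The function name cr_to_br is hypothetical.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def cr_to_br(xml_text: str) -> str:
    """Return xml_text with every <w:cr> element renamed to <w:br>.

    Illustration only: it mirrors the equivalence stated in the commit
    message, not the actual import-filter change.
    """
    # Keep the conventional "w" prefix when serializing.
    ET.register_namespace("w", W_NS)
    root = ET.fromstring(xml_text)
    for elem in root.iter(f"{{{W_NS}}}cr"):
        # Renaming the tag does not change the tree structure,
        # so it is safe to do while iterating.
        elem.tag = f"{{{W_NS}}}br"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    run = (
        f'<w:r xmlns:w="{W_NS}">'
        "<w:t>first line</w:t><w:cr/><w:t>second line</w:t>"
        "</w:r>"
    )
    print(cr_to_br(run))
```

A real document.xml declares many more namespaces; ElementTree would serialize any unregistered ones with generated prefixes, so this sketch is suited to small fragments rather than to rewriting complete documents.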
Comment 8 Commit Notification 2018-09-18 06:06:26 UTC
László Németh committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=f63a60f56156e4ac17887e6c96d15fb865a2a8eb

tdf#118691 DOCX import: fix table loss caused by <w:cr>

It will be available in 6.2.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 9 Commit Notification 2018-09-18 11:49:22 UTC
László Németh committed a patch related to this issue.
It has been pushed to "libreoffice-6-1":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=8693f6fa799c43304741f465c23e827c3ceafd9d&h=libreoffice-6-1

tdf#118691 DOCX import: fix table loss caused by <w:cr>

It will be available in 6.1.2.
