Bug 68326 - FILEOPEN: Bad table in DOCX generated from bank with excessive use of tables (OK if resaved in MSO)
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 3.5.0 release
Hardware: All All
Importance: low normal
Assignee: Not Assigned
URL:
Whiteboard: BSA
Keywords: filter:docx, preBibisect
Duplicates: 146353
Depends on:
Blocks: DOCX-Tables MSO-External-Producers
Reported: 2013-08-20 11:02 UTC by igorz
Modified: 2023-03-07 19:08 UTC (History)
CC List: 7 users

See Also:
Crash report or crash signature:


Attachments
Bad table formating docx (24.59 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2013-08-20 11:02 UTC, igorz
bad-doc1_docx.pdf: enabled table border lines first, then exported from Word 2003 (65.23 KB, application/pdf)
2017-03-07 07:37 UTC, Justin L
bad-doc1RT2003.docx: round-tripped by MS Word 2003 - for comparison (26.37 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2020-07-18 12:13 UTC, Justin L
hMerged1.docx: relatively complex merge. LO can in general handle hMerge properly. (9.46 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2020-07-18 14:13 UTC, Justin L
hMerged_68326.zip: two minimalistic versions of the original file, with pdfs (81.51 KB, application/zip)
2020-07-18 15:52 UTC, Justin L
Bad table formating DOCX resaved in MSO (24.92 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2022-03-10 15:09 UTC, Timur

Description igorz 2013-08-20 11:02:14 UTC
Created attachment 84324 [details]
Bad table formating docx

This document is only one page long.
Writer shows 9 pages.
Operating System: Windows 7
Version: 4.1.0.4 release
Comment 1 Thomas van der Meulen [retired] 2013-08-20 13:20:11 UTC
Thank you for your bug report. I can reproduce this bug running LibreOffice Version: 4.1.1.1
Build ID: a990db030b8125868501634ff662be1d89d0868 on Mac OS X 10.8.4.
Comment 2 igorz 2013-08-20 13:34:34 UTC
Version: 4.1.1.1
Build ID: a990db030b8125868501634ff662be1d89d0868
Windows XP
The same problem: parts of the table end up on different pages.
Comment 3 Xisco Faulí 2014-03-26 17:02:06 UTC
It's still reproducible with:
   - LibreOffice 4.1.5.3 Build ID: 1c1366bba2ba2b554cd2ca4d87c06da81c05d24
   - LibreOffice 4.2.2.1 Build ID: 3be8cda0bddd8e430d8cda1ebfd581265cca5a0f
   - LibreOffice 4.3.0.0.alpha0 Build ID: aeab0183e86fe011d32058864c02b2de4da32dc9
Comment 4 Joel Madero 2015-05-02 15:41:40 UTC Comment hidden (obsolete)
Comment 5 igorz 2015-05-13 12:32:34 UTC Comment hidden (obsolete)
Comment 6 QA Administrators 2016-09-20 09:41:56 UTC Comment hidden (obsolete)
Comment 7 igorz 2016-09-20 11:15:42 UTC
5.2.1.2
Windows 7
same problem
Comment 8 Xisco Faulí 2016-10-08 16:06:53 UTC Comment hidden (obsolete)
Comment 9 Xisco Faulí 2016-10-08 16:11:02 UTC
Can already be reproduced in LibreOffice 3.5.0
Build ID: d6cde02
Comment 10 Timur 2017-03-06 19:24:59 UTC Comment hidden (obsolete)
Comment 11 Justin L 2017-03-07 07:37:45 UTC
Created attachment 131688 [details]
bad-doc1_docx.pdf: enabled table border lines first, then exported from Word 2003

This document is an extremely complex mess of tiny rows, columns, and tables within tables.  I think every block of text is in a separate table.  Some lines are extra heavy, because there are minimal-width columns touching each other.

The problem is related to style "EmptyCellLayoutStyle" which doesn't seem to be created. So the outermost table is failing to be created because of exceptions.
Comment 12 Justin L 2017-03-07 09:20:25 UTC
The main table is created approximately correctly with this change:

-if( pEntry->nStyleTypeCode == STYLE_TYPE_CHAR || pEntry->nStyleTypeCode == STYLE_TYPE_PARA || pEntry->nStyleTypeCode == STYLE_TYPE_LIST )
+if( pEntry->nStyleTypeCode == STYLE_TYPE_CHAR || pEntry->nStyleTypeCode == STYLE_TYPE_PARA || pEntry->nStyleTypeCode == STYLE_TYPE_LIST || pEntry->nStyleTypeCode == STYLE_TYPE_UNKNOWN )
 {
-    bool bParaStyle = pEntry->nStyleTypeCode == STYLE_TYPE_PARA;
+    bool bParaStyle = pEntry->nStyleTypeCode == STYLE_TYPE_PARA || pEntry->nStyleTypeCode == STYLE_TYPE_UNKNOWN;

However, the tables-in-tables are mostly hidden behind the other cells, instead of being laid out on top. Dealing with tables is a bit too complicated for me...
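
The idea in the diff above can be sketched in isolation: a style entry whose type is unknown is no longer skipped, and is handled as if it were a paragraph style. The following is only a standalone illustration, not the writerfilter code; `StyleType`, `StyleEntry`, `isImportable`, and `isParagraphLike` are hypothetical names, and it assumes that a style such as "EmptyCellLayoutStyle" arrives without a recognized type.

```cpp
#include <string>

// Hypothetical stand-ins for the style type codes used in the diff above.
enum class StyleType { Unknown, Paragraph, Character, Table, List };

// Hypothetical, simplified style entry: just a name and a type code.
struct StyleEntry {
    std::string name;
    StyleType type;
};

// Mirrors the widened outer condition: character, paragraph, and list styles
// were already handled; an unknown type is now accepted as well.
bool isImportable(const StyleEntry& entry) {
    return entry.type == StyleType::Character
        || entry.type == StyleType::Paragraph
        || entry.type == StyleType::List
        || entry.type == StyleType::Unknown;
}

// Mirrors the widened bParaStyle flag: an unknown type counts as paragraph.
bool isParagraphLike(const StyleEntry& entry) {
    return entry.type == StyleType::Paragraph
        || entry.type == StyleType::Unknown;
}
```

With this fallback, an untyped entry passes both checks, while a table style is still left to whatever separate handling it already had.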
Comment 13 QA Administrators 2018-04-13 02:32:24 UTC Comment hidden (obsolete)
Comment 14 Bhen Chod 2019-05-23 11:41:46 UTC Comment hidden (obsolete, spam)
Comment 15 Commit Notification 2020-07-18 05:54:04 UTC
Justin Luth committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/a59ecc3137cc59438cc2cf946223148b6d1a5600

tdf#68326 writerfilter: default style type is paragraph

It will be available in 7.1.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 16 Justin L 2020-07-18 05:56:40 UTC
Bug not fixed, but one step forward by implementing the idea in comment 12.
Comment 17 Justin L 2020-07-18 06:04:12 UTC Comment hidden (obsolete)
Comment 18 Justin L 2020-07-18 12:13:10 UTC
Next up is to handle w:hMerge properly. Instead of using w:gridSpan, this generator uses w:hMerge. There are lots of loading errors suggesting that the conversion is off by one or more cells. Likely there is confusion about what cell A1 etc. means in this elaborate mess.

Merge support was added via commit 97dcf77841d19d344d58d5bdacdab141cdea4817
Author: Miklos Vajna on Fri Dec 27 21:07:43 2013 +0100
    Related: fdo#65090 DOCX import: handle w:hMerge cell property
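
To make the difference between the two encodings concrete: with w:gridSpan a single cell states how many grid columns it covers, while with w:hMerge every covered cell is still written out, the first marked "restart" and the following ones "continue". A rough sketch of collapsing one row of hMerge markers into spans is shown below. It is not the importer's code; `HMerge`, `Span`, and `hMergeToSpans` are hypothetical names, and treating a stray "continue" as a standalone cell is an assumption made only for this example.

```cpp
#include <cstddef>
#include <vector>

// Per-cell horizontal merge marker, as read from w:hMerge (hypothetical enum).
enum class HMerge { None, Restart, Continue };

// One resulting cell: index of its first source cell and how many it covers.
struct Span {
    std::size_t firstCell;
    std::size_t cellCount;
};

// Collapse a row of hMerge markers into gridSpan-like spans.
std::vector<Span> hMergeToSpans(const std::vector<HMerge>& row) {
    std::vector<Span> spans;
    bool mergeOpen = false; // true while the last span was started by Restart
    for (std::size_t i = 0; i < row.size(); ++i) {
        if (row[i] == HMerge::Continue && mergeOpen) {
            // Extends the merge started by the preceding Restart.
            ++spans.back().cellCount;
        } else {
            // None, Restart, or a stray Continue with nothing to extend
            // (assumption: the stray cell is kept as a standalone cell).
            spans.push_back(Span{i, 1});
            mergeOpen = (row[i] == HMerge::Restart);
        }
    }
    return spans;
}
```

An off-by-one in `firstCell` at this stage would shift every later cell name in the row, which is the kind of error the loading messages hint at.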
Comment 19 Justin L 2020-07-18 12:13:55 UTC
Created attachment 163233 [details]
bad-doc1RT2003.docx: round-tripped by MS Word 2003 - for comparison
Comment 20 Justin L 2020-07-18 14:13:10 UTC
Created attachment 163238 [details]
hMerged1.docx: relatively complex merge. LO can in general handle hMerge properly.
Comment 21 Justin L 2020-07-18 15:52:58 UTC
Created attachment 163244 [details]
hMerged_68326.zip: two minimalistic versions of the original file, with pdfs
Comment 22 Justin L 2020-07-20 15:52:44 UTC
It seems that UNO sometimes fails to select the requested table cells, so the merge fails and the default grid spacing remains. Things stay pretty messy since most of the grid widths are "1" rather than a realistic number, or don't add up to the actual table width. The extra width is usually given to the last column, squishing everything before it.

uno::Reference<text::XTextTableCursor> xCursor = xTable->createCursorByCellName(aFirst);
xCursor->gotoCellByName(aLast, true);  // returns false when the selection fails
xCursor->mergeRange();                 // returns false when it fails, i.e. when there is no selection

When it gets into this low level stuff, it is too hard and dangerous for me. Bye.
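
The squishing effect described in this comment can be illustrated with a little arithmetic. The sketch below only models the behaviour as described (grid widths that do not add up to the table width, with the remainder landing in the last column); it is not LibreOffice layout code, and `absorbRemainderInLastColumn` is a hypothetical name.

```cpp
#include <numeric>
#include <vector>

// Model of the described effect: if the grid widths sum to less than the
// table width, the whole remainder is added to the last column.
std::vector<long> absorbRemainderInLastColumn(std::vector<long> widths,
                                              long tableWidth) {
    if (widths.empty())
        return widths;
    const long sum = std::accumulate(widths.begin(), widths.end(), 0L);
    if (sum < tableWidth)
        widths.back() += tableWidth - sum; // earlier columns keep their tiny widths
    return widths;
}
```

For three columns of width "1" in a table of width 9000, the first two columns stay at 1 and the last one becomes 8998, so everything before the last column is squeezed into almost no space.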
Comment 23 Timur 2022-03-10 14:56:26 UTC
*** Bug 146353 has been marked as a duplicate of this bug. ***
Comment 24 Timur 2022-03-10 14:59:55 UTC
(In reply to Justin L from comment #17)
> (In reply to Timur from comment #10)
> > Interesting. I'd guess this is docx generated by some software.
> 
> Yes, this is a generated document. This point is worth clearly highlighting
> (and it is mentioned in the bug summary). Thus I am lowering the importance
> to minor, since the onus should be on them to generate a document that their
> intended audience can use.

It's Low priority for a Normal Importance that we give to OOXML docs.
Comment 25 Timur 2022-03-10 15:09:26 UTC
Created attachment 178780 [details]
Bad table formating DOCX resaved in MSO

Opens OK in LO if resaved in MSO.
Comment 26 Roman Kuznetsov 2023-03-07 19:08:03 UTC
23 ugly pages instead of only 1 in

Version: 7.6.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: d7c609dbb1bd08865b43719d2fb7c316d30bcde5
CPU threads: 16; OS: Windows 10.0 Build 19045; UI render: Skia/Raster; VCL: win
Locale: ru-RU (ru_RU); UI: ru-RU
Calc: CL threaded