Bug 113608 - FILEOPEN DOCX Table cell indentation grows in particular document
Summary: FILEOPEN DOCX Table cell indentation grows in particular document
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer (show other bugs)
Version:
(earliest affected)
5.4.2.2 release
Hardware: All All
: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:6.2.0
Keywords: bibisected, bisected, filter:docx, regression
Depends on:
Blocks:
 
Reported: 2017-11-02 16:23 UTC by Gabor Kelemen (allotropia)
Modified: 2018-08-23 06:23 UTC (History)
6 users (show)

See Also:
Crash report or crash signature:


Attachments
The reduced example document (80.38 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2017-11-02 16:23 UTC, Gabor Kelemen (allotropia)
Details
The table part in Word 2013 (119.98 KB, image/png)
2017-11-02 16:24 UTC, Gabor Kelemen (allotropia)
Details
The table in LO 5.4 (131.26 KB, image/png)
2017-11-02 16:24 UTC, Gabor Kelemen (allotropia)
Details
The text part in Word 2013 (108.13 KB, image/png)
2017-11-02 16:24 UTC, Gabor Kelemen (allotropia)
Details
The text part in LO 5.4 (116.14 KB, image/png)
2017-11-02 16:25 UTC, Gabor Kelemen (allotropia)
Details
The same file without the page break on the first page (79.53 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2017-11-02 16:26 UTC, Gabor Kelemen (allotropia)
Details
The file without the page break on the first page in LO 5.4 (134.87 KB, image/png)
2017-11-02 16:27 UTC, Gabor Kelemen (allotropia)
Details
numbering.docx: replaced Cmsor with Heading for 2, 3, and 4 (75.97 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2018-08-15 07:12 UTC, Justin L
Details
tdf113608_runAwayNumbering.docx: minimized unit test (16.89 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2018-08-15 14:57 UTC, Justin L
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Gabor Kelemen (allotropia) 2017-11-02 16:23:37 UTC
Created attachment 137470 [details]
The reduced example document

I'm not sure how to reproduce the behavior in the attached document (it's a minimized version of a 250 pages company doc made by a far away department), but it opens in a terribly broken way in LO 5.4.2.

- Table cells have an overly huge indentation, causing the originally one page table overflow to 9 pages instead of 1.

- The paragraphs after the table get a lettered numbering style, with a lot of letters appearing before them. They did not have a numbering style in Word.

Also if I remove the page break from the first page, none of these happen.

Version: 5.4.2.2
Build ID: 22b09f6418e8c2d508a9eaf86b2399209b0990f4
CPU threads: 4; OS: Windows 6.2; UI render: default; 
Locale: hu-HU (hu_HU); Calc: group
Comment 1 Gabor Kelemen (allotropia) 2017-11-02 16:24:03 UTC
Created attachment 137471 [details]
The table part in Word 2013
Comment 2 Gabor Kelemen (allotropia) 2017-11-02 16:24:21 UTC
Created attachment 137472 [details]
The table in LO 5.4
Comment 3 Gabor Kelemen (allotropia) 2017-11-02 16:24:53 UTC
Created attachment 137473 [details]
The text part in Word 2013
Comment 4 Gabor Kelemen (allotropia) 2017-11-02 16:25:30 UTC
Created attachment 137474 [details]
The text part in LO 5.4
Comment 5 Gabor Kelemen (allotropia) 2017-11-02 16:26:05 UTC
Created attachment 137475 [details]
The same file without the page break on the first page
Comment 6 Gabor Kelemen (allotropia) 2017-11-02 16:27:44 UTC
Created attachment 137476 [details]
The file without the page break on the first page in LO 5.4
Comment 7 Dieter 2017-11-03 04:40:40 UTC
Lokking at your image from comment 2 it looks like, that there is also a numbering in the table itself (I can't see it in image from comment 1). Could this cause the indent?
Comment 8 Kevin Suo 2017-11-03 09:39:24 UTC
Reproducible in the most recent daily dbgutil build as today.
Version: 6.0.0.0.alpha1+
Build ID: 2416d69ad4fa26b65d5b05a8575ac96af6b2c9a9
CPU threads: 4; OS: Linux 4.13; UI render: default; VCL: gtk3; 
Locale: zh-CN (zh_CN.UTF-8); Calc: group

Set to NEW.
Comment 9 Justin L 2018-08-14 15:53:42 UTC
Bisected to LO 4.4 commit 29a91e4739496e1db5995fdece6a3ffa89e42774 Author:     Mark Hung. Commit Date: Tue Nov 18 11:24:19 2014 +0100
    Fix outline numbering for ooxml import filter.
https://cgit.freedesktop.org/libreoffice/core/commit/?id=29a91e4739496e1db5995fdece6a3ffa89e42774

but even though the problem disappears when this is reverted (the WW8 style name is Cmsor4 which didn't match Heading4, but the converted name matches "Heading 4"), it has just uncovered an existing problem. The table problem can also be resolved by setting the infamous bRemove = false. Somehow, the Heading 4 paragraph numbering from the sectPr of the first section's deleted paragraph is being applied to the rest of the document.
Comment 10 Gabor Kelemen (allotropia) 2018-08-14 18:50:01 UTC
(In reply to Justin L from comment #9)
> but even though the problem disappears when this is reverted (the WW8 style
> name is Cmsor4 which didn't match Heading4, 

"Heading 4" is "Címsor 4" in Hungarian, which is saved without the accented í character and the space.
Comment 11 Justin L 2018-08-15 07:12:09 UTC
Created attachment 144180 [details]
numbering.docx: replaced Cmsor with Heading for 2, 3, and 4

(In reply to Gabor Kelemen from comment #10)
> "Heading 4" is "Címsor 4" in Hungarian, which is saved without the accented
> í character and the space.
And this is good proof that Mark's commit is valid because the code logic is dependant on the use of English.

Doing a bibisect based on this modified-to-use-english-style-names document, I run into a one-day range of broken builds in LO 4.3. Combining the results of 43all and 43max, I get the 20 commit range of https://cgit.freedesktop.org/libreoffice/core/log/?qt=range&q=76fe205d7e0fe0a73616453209d8094cab9ce79f..a6e19ea7a8885dcbdb1d8b2f56586373595ba8aa

of which this one stands out (although reverting it does not resolve the problem, so likely this one also just exposes an existing problem...)
>author	Michael Stahl 2014-03-06 00:20:00 +0100
>commit 2b78f2cd7b9e4bab0f3b3b9119238f36a1bbc7b2
>rhbz#988516: DOCX import: fix context stack when importing header/footer
>
>When a header/footer substream is parsed, a ParagraphGroup is started,
>but not ended; so the properties of the last paragraph in the
>header/footer are applied to a paragraph in the body.
>
>The obvious fix to add a call to endParagraphGroup() at the end of w:p
>element breaks table cells.  So add a call to endParagraphGroup() at
>the end of the "hdr"/"ftr" element.>

None of this points in a clear direction yet. I had to change Cmsor -> Heading for all three - just doing one at a time didn't work. However, the bug is less dependent on Cmsor2 (since I didn't change that in document.xml).

Reference: (note these are bibisect commits, not code commits)
first bad 43all bibisect commit: 56a3b3c781fc2eb55f46641d89a866a91119a8a3
first bad 43max bibisect commit: 1dad09c0413728618600e94167f71b6fdda422e3
last good 43max bibisect commit: bd286ce1d9d478895e4a9aacf3469b659917f5ff
Comment 12 Justin L 2018-08-15 14:54:41 UTC
(In reply to Justin L from comment #11)
>rhbz#988516: DOCX import: fix context stack when importing header/footer
This is the same commit that caused the similar problem fixed in bug 97417

Here is what is happening.
1.) Heading 4 is assigned as the paragraph style for the sectPr empty paragraph, which will be bRemove'd. bRemove tries to delete PROP_NUMBERING_RULES to prevent run-on numbering (bug 97417), but there is nothing to delete.
2.) finish paragraph finds no explicit NUMBERING_RULES, so it looks to the style, and Heading 4 of course is a numbering style in LO.  So, now the NUMBERING_RULE property that bRemove tried to avoid has been secretly added.
Comment 13 Justin L 2018-08-15 14:57:15 UTC
Created attachment 144201 [details]
tdf113608_runAwayNumbering.docx: minimized unit test
Comment 14 László Németh 2018-08-15 15:09:55 UTC
@Justin: I pushed a possible fix for this issue:  https://gerrit.libreoffice.org/#/c/59117/, I will add your minimized unit test to it tomorrow, if it's ok for you. Thanks, László
Comment 15 Commit Notification 2018-08-16 11:03:10 UTC
László Németh committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=4b432a8c77ae4b5e8305ca69953bc202e7614664

tdf#113608 DOCX import: don't use numbering of removed paragraphs

It will be available in 6.2.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 16 Xisco Faulí 2018-08-20 16:46:43 UTC
Verified in

Version: 6.2.0.0.alpha0+
Build ID: 401cba4c20fbc930f034168872642428d7459218
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3; 
Locale: ca-ES (ca_ES.UTF-8); Calc: threaded

@László Németh, Thanks for fixing this!!
Comment 17 László Németh 2018-08-23 06:23:59 UTC
@Justin, Xisco: Thanks for your help and feedback!