Bug 90697 - FILEOPEN DOCX layout messed up with wrong number of pages (continuous break changed to page break)
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 3.6.7.2 release
Hardware: All All
Importance: high normal
Assignee: Justin L
URL:
Whiteboard: interoperability target:5.3.0 target:...
Keywords: bibisected, filter:docx, regression
Duplicates: 64407 100513
Depends on:
Blocks:
 
Reported: 2015-04-18 18:01 UTC by Buovjaga
Modified: 2018-12-05 12:38 UTC (History)
CC List: 5 users

See Also:
Crash report or crash signature:


Attachments
Bugged out file originating from MSO (41.78 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2015-04-18 18:01 UTC, Buovjaga
PDF exported from MSO 2013 (182.77 KB, application/pdf)
2015-04-18 18:02 UTC, Buovjaga
tdf92724_continuousBreaksComplex2.pdf: printout for additional unit test (8.38 KB, application/pdf)
2016-09-27 13:19 UTC, Justin L
firstInheritTest.docx: continuous breaks are evil (15.34 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2016-09-30 07:12 UTC, Justin L
firstInheritTest.pdf: what it looks like in Word2003 (4.58 KB, application/pdf)
2016-09-30 07:18 UTC, Justin L

Description Buovjaga 2015-04-18 18:01:48 UTC
Created attachment 114887 [details]
Bugged out file originating from MSO

Somehow LibO displays "manual column breaks visible" in the headers. I don't know if it's related to the layout/flow being broken.

Win 8.1 32-bit
MSO 2013
LibO Version: 4.5.0.0.alpha0+
Build ID: 211c12b9c64facd1c12f637a5229bd6a6feb032a
TinderBox: Win-x86@39, Branch:master, Time: 2015-04-18_00:35:20
Locale: fi_FI
Comment 1 Buovjaga 2015-04-18 18:02:18 UTC
Created attachment 114888 [details]
PDF exported from MSO 2013

Compare to this.
Comment 2 Buovjaga 2015-04-18 18:59:32 UTC
Interesting: 3.3 and 3.5 don't look as bad. 3.3.0 has the right page count (7) and 3.5.0 has one extra page. With 3.6.7, the page count has exploded to 17.

I'm adding a bibisect request and hope the layout lengthening issue can be spotted at least.

Ubuntu 14.10 64-bit
LibreOffice 3.3.0 
OOO330m19 (Build:6)
tag libreoffice-3.3.0.4

LibreOffice 3.5.0rc3 
Build ID: 7e68ba2-a744ebf-1f241b7-c506db1-7d53735

Version 3.6.7.2 (Build ID: e183d5b)
Comment 3 Carsten 2015-04-27 11:19:37 UTC
I can confirm this. LO adds "manual column breaks" at the top of pages 2, 6, 8, 10 and 12.

Linux / Arch x64
Version: 4.4.2.2
Build-ID: 4.4.2.2 Arch Linux build-1
Locale: de_DE
Comment 4 Michael Weghorn 2015-08-20 21:46:46 UTC
Bug 63662 deals with the text "manual column break" being displayed. I just verified that the bibisect result in comment 11 of that bug report is also valid for the document attached here.


When opening the document in the respective LibreOffice version, the document still has only 7 pages, so the fact that the number of pages increases when opening the document in LibreOffice is a separate issue.
Comment 5 Cor Nouws 2015-09-01 07:56:50 UTC
(In reply to Michael Weghorn from comment #4)
> Bug 63662 deals with the text "manual column break" being displayed. I just
> verified that the bibisect result in comment 11 of that bug report is also
> valid for the document attached here.

So I am removing that from the summary (since bug 63662 handles it).

> When opening the document in the respective LibreOffice version, the
> document still has only 7 pages, so the fact that the number of pages
> increases when opening the document in LibreOffice is a separate issue.
Comment 6 Robinson Tryon (qubit) 2015-12-14 05:32:36 UTC Comment hidden (noise, obsolete)
Comment 7 Joel Madero 2015-12-14 20:54:55 UTC
Bibisect of when the page count jumped to more than 17 pages; in between, it fluctuates a bit (7-8 pages):

5b4693bb72eca5e38e3f56d036bca425c9a21b37 is the first bad commit
commit 5b4693bb72eca5e38e3f56d036bca425c9a21b37
Author: Bjoern Michaelsen <bjoern.michaelsen@canonical.com>
Date:   Sun Dec 9 11:49:31 2012 +0000

    source-hash-e3633f60b349022994e291aa3d1a0c90c3403b2e
    
    commit e3633f60b349022994e291aa3d1a0c90c3403b2e
    Author:     Stephan Bergmann <sbergman@redhat.com>
    AuthorDate: Wed May 16 09:32:51 2012 +0200
    Commit:     Stephan Bergmann <sbergman@redhat.com>
    CommitDate: Wed May 16 09:36:38 2012 +0200
    
        fdo#46074 fdo#49948 Ignore corrupted items in Recent Documents
    
        ...following up on 4ccb4bda483eb548eb6efb5e2f1952f094522320 "fdo#46074 Ignore
        corrupted items in Recent Documents" with another problematic scenario found
        with fdo#49948.
    
        Change-Id: I3e7c803813f09c1f031defc2c18cfab6732b1621

:100644 100644 5aa1dfc68ecb9ac57316a995424b2d3683cb4774 aa42f04f09d97d387333244ba505d2fd3c3086c2 M	autogen.log
:100644 100644 72da0ea5e9ec1223cb456558a2e0254561faa98c 1829a020e51322ed60e655809575a93edd3b9032 M	ccache.log
:100644 100644 5ef3324ce1c257155c9e095fdeb7d912b2681ae1 795d8ec3e2d59c5f0a85099dac7224954a57c4f2 M	commitmsg
:100644 100644 8b14489bddefe04fcfaecb0be901837505c64b67 5e870f27775bef1e12288b413b09a4052c414870 M	dev-install.log
:100644 100644 68ac6a90c73f1f7c8776a70772a40ae1ce41e13d 78b57ac998248d89343563f89455faeeea3f57a1 M	make.log
:040000 040000 8b906c6863615fd1253b393b35b18a883201b310 e793bfa8b661936460e69be1537f15a7e99d3289 M	opt


# bad: [423a84c4f7068853974887d98442bc2a2d0cc91b] source-hash-c15927f20d4727c3b8de68497b6949e72f9e6e9e
# good: [65fd30f5cb4cdd37995a33420ed8273c0a29bf00] source-hash-d6cde02dbce8c28c6af836e2dc1120f8a6ef9932
git bisect start 'latest' 'oldest'
# bad: [e02439a3d6297a1f5334fa558ddec5ef4212c574] source-hash-6b8393474974d2af7a2cb3c47b3d5c081b550bdb
git bisect bad e02439a3d6297a1f5334fa558ddec5ef4212c574
# bad: [8f4aeaad2f65d656328a451154142bb82efa4327] source-hash-1885266f274575327cdeee9852945a3e91f32f15
git bisect bad 8f4aeaad2f65d656328a451154142bb82efa4327
# good: [369369915d3582924b3d01c9b01167268ed38f3b] source-hash-45295f3cdceb4c289553791071b5d7f4962d2ec4
git bisect good 369369915d3582924b3d01c9b01167268ed38f3b
# bad: [6fce03a944bf50e90cd31e2d559fe8705ccc993e] source-hash-47e4a33a6405eb1b5186027f55bd9cb99b0c1fe7
git bisect bad 6fce03a944bf50e90cd31e2d559fe8705ccc993e
# good: [8a39227e344637eb7154a10ac825d211e64d584c] source-hash-f5080ebb7022c9f5d7d7fdca4fe9d19f9bb8cabf
git bisect good 8a39227e344637eb7154a10ac825d211e64d584c
# bad: [e4c742a9e244bd7ebeabc50c90182df28ac3daaf] source-hash-c52ba433491afbca70aa1977a624c795bdd5b9ef
git bisect bad e4c742a9e244bd7ebeabc50c90182df28ac3daaf
# good: [96a055e15ee7171a28888973a3c3a7307dd9867f] source-hash-9ca02a663c3eee2698eb360dd5dc7afb1951e743
git bisect good 96a055e15ee7171a28888973a3c3a7307dd9867f
# bad: [e87a0055deae2c9e25ae1d1a365cec8418b785ce] source-hash-67ff63988f3b8eef2cc2b5bdf917918b93c3f070
git bisect bad e87a0055deae2c9e25ae1d1a365cec8418b785ce
# bad: [5b4693bb72eca5e38e3f56d036bca425c9a21b37] source-hash-e3633f60b349022994e291aa3d1a0c90c3403b2e
git bisect bad 5b4693bb72eca5e38e3f56d036bca425c9a21b37
# good: [d101b9946a6a04e65e3923038503436c790b7e12] source-hash-18e6e7d929c2be209407ed2e56b8ec4d5e6c4900
git bisect good d101b9946a6a04e65e3923038503436c790b7e12
# first bad commit: [5b4693bb72eca5e38e3f56d036bca425c9a21b37] source-hash-e3633f60b349022994e291aa3d1a0c90c3403b2e
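
For readers unfamiliar with the log above: `git bisect` is a binary search over an ordered history. The following is only a minimal Python sketch of that idea, not the bibisect tooling itself. The function name `first_bad`, the `is_bad` predicate, the commit labels and the page counts are all hypothetical, and the sketch assumes that once the bug appears it stays in every later commit.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, commits[0] is good,
    commits[-1] is bad, and badness never reverts once introduced.
    """
    lo, hi = 0, len(commits) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the first bad commit is at mid or earlier
        else:
            lo = mid  # the first bad commit is after mid
    return commits[hi]


# Made-up history: page count seen when opening a test document per commit.
pages = {"c0": 7, "c1": 8, "c2": 7, "c3": 17, "c4": 17, "c5": 18}
history = list(pages)
print(first_bad(history, lambda c: pages[c] > 8))
```

Each step halves the remaining range, which is why the log above needs only about ten good/bad verdicts to single out one commit.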
Comment 8 Justin L 2016-06-25 17:32:34 UTC
I believe the first bad commit will be 50cb1667020494906afaacb68d4163d1eda527cf (author: Miklos Vajna <vmiklos@suse.cz>, 2012-05-15 06:56:38 GMT):
fdo#49940 dmapper: handle m_bTitlePage when m_nBreakType is zero

The problem in buggy2.docx is that headers are redefined many times. In Word, headers appear to be a section property, while in LO they are a page style property. Thus, in LO, a new page style is created for each section that has a different header, and so continuous breaks are converted into page breaks.

This also causes the document to round-trip terribly - since all continuous breaks have been removed, the column breaks start affecting the wrong places.
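
To see whether a given DOCX has this shape (many sections, each redefining headers, separated by continuous breaks), one can list the section properties in `word/document.xml`. The sketch below is a rough illustration only: the element names `w:sectPr`, `w:type` and `w:headerReference` come from WordprocessingML, while the function names and the sample XML are made up for this example. It does not try to handle tracked section changes or other edge cases.

```python
import xml.etree.ElementTree as ET
import zipfile
from typing import List, Optional, Tuple

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def section_summary(document_xml: str) -> List[Tuple[Optional[str], int]]:
    """List (break type, header reference count) per w:sectPr, in document order."""
    root = ET.fromstring(document_xml)
    summary = []
    for sect in root.iter(f"{{{W}}}sectPr"):
        type_el = sect.find(f"{{{W}}}type")
        # A section without w:type is treated as a "nextPage" section.
        kind = type_el.get(f"{{{W}}}val") if type_el is not None else "nextPage"
        headers = len(sect.findall(f"{{{W}}}headerReference"))
        summary.append((kind, headers))
    return summary


def summarize_docx(path: str) -> List[Tuple[Optional[str], int]]:
    """Read word/document.xml from a .docx (a zip archive) and summarize it."""
    with zipfile.ZipFile(path) as z:
        return section_summary(z.read("word/document.xml").decode("utf-8"))


# Made-up two-section document: a continuous section, then the final section.
sample = f"""<w:document xmlns:w="{W}"
 xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"><w:body>
<w:p><w:pPr><w:sectPr><w:headerReference w:type="default" r:id="rId1"/>
<w:type w:val="continuous"/></w:sectPr></w:pPr></w:p>
<w:sectPr><w:headerReference w:type="default" r:id="rId2"/>
<w:headerReference w:type="first" r:id="rId3"/></w:sectPr>
</w:body></w:document>"""
print(section_summary(sample))
```

A long run of `("continuous", n)` entries with `n > 0` would match the pattern described in this comment.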
Comment 9 Commit Notification 2016-06-29 07:36:22 UTC
Justin Luth committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=50bf96d31ab2eb546f6c71cc93c1fa5dd4bf3044

tdf#90697 docx - don't change continuous break into page break

It will be available in 5.3.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 10 Cor Nouws 2016-07-04 10:11:01 UTC
*** Bug 100513 has been marked as a duplicate of this bug. ***
Comment 11 Cor Nouws 2016-07-04 10:12:12 UTC
checked in Version: 5.3.0.0.alpha0+
Build ID: cc503abb860c33a54a188640a5962dbdf7052284
CPU Threads: 2; OS Version: Linux 4.4; UI Render: default; 
TinderBox: Linux-rpm_deb-x86@71-TDF, Branch:master, Time: 2016-07-04_00:55:33
Locale: nl-NL (nl_NL.UTF-8)

Thanks Justin :) !
Comment 12 Cor Nouws 2016-07-04 13:13:57 UTC
After this fix, another problem became visible:

Bug 100758 - [FILEOPEN] Word Continuous section break results in 2nd and more footnotes being pushed to next page on import of DOCX file
Comment 13 Cor Nouws 2016-07-04 15:11:12 UTC
*** Bug 64407 has been marked as a duplicate of this bug. ***
Comment 14 Commit Notification 2016-08-23 19:59:47 UTC
Justin Luth committed a patch related to this issue.
It has been pushed to "libreoffice-5-2":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=6f9cbfad8744646b5b1f79d5fbf1c1f9eb03519d&h=libreoffice-5-2

tdf#90697 docx - don't change continuous break into page break

It will be available in 5.2.2.

Comment 15 Commit Notification 2016-09-27 06:58:52 UTC Comment hidden (obsolete)
Comment 16 Commit Notification 2016-09-27 11:57:03 UTC Comment hidden (obsolete)
Comment 17 Justin L 2016-09-27 13:19:39 UTC
Created attachment 127672 [details]
tdf92724_continuousBreaksComplex2.pdf: printout for additional unit test

Likely someone will want to revert these changes as causing a regression in other documents - something unavoidable. I am adding tests to ensure that their solution addresses various complex cases.

tdf92724_continuousBreaksComplex2:  https://gerrit.libreoffice.org/#/c/29322/
Comment 18 Cor Nouws 2016-09-27 20:49:21 UTC
(In reply to Justin L from comment #17)

> likely someone will want to revert these changes as causing a regression in
> other documents - something unavoidable.  

Can you please explain which "these changes"?
Comment 19 Justin L 2016-09-28 11:00:23 UTC
"these changes" means not treating every header/footer definition as an applied page style (aka page break). This results in data loss - unused headers/footers are currently lost.
(Actually, in this complexBreaks2.docx, even the USED page style on the last page is lost in round-tripping, so there is still a lot of work to be done on docx section breaks.)
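
The trade-off described in this comment can be illustrated with a toy model. This is a deliberately simplified sketch and not the actual import code: `Section`, `import_sections` and the `keep_continuous` flag are hypothetical names, and the model assumes that every header defined by a continuous section counts as "unused".

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Section:
    break_type: str        # "continuous" or "nextPage"
    header: Optional[str]  # header defined by this section, if any


def import_sections(sections: List[Section], keep_continuous: bool) -> Tuple[int, List[str]]:
    """Return (page breaks inserted, headers dropped) for a toy import.

    keep_continuous=False models the old behavior: every section that
    defines a header gets its own page style, which forces a page break.
    keep_continuous=True models the fix: a continuous section never
    starts a new page style, so its header definition is discarded.
    """
    page_breaks = 0
    dropped: List[str] = []
    for i, sec in enumerate(sections):
        if i == 0:
            continue  # the first section starts the document; no break before it
        if sec.break_type == "nextPage":
            page_breaks += 1
        elif sec.header is not None:
            if keep_continuous:
                dropped.append(sec.header)  # data loss: header is not kept
            else:
                page_breaks += 1  # continuous break turned into a page break
    return page_breaks, dropped


# Made-up document with three continuous sections in the middle.
doc = [
    Section("nextPage", "A"),
    Section("continuous", "B"),
    Section("continuous", None),
    Section("continuous", "C"),
    Section("nextPage", "D"),
]
print(import_sections(doc, keep_continuous=False))
print(import_sections(doc, keep_continuous=True))
```

In this model the old behavior keeps every header but inflates the page count, while the new behavior keeps the layout and loses the headers of the continuous sections.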
Comment 20 Commit Notification 2016-09-28 19:09:11 UTC
Samuel Mehrbrodt committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=ad48da87038bd0ae67c2edb4199813e1a2205a69

Reintroduce "tdf#90697 unit test for rtf import"

It will be available in 5.3.0.

Comment 21 Commit Notification 2016-09-29 04:36:52 UTC
Justin Luth committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=13c0122ad18dd1db187de8afc2ef406421d6d0e5

tdf#90697 unit test for an unused FirstPage header

It will be available in 5.3.0.

Comment 22 Justin L 2016-09-30 07:12:21 UTC
Created attachment 127733 [details]
firstInheritTest.docx: continuous breaks are evil

A carefully crafted document can look great in older versions of LO and terrible in the current one. Continuous breaks need to be replaced with PageBreaks in order to be even remotely compatible.
Comment 23 Justin L 2016-09-30 07:18:52 UTC
Created attachment 127734 [details]
firstInheritTest.pdf: what it looks like in Word2003