Bug 133334 - [FILESAVE] Saving as .docx generates spurious page breaks
Summary: [FILESAVE] Saving as .docx generates spurious page breaks
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 6.4.0.0.alpha0+ (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Justin L
URL:
Whiteboard:
Keywords: bibisected, bisected, filter:docx, regression
Depends on:
Blocks:
 
Reported: 2020-05-24 05:33 UTC by Luke Kendall
Modified: 2020-05-26 09:08 UTC
CC List: 3 users

See Also:
Crash report or crash signature:


Attachments
Zip file with two short sample documents (7.24 MB, application/zip)
2020-05-24 05:33 UTC, Luke Kendall

Description Luke Kendall 2020-05-24 05:33:40 UTC
Created attachment 161212 [details]
Zip file with two short sample documents

Saving a document as .docx inserts an extra, spurious page break on or soon after the first page that follows a manual page break.

By examining the XML produced, I found spurious <w:sectPr>...</w:sectPr> elements that cause the additional page breaks. Examining the XML of the source .odt, I could not see anything that would lead to an extra page break.

The first spurious page break is after the text "letting any other speak who wished to." in the Prologue.
The second spurious page break is after the text "Calmly, he returned to studying the orphanage records."
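To locate the spurious breaks in an exported file without unzipping and reading it by hand, a small script can list every paragraph that carries a `w:sectPr`. In WordprocessingML a section break is stored as a `w:sectPr` inside the `w:pPr` of the last paragraph of a section (the final section's properties sit directly under `w:body`), and when `w:type` is absent the section type defaults to `nextPage`, which starts a new page. This is a minimal sketch using only the Python standard library; the function name `paragraph_section_breaks` is hypothetical and not part of any LibreOffice tool.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, in ElementTree's "{uri}tag" form
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def paragraph_section_breaks(docx_path):
    """Return (section_type, paragraph_text) for each paragraph holding a w:sectPr.

    The body-level w:sectPr (properties of the last section) is not a
    paragraph child, so it is deliberately not reported.
    """
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    results = []
    for para in root.iter(W + "p"):
        sect = para.find(W + "pPr/" + W + "sectPr")
        if sect is None:
            continue
        type_el = sect.find(W + "type")
        # ECMA-376: when w:type is omitted the section type is nextPage,
        # i.e. the following section starts on a new page
        sect_type = type_el.get(W + "val") if type_el is not None else "nextPage"
        # Concatenate all text runs so the break can be matched to the prose
        text = "".join(t.text or "" for t in para.iter(W + "t"))
        results.append((sect_type, text))
    return results
```

For example, `paragraph_section_breaks("exported.docx")` (a hypothetical name for the file produced by Save As) returns one `(type, text)` pair per paragraph-level section break, so the paragraph texts can be compared with the two locations reported above.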

The .docx conversion also changes the layout, making text spill onto the next page in places where it did not before, but I consider that a minor nuisance.

Please see the attached source document WTKDP-BadDocxGen.odt and the bad .docx file generated by Save As from:

Version: 6.4.3.2
Build ID: 747b5d0ebf89f41c860ec2a39efd7cb15b54f2d8
CPU threads: 4; OS: Linux 4.4; UI render: default; VCL: x11; 
Locale: en-GB (en_AU.UTF-8); UI-Language: en-US
Calc: threaded
Comment 1 Telesto 2020-05-24 07:50:39 UTC Comment hidden (no-value)
Comment 2 Luke Kendall 2020-05-24 08:51:46 UTC Comment hidden (no-value)
Comment 3 Luke Kendall 2020-05-24 09:04:08 UTC Comment hidden (no-value)
Comment 4 Luke Kendall 2020-05-24 10:48:40 UTC
Also noting that loading the .docx version in Writer, deleting the spurious page breaks one by one, correcting all the page styles, and saving still produces a .docx file that other programs see as having extra page breaks.

The only way to remove them I've found is to edit the XML manually, or with an alternative to LibreOffice such as FreeOffice TextMaker.
Comment 5 Telesto 2020-05-24 14:02:02 UTC Comment hidden (obsolete)
Comment 6 Telesto 2020-05-24 14:07:00 UTC Comment hidden (obsolete)
Comment 7 Telesto 2020-05-24 14:58:35 UTC Comment hidden (no-value)
Comment 8 Luke Kendall 2020-05-24 22:12:41 UTC Comment hidden (no-value)
Comment 9 Telesto 2020-05-25 06:04:06 UTC Comment hidden (no-value)
Comment 10 Telesto 2020-05-25 06:21:32 UTC Comment hidden (no-value)
Comment 11 Justin L 2020-05-25 08:13:20 UTC
The first column break I see is on the second page: "Wild Thing" (Title paragraph style) has a break of type "Column" set. It serves no purpose in the original.
The same is true of the two empty paragraphs before "Prologue".

But that isn't what this bug report is about. It is about paragraphs where there IS NO column break, and yet this still happens (although the earlier bibisect and comment 6 about "ODT looked like this" refer to the two previous breaks).


Bisecting (and a test revert) based on the two reported cases points to LO 6.4 commit 5d1709a7c4184eb31cfc4c2d3acadff3a4a68189 by Miklos Vajna (Mon Oct 28 21:52:18 2019 +0100):
    tdf#104017 DOC export: be less aggressive with merging page styles
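The bisection mentioned above narrows a regression down to a single commit by binary search over an ordered range of builds. LibreOffice's bibisect repositories and `git bisect` automate this; the sketch below only illustrates the underlying idea and is not that tooling. It assumes the first build is known good, the last is known bad, and the regression persists once introduced; `first_bad` and the `is_bad` predicate are hypothetical names (a real check might build the revision, save the sample document as .docx, and count unexpected `w:sectPr` elements).

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(commits: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Binary search for the first commit where is_bad() becomes true.

    Assumes commits are in chronological order, commits[0] is good,
    commits[-1] is bad, and every commit after the first bad one is bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return commits[hi]
```

Each test halves the remaining range, so about log2(n) builds need checking instead of n, which is what makes bisecting across a whole release cycle practical.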


CC: Miklos, but just ignore this for a few days - I'm going to research this.
Comment 12 Justin L 2020-05-25 11:15:04 UTC
The regression patch was back-ported to LO 6.3.4.

I can't believe that this change was made and backported in the first place, let alone that it was never caught by QA's automated testing.

Proposed to revert it at gerrit.libreoffice.org/c/core/+/94782 Revert "tdf#104017 DOC export: be less aggressive with merging page styles"
Comment 13 Telesto 2020-05-25 11:37:28 UTC
@Justin
Thanks for looking into it. Is a new report needed for comment 5?
Comment 14 Luke Kendall 2020-05-25 12:10:50 UTC Comment hidden (off-topic)
Comment 15 Justin L 2020-05-25 12:18:39 UTC
(In reply to Telesto from comment #13)
> Thanks for looking into it. Is a new report needed for comment 5.
Yes. I made bug 133370 - thanks for prompting me.
Comment 16 Justin L 2020-05-26 08:46:13 UTC
This will be fixed in LO 6.4.5.
The patch from bug 104017 was reverted, and bug 104017 was marked as a duplicate of bug 48097.
Comment 17 Telesto 2020-05-26 09:05:01 UTC
(In reply to Justin L from comment #16)
> This will be fixed in LO 6.4.5.
> The patch from bug 104017 was reverted, and it was marked as a duplicate of
> bug 48097.

@Justin,
Is it possible to add a unit test or something to prevent this from happening again?
Comment 18 Justin L 2020-05-26 09:08:22 UTC
(In reply to Telesto from comment #17)
> Is it possible to add a unit test something.. preventing this from happening
> again?
Already done. A test was added to the revert commit. It was also the perfect opportunity to introduce a new ooxmlexport15 test suite all the way back to the 6.4 branch, which will make future backporting of other patches easier.