Bug 149313 - FILEOPEN DOCX Consecutive page break and section break result in extra page
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 6.3.0.4 release (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Vasily Melenchuk (CIB)
URL:
Whiteboard: target:7.5.0 target:7.4.0.2
Keywords: bibisected, bisected, regression
Depends on:
Blocks: DOCX-Page
 
Reported: 2022-05-26 09:59 UTC by Gabor Kelemen (allotropia)
Modified: 2023-05-15 14:51 UTC

See Also:
Crash report or crash signature:


Attachments
Example file from Word 2016 (14.36 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2022-05-26 09:59 UTC, Gabor Kelemen (allotropia)
The example file in Word 2016 and Writer master (110.11 KB, image/png)
2022-05-26 09:59 UTC, Gabor Kelemen (allotropia)
The example file modified: matching Section start property (14.44 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2022-05-26 10:00 UTC, Gabor Kelemen (allotropia)
Second example in Word and Writer (102.69 KB, image/png)
2022-05-26 10:02 UTC, Gabor Kelemen (allotropia)
The example file modified: different Section start property, but no empty para between the breaks (14.45 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2022-05-26 10:03 UTC, Gabor Kelemen (allotropia)
Third example in Word and Writer (97.12 KB, image/png)
2022-05-26 10:04 UTC, Gabor Kelemen (allotropia)
tdf97648_relativeWidth2.docx: modified to add purple color to page/continuous break (20.36 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2023-05-15 14:50 UTC, Justin L
tdf97648_relativeWidth2_2010.pdf: exported by MS Word 2010 - 2016 look the same. (89.48 KB, application/pdf)
2023-05-15 14:51 UTC, Justin L

Description Gabor Kelemen (allotropia) 2022-05-26 09:59:28 UTC
Created attachment 180405 [details]
Example file from Word 2016

The attached document contains, consecutively, a page break, an empty paragraph and a section break. The page break has its Section start property set to "Continuous"; the section break has its Section start property set to "New page".

When this document is opened in Writer, it has an empty extra page, containing only the empty paragraph that is between the two breaks in Word.

Version: 7.4.0.0.alpha1+ (x64) / LibreOffice Community
Build ID: b6266207b55a7633dc82b02142215757512adfb7
CPU threads: 14; OS: Windows 10.0 Build 19044; UI render: Skia/Raster; VCL: win
Locale: en-US (hu_HU); UI: en-US
Calc: threaded

bibisected in windows-6.3 to:

https://git.libreoffice.org/core/+/14087d3e5fed9b56384432d9aeac608a5e8d86cf

author	Justin Luth <justin_luth@sil.org>	Fri Dec 21 21:22:51 2018 +0300
committer	Justin Luth <justin_luth@sil.org>	Tue Jan 15 19:52:14 2019 +0100

tdf#121670 ooxmlimport: no columns in page styles, only sections
Comment 1 Gabor Kelemen (allotropia) 2022-05-26 09:59:56 UTC
Created attachment 180406 [details]
The example file in Word 2016 and Writer master
Comment 2 Gabor Kelemen (allotropia) 2022-05-26 10:00:56 UTC
Created attachment 180408 [details]
The example file modified: matching Section start property
Comment 3 Gabor Kelemen (allotropia) 2022-05-26 10:02:09 UTC
Created attachment 180409 [details]
Second example in Word and Writer

When the Section start property is the same "New page", the rendering is correct in Writer.
Comment 4 Gabor Kelemen (allotropia) 2022-05-26 10:03:05 UTC
Created attachment 180410 [details]
The example file modified: different Section start property, but no empty para between the breaks
Comment 5 Gabor Kelemen (allotropia) 2022-05-26 10:04:49 UTC
Created attachment 180411 [details]
Third example in Word and Writer

If the Section start properties differ, but there is no empty paragraph between them, the layout is also correct in Writer.
So both the differing Section start properties and the empty paragraph between the page/section breaks are necessary to trigger this error.
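
The three example files can be summarised as a small truth table. The Python sketch below only encodes the observations from the comments above; the function name and its arguments are hypothetical and do not correspond to anything in the Writer import code. It also assumes that the second example file still contains the empty paragraph, and that the third keeps the "Continuous" / "New page" pair.

```python
def triggers_extra_page(first_start: str, second_start: str,
                        empty_para_between: bool) -> bool:
    """Summarise the reported trigger condition (illustrative only).

    first_start / second_start are the Section start values of the two
    consecutive breaks; empty_para_between says whether an empty
    paragraph sits between them.
    """
    return first_start != second_start and empty_para_between


# The three attached example files, as described in the comments.
# The exact values for 180408 and 180410 are assumptions (see above).
examples = {
    "attachment 180405": ("Continuous", "New page", True),
    "attachment 180408": ("New page", "New page", True),
    "attachment 180410": ("Continuous", "New page", False),
}

for name, args in examples.items():
    print(name, "extra page expected:", triggers_extra_page(*args))
```

Only the first combination is expected to show the extra page in an affected Writer build.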
Comment 6 Dieter 2022-06-10 08:46:07 UTC
I confirm it with

Version: 7.4.0.0.alpha1+ (x64) / LibreOffice Community
Build ID: 5423dfb8549743bd5045b6e3b1ebad7980e62965
CPU threads: 4; OS: Windows 10.0 Build 19044; UI render: Skia/Raster; VCL: win
Locale: de-DE (de_DE); UI: en-GB
Calc: CL

(also compared with Word 2016)
Comment 7 Justin L 2022-07-11 14:53:05 UTC
Interesting/obsolete note: old MS Word 2003 also shows a page break here.

Existing unit test ooxmlexport12 090716_Studentische_Arbeit_VWS.docx is positively affected by the proposed fix. Previously there was an extra CR before "AUFGABENSTELLUNG" on page 3.

UIWriter3's testTdf131963 also hit this change, but was unaffected AFAICS.
Comment 8 Commit Notification 2022-07-12 19:17:58 UTC
Vasily Melenchuk committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/a4ab155ae15e9e6d4deb157634f8b86c87fcbde4

tdf#149313: DOCX import: improved conditions for removeparagraph

It will be available in 7.5.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 9 Commit Notification 2022-07-18 10:08:59 UTC
Vasily Melenchuk committed a patch related to this issue.
It has been pushed to "libreoffice-7-4":

https://git.libreoffice.org/core/commit/00d26e055d97706961032416b2a3de4d518d987b

tdf#149313: DOCX import: improved conditions for removeparagraph

It will be available in 7.4.0.2.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 10 Dieter 2022-08-06 14:49:17 UTC
VERIFIED with

Version: 7.5.0.0.alpha0+ (x64) / LibreOffice Community
Build ID: a56d0c34716f381accbd9d2e3040a62d3583d18d
CPU threads: 4; OS: Windows 10.0 Build 19044; UI render: Skia/Raster; VCL: win
Locale: de-DE (de_DE); UI: en-GB
Calc: CL

Vasily, thanks for fixing it!
Comment 11 Justin L 2023-05-14 01:48:29 UTC
ooxmlexport9's tdf97648_relativeWidth.docx is worse because of this patch. Normally there is a pseudo carriage return at the top of the second page.

Because that CR is missing, it triggers export bug 144362.
Comment 12 Justin L 2023-05-15 14:50:54 UTC
Created attachment 187298 [details]
tdf97648_relativeWidth2.docx: modified to add purple color to page/continuous break

(In reply to Justin L from comment #7)
> Interesting/obsolete note: old MS Word 2003 also shows a page break here.

(In reply to Justin L from comment #11)
> ooxmlexport9's tdf97648_relativeWidth.docx is worse because of this patch.
> Normally there is a pseudo carriage return at the top of the second page.
> 
> Because that CR is missing, it triggers export bug 144362.

I tested with Word 2016, and the purple paragraph color shows up in both places. So in the case where a continuous section break and a page break occur in the same paragraph, it looks like bRemove should not delete the paragraph.

The question is whether we actually want to "fix" this. We have a continuous section break (which LO can't handle) connected to a page break in an empty paragraph. It is awfully tempting to just treat that as a section page break until LO can handle continuous section breaks.
Comment 13 Justin L 2023-05-15 14:51:46 UTC
Created attachment 187299 [details]
tdf97648_relativeWidth2_2010.pdf: exported by MS Word 2010 - 2016 look the same.