Bug 108714 - DOCX IMPORT: Page break is missing in a specific document
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): unspecified
Hardware: All All
Importance: medium normal
Assignee: Mike Kaganski
URL:
Whiteboard: target:6.0.0
Keywords: filter:docx
Depends on:
Blocks:
 
Reported: 2017-06-23 11:42 UTC by Mike Kaganski
Modified: 2017-07-07 06:52 UTC (History)

See Also:
Crash report or crash signature:


Attachments
A sanitized DOCX that has a page break in Word (1.26 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2017-06-23 11:42 UTC, Mike Kaganski

Description Mike Kaganski 2017-06-23 11:42:56 UTC
Created attachment 134226
A sanitized DOCX that has a page break in Word

This document (a sanitized minimal reproducer of a problem found in a real-life document) has a page break between its two paragraphs when opened in Word. LibreOffice does not import the page break and shows both paragraphs on one page.

The reason is that LibreOffice rightly does not accept a <w:br> element as a direct child of <w:body>.

ECMA-376-1:2016 §17.3.3.1 describes br as an element of run content
and points to CT_Br in §A.1.
CT_Br may appear only as part of EG_RunInnerContent.
In turn, EG_RunInnerContent may appear only inside CT_R.

So, using <w:br> outside of <w:r> produces ill-formed OOXML.
Open XML SDK 2.5 Productivity Tool for Microsoft Office confirms this,
reporting an OpenXmlUnknownElement error.
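
The containment rule above can also be checked without the SDK. The following is only a minimal sketch: the function name find_misplaced_breaks and the SAMPLE markup are hypothetical (the sample is an assumed shape for such a document, including the w:type="page" attribute, not the contents of the attachment). It lists the parent of every <w:br> that is not a child of <w:r>.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def find_misplaced_breaks(document_xml: str) -> list[str]:
    """Return the local name of the parent of every <w:br> not inside <w:r>."""
    root = ET.fromstring(document_xml)
    misplaced = []
    for parent in root.iter():
        for child in parent:
            if child.tag == f"{{{W_NS}}}br" and parent.tag != f"{{{W_NS}}}r":
                # Strip the "{namespace}" part to keep only the local name.
                misplaced.append(parent.tag.split("}")[1])
    return misplaced


# Hypothetical sample: a break placed directly under <w:body>.
SAMPLE = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    '<w:p><w:r><w:t>First</w:t></w:r></w:p>'
    '<w:br w:type="page"/>'
    '<w:p><w:r><w:t>Second</w:t></w:r></w:p>'
    '</w:body></w:document>'
)

print(find_misplaced_breaks(SAMPLE))
```

For a real file, the same function could be applied to the text of word/document.xml read from the DOCX package with the standard zipfile module.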
However, Word accepts <w:br> as a direct child of <w:body>. This is another
Word bug that leads third parties to create ill-formed real-life documents,
and it requires LibreOffice to be bug-for-bug compatible.
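
For producers that want to avoid emitting such markup, one possible repair is to wrap each body-level <w:br> in its own paragraph and run. This is a hypothetical preprocessing sketch, not the approach taken by the LibreOffice patch; the names wrap_body_level_breaks and DOC are assumptions, and the extra paragraph it introduces may lay out differently from what Word shows for the original file.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)  # keep the "w:" prefix when serializing


def w(tag: str) -> str:
    """Return the namespace-qualified name for a WordprocessingML tag."""
    return f"{{{W_NS}}}{tag}"


def wrap_body_level_breaks(document_xml: str) -> str:
    """Wrap every <w:br> that is a direct child of <w:body> in <w:p><w:r>."""
    root = ET.fromstring(document_xml)
    body = root.find(w("body"))
    if body is None:
        return document_xml
    # Iterate over a copy: each break is replaced one-for-one, so the
    # indices of the remaining children stay valid.
    for index, child in enumerate(list(body)):
        if child.tag == w("br"):
            paragraph = ET.Element(w("p"))
            run = ET.SubElement(paragraph, w("r"))
            body.remove(child)
            run.append(child)
            body.insert(index, paragraph)
    return ET.tostring(root, encoding="unicode")


# Hypothetical input with a break directly under <w:body>.
DOC = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    '<w:p><w:r><w:t>First</w:t></w:r></w:p>'
    '<w:br w:type="page"/>'
    '<w:p><w:r><w:t>Second</w:t></w:r></w:p>'
    '</w:body></w:document>'
)

print(wrap_body_level_breaks(DOC))
```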
Comment 1 Mike Kaganski 2017-06-23 11:52:45 UTC
A patch is submitted: https://gerrit.libreoffice.org/39168
Comment 2 Xisco Faulí 2017-06-23 13:54:20 UTC
Moving to NEW...
Comment 3 Julien Nabet 2017-06-23 19:37:50 UTC
Let's put ASSIGNED status since you assigned yourself :-)
Comment 4 Commit Notification 2017-06-27 13:44:17 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=a4a1467bc47b81ad68ecad0d5e2e163670582919

tdf#108714: allow <w:br> as direct child of <w:body>

It will be available in 6.0.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 5 Commit Notification 2017-06-28 08:59:30 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=553204015f954d20db65e6adcda68b823a8ef235

tdf#108714 follow-up: handle deferred break in character group

It will be available in 6.0.0.

Comment 6 Commit Notification 2017-07-07 06:52:17 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=f95f0ce163743706a3670c6e33593023c22af2ff

tdf#108714: Also support paragraph-level (line) breaks

It will be available in 6.0.0.
