Bug 108714

Summary: DOCX IMPORT: Page break is missing in a specific document
Product: LibreOffice
Reporter: Mike Kaganski <mikekaganski>
Component: Writer
Assignee: Mike Kaganski <mikekaganski>
Status: RESOLVED FIXED
Severity: normal
CC: xiscofauli
Priority: medium
Keywords: filter:docx
Version: unspecified
Hardware: All
OS: All
Whiteboard: target:6.0.0
Crash report or crash signature:
Regression By:
Attachments: A sanitized DOCX that has a page break in Word

Description Mike Kaganski 2017-06-23 11:42:56 UTC
Created attachment 134226
A sanitized DOCX that has a page break in Word

This document (a sanitized minimal reproducer of a problem found in a real-life document) has a page break between its two paragraphs when opened in Word. LibreOffice doesn't import the page break and shows both paragraphs on one page.

The reason is that LibreOffice rightfully doesn't accept the <w:br> element as a direct child of <w:body>.

ECMA-376-1:2016 §17.3.3.1 describes br as an element of run content,
and points to CT_Br in §A.1.
CT_Br may appear only as part of EG_RunInnerContent.
In turn, EG_RunInnerContent may appear only inside CT_R.

So using <w:br> outside of <w:r> produces ill-formed OOXML.
The Open XML SDK 2.5 Productivity Tool for Microsoft Office confirms this,
reporting an OpenXmlUnknownElement error.
However, Word accepts <w:br> as a direct child of <w:body>. This is another
Word bug that leads third parties to create ill-formed real-life documents
and requires LibreOffice to be bug-for-bug compatible.
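
For illustration only, here is a minimal Python sketch of how such a misplaced break could be detected in a document.xml part. It is not taken from the attachment or from the patch: the function name stray_breaks and both sample XML strings are hypothetical, and the samples merely mimic the structure described above (a page break between two paragraphs). The fallback value "textWrapping" reflects the default break type when w:type is omitted.

```python
import xml.etree.ElementTree as ET
from typing import List

# WordprocessingML main namespace (the "w:" prefix).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def stray_breaks(document_xml: str) -> List[str]:
    """Return the w:type of every <w:br> that is a direct child of <w:body>."""
    root = ET.fromstring(document_xml)
    body = root.find(f"{{{W_NS}}}body")
    if body is None:
        return []
    # A path without "//" matches direct children only, so a <w:br>
    # correctly nested inside <w:p>/<w:r> is not reported.
    return [
        br.get(f"{{{W_NS}}}type", "textWrapping")
        for br in body.findall(f"{{{W_NS}}}br")
    ]


# Hypothetical sample: <w:br> directly under <w:body> (the case Word tolerates).
ILL_FORMED = f"""
<w:document xmlns:w="{W_NS}">
  <w:body>
    <w:p><w:r><w:t>First</w:t></w:r></w:p>
    <w:br w:type="page"/>
    <w:p><w:r><w:t>Second</w:t></w:r></w:p>
  </w:body>
</w:document>
"""

# Hypothetical sample: the same break placed inside a run, as the schema requires.
WELL_FORMED = f"""
<w:document xmlns:w="{W_NS}">
  <w:body>
    <w:p><w:r><w:t>First</w:t></w:r></w:p>
    <w:p><w:r><w:br w:type="page"/></w:r></w:p>
    <w:p><w:r><w:t>Second</w:t></w:r></w:p>
  </w:body>
</w:document>
"""

if __name__ == "__main__":
    print(stray_breaks(ILL_FORMED))
    print(stray_breaks(WELL_FORMED))
```

A non-empty result for a given document.xml would indicate the construct discussed in this report.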
Comment 1 Mike Kaganski 2017-06-23 11:52:45 UTC
A patch is submitted: https://gerrit.libreoffice.org/39168
Comment 2 Xisco Faulí 2017-06-23 13:54:20 UTC
Moving to NEW...
Comment 3 Julien Nabet 2017-06-23 19:37:50 UTC
Let's put ASSIGNED status since you assigned yourself :-)
Comment 4 Commit Notification 2017-06-27 13:44:17 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=a4a1467bc47b81ad68ecad0d5e2e163670582919

tdf#108714: allow <w:br> as direct child of <w:body>

It will be available in 6.0.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 5 Commit Notification 2017-06-28 08:59:30 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=553204015f954d20db65e6adcda68b823a8ef235

tdf#108714 follow-up: handle deferred break in character group

It will be available in 6.0.0.

Comment 6 Commit Notification 2017-07-07 06:52:17 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=f95f0ce163743706a3670c6e33593023c22af2ff

tdf#108714: Also support paragraph-level (line) breaks

It will be available in 6.0.0.
