Bug 48353 - FILESAVE: DOCX - Page breaks never get really deleted on roundtrip
Status: RESOLVED WORKSFORME
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 3.5.2 release (earliest affected)
Hardware: x86-64 (AMD64) All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: INTEROPERABILITY
Keywords:
Depends on:
Blocks: DOCX
Reported: 2012-04-05 13:31 UTC by Joachim Otahal
Modified: 2017-05-14 01:16 UTC

See Also:
Crash report or crash signature:


Attachments
Fresh document with page breaks. (8.65 KB, application/vnd.oasis.opendocument.text)
2012-04-05 13:32 UTC, Joachim Otahal
Now I removed the page breaks and saved. (8.82 KB, application/vnd.oasis.opendocument.text)
2012-04-05 13:33 UTC, Joachim Otahal
Opened Page-Break2.odt and save as Page-break3.docx (3.44 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2012-04-05 13:34 UTC, Joachim Otahal
Open page-break3.docx and save as page-break4.odt (8.79 KB, application/vnd.oasis.opendocument.text)
2012-04-05 13:36 UTC, Joachim Otahal
Ignore previous PageBreak4.odt - this is the one after the merry-go-round through Word 2010. (9.55 KB, application/vnd.oasis.opendocument.text)
2012-04-05 14:07 UTC, Joachim Otahal
Retesting page break, for comment https://bugs.documentfoundation.org/show_bug.cgi?id=48353#c11 (33.77 KB, application/x-zip)
2015-05-03 07:28 UTC, Joachim Otahal

Description Joachim Otahal 2012-04-05 13:31:01 UTC
Page breaks are never really deleted; the bug shows up when saving as .docx.

The EXACT way to reproduce:
Create a new document, type "1" on the first line and press Enter, then insert a manual page break.
Type "2" on a line and press Enter, then insert a manual page break.
Repeat until you reach 5.
Save the document as .odt.
Close LO.
Open the document.
Remove the page breaks (using the symbol that appears between the pages when you hover there).
Save as .odt.
Close LO.
Open the .odt; everything looks fine.
Choose "File" > "Save As" and select "Microsoft Office 2007/2010 docx".
Close LO.
Open the .docx you just saved.
The page breaks you removed before saving as .odt are back again.
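
For anyone who wants to check the saved .docx without opening it in an office suite, here is a minimal Python sketch. It assumes the page breaks are written to word/document.xml as either `<w:br w:type="page"/>` or `<w:pageBreakBefore/>`; the function name `count_docx_page_breaks` is made up for this example, and breaks defined only in styles.xml are not counted.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_docx_page_breaks(docx_bytes):
    """Count explicit page breaks in word/document.xml of a .docx package.

    Hypothetical helper for this bug report, not part of LibreOffice.
    """
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        root = ET.fromstring(package.read("word/document.xml"))

    count = 0
    # A manual page break inside a run: <w:br w:type="page"/>
    for br in root.iter("{%s}br" % W_NS):
        if br.get("{%s}type" % W_NS) == "page":
            count += 1
    # The paragraph property "page break before": <w:pageBreakBefore/>
    # An explicit w:val of 0/false/off switches the property off.
    for pbb in root.iter("{%s}pageBreakBefore" % W_NS):
        if pbb.get("{%s}val" % W_NS) not in ("0", "false", "off"):
            count += 1
    return count
```

Usage would be something like `count_docx_page_breaks(open("Page-break3.docx", "rb").read())`; a file saved after the breaks were removed should give 0 if the export is correct.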
Comment 1 Joachim Otahal 2012-04-05 13:32:31 UTC
Created attachment 59535
Fresh document with page breaks.
Comment 2 Joachim Otahal 2012-04-05 13:33:42 UTC
Created attachment 59536
Now I removed the page breaks and saved.
Comment 3 Joachim Otahal 2012-04-05 13:34:50 UTC
Created attachment 59537
Opened Page-Break2.odt and save as Page-break3.docx

This is where the page breaks I deleted in Page-Break2.odt reappear.
Comment 4 Joachim Otahal 2012-04-05 13:36:11 UTC
Created attachment 59538
Open page-break3.docx and save as page-break4.odt

The page breaks that were removed and then reappeared are still there.
Comment 5 Joachim Otahal 2012-04-05 13:37:34 UTC
The only way I found to really remove the page breaks:
Load the .docx in Office 2010, remove the page breaks, and save.
Open it in LO, and finally they are gone and stay gone, no matter whether you save as .docx or .odt.
Comment 6 Joachim Otahal 2012-04-05 14:05:32 UTC
Here are the relevant parts of content.xml, extracted from the .odt files.

This is the one with the normal five page breaks (Page-Break1.odt/content.xml):

<office:automatic-styles>
<style:style style:name="P1" style:family="paragraph" style:parent-style-name="Standard">
<style:paragraph-properties fo:break-before="page"/>
</style:style>
</office:automatic-styles>
<office:body>
<office:text text:use-soft-page-breaks="true">
<text:sequence-decls>
<text:sequence-decl text:display-outline-level="0" text:name="Illustration"/>
<text:sequence-decl text:display-outline-level="0" text:name="Table"/>
<text:sequence-decl text:display-outline-level="0" text:name="Text"/>
<text:sequence-decl text:display-outline-level="0" text:name="Drawing"/>
</text:sequence-decls>
<text:p text:style-name="Standard">1</text:p>
<text:p text:style-name="Standard"/>
<text:p text:style-name="P1">2</text:p>
<text:p text:style-name="Standard"/>
<text:p text:style-name="P1">3</text:p>
<text:p text:style-name="Standard"/>
<text:p text:style-name="P1">4</text:p>
<text:p text:style-name="Standard"/>
<text:p text:style-name="P1">5</text:p>
<text:p text:style-name="Standard"/>
</office:text>
</office:body>
</office:document-content>


This is the one with the page breaks "removed", or rather hidden (why are they set to auto?) (Page-Break2.odt/content.xml):

<office:automatic-styles>
<style:style style:name="P1" style:family="paragraph" style:parent-style-name="Standard" style:master-page-name="">
<style:paragraph-properties style:page-number="auto" fo:break-before="auto" fo:break-after="auto"/>
</style:style>
<style:style style:name="P2" style:family="paragraph" style:parent-style-name="Standard" style:master-page-name="">
<style:paragraph-properties style:page-number="auto" fo:break-before="auto" fo:break-after="auto"/>
</style:style>
<style:style style:name="P3" style:family="paragraph" style:parent-style-name="Standard" style:master-page-name="">
<style:paragraph-properties style:page-number="auto" fo:break-before="auto" fo:break-after="auto"/>
</style:style>
<style:style style:name="P4" style:family="paragraph" style:parent-style-name="Standard" style:master-page-name="">
<style:paragraph-properties style:page-number="auto" fo:break-before="auto" fo:break-after="auto"/>
</style:style>
</office:automatic-styles>
<office:body>
<office:text>
<text:sequence-decls>
<text:sequence-decl text:display-outline-level="0" text:name="Illustration"/>
<text:sequence-decl text:display-outline-level="0" text:name="Table"/>
<text:sequence-decl text:display-outline-level="0" text:name="Text"/>
<text:sequence-decl text:display-outline-level="0" text:name="Drawing"/>
</text:sequence-decls>
<text:p text:style-name="Standard">1</text:p>
<text:p text:style-name="Standard"/>
<text:p text:style-name="P1">2</text:p>
<text:p text:style-name="Standard"/>
<text:p text:style-name="P1">3</text:p>
<text:p text:style-name="Standard"/>
<text:p text:style-name="P1">4</text:p>
<text:p text:style-name="Standard"/>
<text:p text:style-name="P1">5</text:p>
<text:p text:style-name="Standard"/>
</office:text>
</office:body>
</office:document-content>


This is the result after the merry-go-round through Word 2010: remove the page breaks there, then re-save it in LO as .odt. Now the page breaks are REALLY removed (Page-Break4.odt/content.xml):

<office:automatic-styles>
<style:style style:name="P1" style:family="paragraph" style:parent-style-name="Standard">
<style:paragraph-properties fo:margin-top="0cm" fo:margin-bottom="0cm"/>
</style:style>
<style:style style:name="P2" style:family="paragraph" style:parent-style-name="Standard" style:master-page-name="Standard">
<style:paragraph-properties fo:margin-top="0cm" fo:margin-bottom="0cm" style:page-number="auto"/>
</style:style>
</office:automatic-styles>
<office:body>
<office:text>
<text:sequence-decls>
<text:sequence-decl text:display-outline-level="0" text:name="Illustration"/>
<text:sequence-decl text:display-outline-level="0" text:name="Table"/>
<text:sequence-decl text:display-outline-level="0" text:name="Text"/>
<text:sequence-decl text:display-outline-level="0" text:name="Drawing"/>
</text:sequence-decls>
<text:p text:style-name="P2">
<text:bookmark-start text:name="_GoBack"/>1</text:p>
<text:p text:style-name="P1">2</text:p>
<text:p text:style-name="P1">3</text:p>
<text:p text:style-name="P1">4</text:p>
<text:p text:style-name="P1">5</text:p>
<text:p text:style-name="P1">
<text:bookmark-end text:name="_GoBack"/>
</text:p>
</office:text>
</office:body>
</office:document-content>
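
The excerpts above differ mainly in the fo:break-before value on the automatic paragraph styles. A rough Python sketch to read that value per paragraph from a complete content.xml follows; the name `paragraph_breaks` is invented for this example, and it only looks at styles defined in content.xml itself, so paragraphs using "Standard" (which lives in styles.xml) report None.

```python
import xml.etree.ElementTree as ET

# ODF namespaces needed to read styles and paragraphs in content.xml.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
}


def q(prefix, local):
    """Build the ElementTree-style qualified name for prefix:local."""
    return "{%s}%s" % (NS[prefix], local)


def paragraph_breaks(content_xml):
    """Return (paragraph text, fo:break-before of its style) pairs.

    Hypothetical helper; break-before is None when the style is not
    defined in content.xml or does not set the attribute.
    """
    root = ET.fromstring(content_xml)

    # Map each style name to its fo:break-before value, if any.
    break_by_style = {}
    for style in root.iter(q("style", "style")):
        props = style.find("style:paragraph-properties", NS)
        if props is not None:
            name = style.get(q("style", "name"))
            break_by_style[name] = props.get(q("fo", "break-before"))

    # Walk the paragraphs in document order and look up their style.
    result = []
    for para in root.iter(q("text", "p")):
        style_name = para.get(q("text", "style-name"))
        result.append(("".join(para.itertext()), break_by_style.get(style_name)))
    return result
```

On Page-Break1.odt this should show "page" for paragraphs 2 to 5, and on Page-Break2.odt "auto" for the same paragraphs.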
Comment 7 Joachim Otahal 2012-04-05 14:07:08 UTC
Created attachment 59540
Ignore previous PageBreak4.odt - this is the one after the merry-go-round through Word 2010.

Now the page breaks are really gone.
Comment 8 leighman 2012-08-06 20:18:43 UTC
Thanks for your bug report.
I can confirm the issue by following your instructions so I'm setting the status to NEW so a developer can begin working on the issue.

Confirmed Ubuntu 12.04, LibO 3.5.4
Comment 9 Jorendc 2014-02-15 09:40:05 UTC
Still reproducible by following the (nice) description :-), tested using Mac OSX 10.9 with LibreOffice Version: 4.3.0.0.alpha0+
Build ID: b540f9172814f51361cf31d2a4b03e34d1d375ef
TinderBox: MacOSX-x86@49-TDF, Branch:master, Time: 2014-02-15_00:28:42

Kind regards,
Joren
Comment 10 Joel Madero 2015-05-02 15:41:59 UTC Comment hidden (obsolete)
Comment 11 Joachim Otahal 2015-05-03 07:25:38 UTC
As far as I can see it is kind of fixed in LibreOffice 4.4.2.

It is fixed as far as .docx is concerned. Checking the .docx versions of the documents after the "reproduce this way" steps, the page breaks are cleanly removed, so it is indeed fixed on the "save as .docx" layer.
(Step 1 to 5 documents)

When checking the .odt versions of the document, LibreOffice never cleans up the unused style:name="P2" styles in content.xml (step 4). It should, though.
However, it saves much more cleanly when the cleaned-up "Step 5.docx" is saved as .odt again (Step 6).

So it is fixed as far as the .docx bug initially reported is concerned.
Is it worth opening another bug for "LibreOffice should not save orphaned style:name="Px" styles, making the document smaller"?
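
To make the orphaned styles easy to spot, here is a small Python sketch that lists automatic paragraph styles in content.xml that no element references. The name `unused_automatic_styles` is made up for this example, and it assumes paragraph styles are referenced only through text:style-name; other reference types are not considered.

```python
import xml.etree.ElementTree as ET

# ODF namespaces needed to read styles and their references in content.xml.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}


def q(prefix, local):
    """Build the ElementTree-style qualified name for prefix:local."""
    return "{%s}%s" % (NS[prefix], local)


def unused_automatic_styles(content_xml):
    """Return names of automatic paragraph styles that nothing references.

    Hypothetical helper; assumes references use text:style-name only.
    """
    root = ET.fromstring(content_xml)
    auto = root.find("office:automatic-styles", NS)
    if auto is None:
        return []

    # Paragraph styles defined under <office:automatic-styles>, in order.
    defined = [
        style.get(q("style", "name"))
        for style in auto.findall("style:style", NS)
        if style.get(q("style", "family")) == "paragraph"
    ]

    # Every style name referenced by any element via text:style-name.
    used = set()
    for element in root.iter():
        name = element.get(q("text", "style-name"))
        if name is not None:
            used.add(name)

    return [name for name in defined if name not in used]
```

For a content.xml like the Page-Break2.odt excerpt in comment 6, where P1 to P4 are defined but only P1 is used, this should return P2, P3 and P4.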
Comment 12 Joachim Otahal 2015-05-03 07:28:10 UTC
Created attachment 115282
Retesting page break, for comment https://bugs.documentfoundation.org/show_bug.cgi?id=48353#c11