Created attachment 161212: Zip file with two short sample documents
Saving a document as .docx causes an extra spurious page break to be inserted on or soon after the 1st page following manual page breaks.
By examining the XML produced, I found spurious <w:sectPr>...</w:sectPr> XML that causes the additional page breaks to be inserted. Examining the XML of the source .odt, I was unable to see anything that might lead to an extra page break.
The first spurious page break is after the text "letting any other speak who wished to." in the Prologue. The second spurious page break is after the text "Calmly, he returned to studying the orphanage records."
The .docx conversion also changes the layout by causing page spills in places where there were none before, but I consider that a minor nuisance.
Please see the attached source document WTKDP-BadDocxGen.odt and the bad .docx file generated by Save As from:
Version: 6.4.3.2
Build ID: 747b5d0ebf89f41c860ec2a39efd7cb15b54f2d8
CPU threads: 4; OS: Linux 4.4; UI render: default; VCL: x11;
Locale: en-GB (en_AU.UTF-8); UI-Language: en-US
Calc: threaded
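For anyone wanting to repeat the XML inspection described above, here is a minimal sketch in Python. It assumes the standard .docx layout (a zip archive containing word/document.xml) and treats a w:sectPr inside a paragraph's w:pPr as the end of a section. The function name `section_breaks` is hypothetical, and this is not how LibreOffice itself detects the problem.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def section_breaks(docx):
    """List (break type, paragraph text) for every paragraph that ends a section.

    `docx` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    found = []
    for para in root.iter(f"{{{W}}}p"):
        sect = para.find(f"{{{W}}}pPr/{{{W}}}sectPr")
        if sect is None:
            continue
        # When w:type is omitted, the section break defaults to "nextPage"
        type_el = sect.find(f"{{{W}}}type")
        kind = type_el.get(f"{{{W}}}val") if type_el is not None else "nextPage"
        text = "".join(t.text or "" for t in para.iter(f"{{{W}}}t"))
        found.append((kind, text))
    return found
```

Running it on the exported .docx and comparing the reported paragraph text with the places where the source .odt has manual page breaks should show which section breaks are unexpected.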
Hmm, this would explain a number of bug reports
It may also be worth noting that the conversion generates more page styles than are used, as well as making unexpected changes. Some examples:
1) All the page sizes are set to the same dimension, 360 x 576 pts (for a 5"x8" book). After conversion, all page sizes are 360 x 575 pts.
2) Most pages are set with margins Inner, Outer, Top, Bottom: 45 25 24 21. In the .docx these are mostly changed to: 66 36 0 0.
3) Pages with footers in the original are set with Spacing 14.2 pt and Height 14.2 pt, and dynamic spacing and autofit height off. In the converted .docx, these are changed to Spacing 21.1 pt and Height 2.9 pt, and dynamic spacing and autofit height ON.
4) I have a no-header-or-footer style for the first page of a chapter. These pages generally convert across incorrectly and gain a header and footer. However, even the page styles that have no header or footer visible appear in Manage Styles with Header and Footer turned ON.
I don't understand why the conversion doesn't embed enough information so that when Writer's .docx is read back by Writer, the page styles are preserved - the settings, the names, and the number.
In my book, I have just 4 page styles defined and used. After saving to .docx and reopening, not only are all the page styles altered, but there are now 149 plus the default style.
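The page sizes and margins listed above can be read straight from the exported file. The following is a rough sketch, assuming the usual encoding in which w:pgSz and w:pgMar attributes are integers in twentieths of a point; the function name `page_geometry` and the returned dictionary keys are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
TWIPS_PER_POINT = 20  # assumption: sizes are stored in twentieths of a point


def _points(element, name):
    """Convert the w:<name> attribute of `element` to points, or None if absent."""
    if element is None:
        return None
    raw = element.get(f"{{{W}}}{name}")
    return None if raw is None else int(raw) / TWIPS_PER_POINT


def page_geometry(docx):
    """Return one dict of page size and margins (in points) per w:sectPr."""
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    rows = []
    for sect in root.iter(f"{{{W}}}sectPr"):
        size = sect.find(f"{{{W}}}pgSz")
        margins = sect.find(f"{{{W}}}pgMar")
        rows.append({
            "width": _points(size, "w"),
            "height": _points(size, "h"),
            "top": _points(margins, "top"),
            "bottom": _points(margins, "bottom"),
            "left": _points(margins, "left"),
            "right": _points(margins, "right"),
        })
    return rows
```

The number of rows returned also gives a quick count of how many sections the export produced, which can be compared against the number of page styles actually used in the source document.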
My mistake - in my comment above I referenced the "45 25" margins that are in a separate 4.25" x 7" template. The 5x8 editions do indeed use Inner and Outer margins of 66 and 36, so those margins were correctly preserved. It's the Top and Bottom margins that get altered by saving as .docx.
Also just noting that loading the .docx version in Writer, deleting the spurious page breaks one by one, correcting all the page styles, and saving still produces a .docx file that other programs see as having extra page breaks. The only ways I've found to remove them are to edit the XML manually, or to use an alternative to LibreOffice such as FreeOffice TextMaker.
The ODT looked the same way as the docx export after:
author Justin Luth <justin_luth@sil.org> 2016-06-11 10:30:18 +0300
committer Miklos Vajna <vmiklos@collabora.co.uk> 2016-07-13 07:45:30 +0000
commit 5647eb6f8ef5828ce14973a91946d5d7cdfeae30
tree f5f28148e23fc378cb19a440bf2fc8a35db812c7
parent 9ad26139b5e8a4df469399378baeac5083f2fcf5
tdf#76349 writer: treat single-column break as page break
Writerfilter imports docx-defined column breaks that exist without being in a column. Word treats these as if they were a page break. Writer basically just preserved and ignored them. I limited the fix to only consider SVX_BREAK_COLUMN_BEFORE since writerfilter is only given “column break” and treats it as column_before.
https://cgit.freedesktop.org/libreoffice/core/commit/?id=5647eb6f8ef5828ce14973a91946d5d7cdfeae30
Adding Justin Luth to CC. The bibisect is hopefully right. The ODT looked the same as the docx export for some time during the bibisect.
(In reply to Luke Kendall from comment #2)
> It may also be worth noting that the conversion generates more page styles
> than are used, as well as making unexpected changes.
>
> Some examples:
> 1) All the page sizes are set to the same dimension, 360 x 576 pts (for a
> 5"x8" book).
> When converted, all converted page sizes are 360 x 575 pts.
>
> 2) Most pages are set with margins Inner, Outer, Top, Bottom: 45 25 24 21
> In the .docx these are mostly changed to: 66 36 0 0
>
> 3) Pages with footers in the original are set with Spacing 14.2pt and Height
> 14.2 pt, and dynamic spacing and autofit height off.
> In the converted .docx, these are changed to Spacing 21.1pt and Height
> 2.9 pt, and dynamic spacing and autofit height ON.
>
> 4) I have a no-header-or-footer style for the first page of a chapter.
> These pages generally convert across incorrectly and gain a header and
> footer.
> However, even the page styles that have no header or footer visible,
> appear in the Manage Styles with Header and Footer turned ON
>
> I don't understand why the conversion doesn't embed enough information so
> that when writer's .docx is read back by Writer, the page styles are
> preserved - both the settings, and the names, and the number.
>
> In my book, I have just 4 page styles defined and used. After saving to
> .docx and reopening, not only are all the page styles altered, but there are
> now 149 plus the default style.
Thanks for the nice description. Please create a new report for every issue you encounter; it's a lot of work, but worth it. Every problem has a different cause and needs a different solution, different developers, etc. And QA will lose track if other bugs are hidden under the title "page breaks" :-)
That sounds like a good, fast fix - thanks! I'm puzzled by the mention of docx-defined column breaks: I provided none in my source .odt file. Perhaps it might be worth checking to see if Writer is also generating spurious empty column breaks when it creates a .docx? Or would that be something I should add to the new bug I just submitted for the other issues? (As suggested by Telesto - thanks!)
(In reply to Luke Kendall from comment #8)
> That sounds like a good, fast fix - thanks!
Sorry to spoil your mood, but I didn't fix anything. I only identified the change causing the problem (at least I hope I did) and the person who made the change. Hopefully he will look at this within a reasonable time frame. The change has been made to fix bug 76349, so fixing a unintendedly breaking b
(In reply to Luke Kendall from comment #8)
> Perhaps it might be worth checking to see if Writer is also generating
> spurious empty column breaks when it creates a .docx?
Checking can't hurt, but QA doesn't have enough manpower as it is, so we can't check possibilities. Feel free to investigate whether the docx import/export filter does such a thing (which would obviously be wrong), or anything else that's not right between docx and odt. Please do report it, but QA wants proof.
We need volunteers, so if you want to help, even if it involves docx import/export errors, confirming bugs, etc., you're welcome. Your reports are quite nice: including both the odt and the broken docx is much more helpful than only a broken file.
https://wiki.documentfoundation.org/QA/GetInvolved
or more broadly
https://wiki.documentfoundation.org/Development/GetInvolved
The first column break I see is on the second page. "Wild Thing" (title paragraph style) is set with a "Break" type column. It serves no purpose in the original. The same is true two empty paragraphs before "Prologue". But that isn't what this bug report is about. It is about paragraphs where there IS NO column break, and yet this still happens (although the earlier bibisect and comment 6 about "ODT looked like this" is referring to the two previous breaks). Bisecting (and a test revert) based on the two reported cases points to LO 6.4 commit 5d1709a7c4184eb31cfc4c2d3acadff3a4a68189 by Author: Miklos Vajna on Mon Oct 28 21:52:18 2019 +0100 tdf#104017 DOC export: be less aggressive with merging page styles CC: Miklos, but just ignore this for a few days - I'm going to research this.
The regression patch was back-ported to LO 6.3.4. I can't believe that this change was made and backported in the first place, let alone that it was never caught by QA's automated testing. Proposed to revert it at gerrit.libreoffice.org/c/core/+/94782 Revert "tdf#104017 DOC export: be less aggressive with merging page styles"
@Justin Thanks for looking into it. Is a new report needed for comment 5?
Re Justin's comment about the first page break being unnecessary: I believe I needed to do that to avoid the title page being given a header and footer when I couldn't change the page style to one with no header or footer. But I'm not certain about that and it seems a minor point. Thanks for looking into this!
(In reply to Telesto from comment #13)
> Thanks for looking into it. Is a new report needed for comment 5.
Yes. I made bug 133370 - thanks for prompting me.
This will be fixed in LO 6.4.5. The patch from bug 104017 was reverted, and it was marked as a duplicate of bug 48097.
(In reply to Justin L from comment #16)
> This will be fixed in LO 6.4.5.
> The patch from bug 104017 was reverted, and it was marked as a duplicate of
> bug 48097.
@Justin, is it possible to add a unit test or something to prevent this from happening again?
(In reply to Telesto from comment #17)
> Is it possible to add a unit test something.. preventing this from happening
> again?
Already done. A test was added to the revert commit (and was the perfect opportunity to introduce a new ooxmlexport15 all the way back to the 6.4 branch - which will make future backporting of other patches easier).