Description: Three type Column with Next Page Section Break in DOCX documents created with Microsoft Word 2010 isn’t added to a new page when the document is opened in LibreOffice Writer. Steps to Reproduce: 1. Create a new document in Microsoft Word. 2. On the Page Layout tab, in the Page Setup group, click Columns. 3. Select Three. 4. Type “=lorem(1)” and press Enter. 5. On the Page Layout tab, in the Page Setup group, click Breaks. 6. Choose Next Page. 7. Save the file as DOCX. 8. Open the same file in LibreOffice Writer. 9. Compare the original file opened in Word and Writer. Actual Results: The text layout is incorrect when the document is opened in LibreOffice Writer. Expected Results: Columns should have the same layout as the original file when the document is opened in Microsoft Word 2010. Reproducible: Always User Profile Reset: No Additional Info:
Created attachment 146974 [details] Screenshot of the original document side by side in Word and Writer.
Created attachment 146975 [details] Example file from Word
This seems to have begun at the below commit. Adding Cc: to Miklos Vajna ; Could you possibly take a look at this one? Thanks 9deeb4e97ee68af6de30592a056aa88f7fc1a7bd is the first bad commit commit 9deeb4e97ee68af6de30592a056aa88f7fc1a7bd Author: Matthew Francis <mjay.francis@gmail.com> Date: Sat Sep 5 21:13:03 2015 +0800 source-hash-4e653d15eff26aa5283d8ba20611893f4c573f57 commit 4e653d15eff26aa5283d8ba20611893f4c573f57 Author: Miklos Vajna <vmiklos@collabora.co.uk> AuthorDate: Wed Sep 11 15:18:26 2013 +0200 Commit: Miklos Vajna <vmiklos@collabora.co.uk> CommitDate: Wed Sep 11 16:14:16 2013 +0200 DOCX import: fix default section break type inside multiple columns
yuck: PropertyMap.cxx's bTreatAsContinuous is true in this case, but it shouldn't be. Amazingly, if I remove all of the qualifications, no unit tests fail anymore. So the root of the problem is that because it is bTreatAsContinuous, it misses: commit ede1a83e110ce7bc7d3560f415d6269ea3feb947 Author: Justin Luth Date: Sat Dec 10 09:35:09 2016 +0300 tdf#46941 docx: don't balance columns before page-break-section This type of problem has had extreme regression tendencies, so I am over-documenting everything here... Some relevant commits from history to review are: commit afc96d263959d10e457b54a574f0829d20e99df4 Author: Justin Luth Date: Tue Nov 7 09:29:30 2017 +0300 tdf#112352 ooxmlimport: ALWAYS treat 1st nextpage w/cols as cont fix 5.4 regression from 4605bd46984125a99b0e993b71efa6edb411699f. [**** a key cautionary note here. How accurate I was remains to be seen...] When there are columns, if a nextpage section doesn't contain any other "page style" details we treat it as a continuous break, If we don't, the column info becomes part of the style itself, and not just a section property. However, the very first section is troublesome - by definition it DOES contain page style details, and so if the document starts with columns, the default style would gain the column attribute. Usually that results in a mess, so lets make sure that we avoid that also in the case where headers/footers are defined. The following commit might be the biggest reason why this now seems to work: commit 4605bd46984125a99b0e993b71efa6edb411699f Author: Justin Luth Date: Thu Mar 9 18:13:50 2017 +0300 tdf#103931 writerfilter breaktype: same for implicit and explicit MSWord normally does NOT specify "nextPage" for the sectionBreak, since that is the default type. That is imported as BreakType == -1. However, Writer ALWAYS exports the section type name, which of course is imported explicitly. **There is an import hack that treats the very first -1 section as continuous IF there are columns**. Since Writer explicitly defines the section type, these documents import differently. When Writer round-trips these types of files, they get totally messed up in Writer, although they look fine in Word. So, treat both implicit and explicit nextPage identically for bTreatAsContinuous during import. [**** and a proper fix for this part is just landing in LO 6.3 *****] Another unit test demonstrated that headers/footers are lost when treating as continuous, so preventing that situation now also. and finally the original fix to the same commit mentioned in comment 3 commit 3870c0555aa461268a6d056543f4545d562769ce Author: Justin Luth Date: Wed Sep 7 19:26:30 2016 +0300 tdf#81345 docx import fix default page break regression "regression" from 4e653d15eff26aa5283d8ba20611893f4c573f57 If there are new style elements, then don't treat a default break in columns as a continuous break. This fixes both round-tripping, and initial import of columns and headers on this particular document. Since MS and LO treat sections so differently, it is a balancing act of what to change.
Created attachment 147759 [details] 3_columns2b.docx: slightly modified example of why this will be difficult to fix. If we just change const bool bTreatAsContinuous = m_nBreakType == NS_ooxml::LN_Value_ST_SectionMark_nextPage && m_nColumnCount > 0 && m_bIsFirstSection then the original document looks fine, but this one becomes awful. Perhaps the only way to handle this is to ONLY put column information into a section and never a page style (which likely will create exporting problems...).
(In reply to Justin L from comment #5) > The only way to handle this is to ONLY put column information into a section > and never a page style (which likely will create exporting problems...). True (and true): https://gerrit.libreoffice.org/#/c/65557/
Justin Luth committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/+/14087d3e5fed9b56384432d9aeac608a5e8d86cf%5E%21 tdf#121670 ooxmlimport: no columns in page styles, only sections It will be available in 6.3.0. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
LIKELY TO EXPOSE SECTION EXPORT/IMPORT PROBLEMS. That is already happening somewhat because support for forms/protected sections was added in LO6.2. By making this change in 6.3, it will help to expose problems faster, with the hope that they can still be fixed for 6.2. Therefore, I hope to re-evaluate this at the end of the 6.3 and determine whether I'm comfortable leaving it in place, or whether it is too risky and should be reverted. I've solved about 6 different section bugs in the week prior to pushing this commit...
(In reply to Justin L from comment #8) > LIKELY TO EXPOSE SECTION EXPORT/IMPORT PROBLEMS. That is already happening > somewhat because support for forms/protected sections was added in LO6.2. By > making this change in 6.3, it will help to expose problems faster, with the > hope that they can still be fixed for 6.2. > Many thanks for working on these! About exposing problems, don't worry: my team has about a thousand user-made documents on which we recently started to run bi-weekly regression tests. I'm confident we gonna find & report any possible issues ;).
Verified in Version: 6.3.0.0.alpha0+ Build ID: 1bf68dbf53f4b5308e295058226abd6d6fb49c3d CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3; Locale: ca-ES (ca_ES.UTF-8); UI-Language: en-US Calc: threaded @Justin Luth, thanks for fixing this!! no plan to backport it based on comment 8
Justin, https://cgit.freedesktop.org/libreoffice/core/commit/?id=14087d3e5fed tdf#121670 ooxmlimport: no columns in page styles, only sections Is causing a clean install of KDE Neon, and Gandalf to fail with ooxmlexport4.cxx:1209:Assertion Test name: testTdf81345_045Original::Import_Export_Import assertion failed - Expression: pageStyleName != "Standard" https://ci.libreoffice.org/job/lo_daily_update_gandalf/lastBuild/console
(In reply to Luke from comment #11) Can you attach a copy of the round-tripped file here? Is the second page really using the default page style instead of converted1?
Created attachment 153728 [details] tdf81345RTKDE.docx: round-tripped by KDE Neon 18.04 and LO 6.3.0. (In reply to Justin L from comment #12) > Is the second page really using the default page style instead of converted1? Yes, in KDE neon, for some reason the first page spills over to the second (almost empty) page on a round-tripped file. I modified the unit test to avoid checking this errant page 2. https://gerrit.libreoffice.org/78250 It seems like the problem comes from KDE export. Exporting from regular Ubuntu is loaded fine in KDE. Regular Ubuntu can also load the KDE round-tripped file though, so something odd is happening.
(In reply to Luke from comment #11) The main culprit seems to come from the difference in system default fonts (fc-match), so a different fallback for "Franklin Gothic" is used. Neon uses "NotoSans-Regular.ttf: "Noto Sans" "Regular" while Ubuntu uses "DejaVuSans.ttf: "DejaVu Sans" "Book". Since this is just a font issue (and the text size/spacing is significantly different) I'm not going to pursue this further.