Bug 121670 - FILEOPEN DOCX Three type Column with Next Page Section Break created with Word isn’t added to a new page opened in Writer
Summary: FILEOPEN DOCX Three type Column with Next Page Section Break created with Wor...
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer (show other bugs)
Version:
(earliest affected)
4.3.0.4 release
Hardware: All All
: medium normal
Assignee: Justin L
URL:
Whiteboard: target:6.3.0
Keywords: bibisected, bisected, filter:docx, regression
Depends on:
Blocks: DOCX-Tables
  Show dependency treegraph
 
Reported: 2018-11-23 10:40 UTC by NISZ LibreOffice Team
Modified: 2023-02-13 15:21 UTC (History)
6 users (show)

See Also:
Crash report or crash signature:


Attachments
Screenshot of the original document side by side in Word and Writer. (158.89 KB, image/png)
2018-11-23 10:41 UTC, NISZ LibreOffice Team
Details
Example file from Word (19.87 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2018-11-23 10:41 UTC, NISZ LibreOffice Team
Details
3_columns2b.docx: slightly modified example of why this will be difficult to fix. (11.01 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2018-12-21 15:29 UTC, Justin L
Details
tdf81345RTKDE.docx: round-tripped by KDE Neon 18.04 and LO 6.3.0. (1.78 MB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2019-08-29 07:16 UTC, Justin L
Details

Note You need to log in before you can comment on or make changes to this bug.
Description NISZ LibreOffice Team 2018-11-23 10:40:41 UTC
Description:
Three type Column with Next Page Section Break in DOCX documents created with Microsoft Word 2010 isn’t added to a new page when the document is opened in LibreOffice Writer.

Steps to Reproduce:
    1. Create a new document in Microsoft Word.
    2. On the Page Layout tab, in the Page Setup group, click Columns.
    3. Select Three.
    4. Type “=lorem(1)” and press Enter.
    5. On the Page Layout tab, in the Page Setup group, click Breaks.
    6. Choose Next Page.
    7. Save the file as DOCX.
    8. Open the same file in LibreOffice Writer.
    9. Compare the original file opened in Word and Writer.

Actual Results:
The text layout is incorrect when the document is opened in LibreOffice Writer.

Expected Results:
Columns should have the same layout as the original file when the document is opened in Microsoft Word 2010.


Reproducible: Always


User Profile Reset: No



Additional Info:
Comment 1 NISZ LibreOffice Team 2018-11-23 10:41:00 UTC
Created attachment 146974 [details]
Screenshot of the original document side by side in Word and Writer.
Comment 2 NISZ LibreOffice Team 2018-11-23 10:41:15 UTC
Created attachment 146975 [details]
Example file from Word
Comment 3 raal 2018-11-23 20:52:31 UTC
This seems to have begun at the below commit.
Adding Cc: to Miklos Vajna ; Could you possibly take a look at this one?
Thanks
 9deeb4e97ee68af6de30592a056aa88f7fc1a7bd is the first bad commit
commit 9deeb4e97ee68af6de30592a056aa88f7fc1a7bd
Author: Matthew Francis <mjay.francis@gmail.com>
Date:   Sat Sep 5 21:13:03 2015 +0800

    source-hash-4e653d15eff26aa5283d8ba20611893f4c573f57
    
    commit 4e653d15eff26aa5283d8ba20611893f4c573f57
    Author:     Miklos Vajna <vmiklos@collabora.co.uk>
    AuthorDate: Wed Sep 11 15:18:26 2013 +0200
    Commit:     Miklos Vajna <vmiklos@collabora.co.uk>
    CommitDate: Wed Sep 11 16:14:16 2013 +0200
    
        DOCX import: fix default section break type inside multiple columns
Comment 4 Justin L 2018-12-21 15:24:05 UTC
yuck: PropertyMap.cxx's bTreatAsContinuous is true in this case, but it shouldn't be. Amazingly, if I remove all of the qualifications, no unit tests fail anymore.

So the root of the problem is that because it is bTreatAsContinuous, it misses:
commit ede1a83e110ce7bc7d3560f415d6269ea3feb947
Author: Justin Luth 
Date:   Sat Dec 10 09:35:09 2016 +0300

    tdf#46941 docx: don't balance columns before page-break-section

This type of problem has had extreme regression tendencies, so I am over-documenting everything here...

Some relevant commits from history to review are:
commit afc96d263959d10e457b54a574f0829d20e99df4
Author: Justin Luth 
Date:   Tue Nov 7 09:29:30 2017 +0300

    tdf#112352 ooxmlimport: ALWAYS treat 1st nextpage w/cols as cont
    
    fix 5.4 regression from 4605bd46984125a99b0e993b71efa6edb411699f.
 
[**** a key cautionary note here.  How accurate I was remains to be seen...]
    When there are columns, if a nextpage section doesn't contain any
    other "page style" details we treat it as a continuous break,
    If we don't, the column info becomes part of the style itself,
    and not just a section property.
    
    However, the very first section is troublesome - by definition it DOES
    contain page style details, and so if the document starts with columns,
    the default style would gain the column attribute. Usually that
    results in a mess, so lets make sure that we avoid that also in
    the case where headers/footers are defined.

The following commit might be the biggest reason why this now seems to work:
commit 4605bd46984125a99b0e993b71efa6edb411699f
Author: Justin Luth 
Date:   Thu Mar 9 18:13:50 2017 +0300

    tdf#103931 writerfilter breaktype: same for implicit and explicit
    
    MSWord normally does NOT specify "nextPage" for the sectionBreak,
    since that is the default type. That is imported as BreakType == -1.
    However, Writer ALWAYS exports the section type name, which of
    course is imported explicitly.
    **There is an import hack that treats the very first -1 section as
    continuous IF there are columns**. Since Writer explicitly defines
    the section type, these documents import differently.
    
    When Writer round-trips these types of files, they get totally
    messed up in Writer, although they look fine in Word. So, treat
    both implicit and explicit nextPage identically for
    bTreatAsContinuous during import.

[**** and a proper fix for this part is just landing in LO 6.3 *****]
    Another unit test demonstrated that headers/footers are lost when
    treating as continuous, so preventing that situation now also.

and finally the original fix to the same commit mentioned in comment 3
commit 3870c0555aa461268a6d056543f4545d562769ce
Author: Justin Luth 
Date:   Wed Sep 7 19:26:30 2016 +0300

    tdf#81345 docx import fix default page break regression
    
    "regression" from 4e653d15eff26aa5283d8ba20611893f4c573f57
    
    If there are new style elements, then don't treat a
    default break in columns as a continuous break.
    
    This fixes both round-tripping, and initial import of
    columns and headers on this particular document. Since
    MS and LO treat sections so differently, it is a balancing
    act of what to change.
Comment 5 Justin L 2018-12-21 15:29:31 UTC
Created attachment 147759 [details]
3_columns2b.docx: slightly modified example of why this will be difficult to fix.

If we just change const bool bTreatAsContinuous = m_nBreakType == NS_ooxml::LN_Value_ST_SectionMark_nextPage
        && m_nColumnCount > 0
        && m_bIsFirstSection

then the original document looks fine, but this one becomes awful. Perhaps the only way to handle this is to ONLY put column information into a section and never a page style (which likely will create exporting problems...).
Comment 6 Justin L 2018-12-22 04:25:36 UTC
(In reply to Justin L from comment #5)
> The only way to handle this is to ONLY put column information into a section
> and never a page style (which likely will create exporting problems...).
True (and true):  https://gerrit.libreoffice.org/#/c/65557/
Comment 7 Commit Notification 2019-01-15 18:53:38 UTC
Justin Luth committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/+/14087d3e5fed9b56384432d9aeac608a5e8d86cf%5E%21

tdf#121670 ooxmlimport: no columns in page styles, only sections

It will be available in 6.3.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 8 Justin L 2019-01-15 18:59:20 UTC
LIKELY TO EXPOSE SECTION EXPORT/IMPORT PROBLEMS. That is already happening somewhat because support for forms/protected sections was added in LO6.2. By making this change in 6.3, it will help to expose problems faster, with the hope that they can still be fixed for 6.2.

Therefore, I hope to re-evaluate this at the end of the 6.3 and determine whether I'm comfortable leaving it in place, or whether it is too risky and should be reverted. I've solved about 6 different section bugs in the week prior to pushing this commit...
Comment 9 Gabor Kelemen (allotropia) 2019-01-15 20:21:45 UTC
(In reply to Justin L from comment #8)
> LIKELY TO EXPOSE SECTION EXPORT/IMPORT PROBLEMS. That is already happening
> somewhat because support for forms/protected sections was added in LO6.2. By
> making this change in 6.3, it will help to expose problems faster, with the
> hope that they can still be fixed for 6.2.
> 

Many thanks for working on these!

About exposing problems, don't worry: my team has about a thousand user-made documents on which we recently started to run bi-weekly regression tests. 
I'm confident we gonna find & report any possible issues ;).
Comment 10 Xisco Faulí 2019-01-16 15:32:56 UTC
Verified in

Version: 6.3.0.0.alpha0+
Build ID: 1bf68dbf53f4b5308e295058226abd6d6fb49c3d
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3; 
Locale: ca-ES (ca_ES.UTF-8); UI-Language: en-US
Calc: threaded

@Justin Luth, thanks for fixing this!!

no plan to backport it based on comment 8
Comment 11 Luke 2019-08-28 01:41:50 UTC
Justin,

https://cgit.freedesktop.org/libreoffice/core/commit/?id=14087d3e5fed
    tdf#121670 ooxmlimport: no columns in page styles, only sections

Is causing a clean install of KDE Neon, and Gandalf to fail with

ooxmlexport4.cxx:1209:Assertion
Test name: testTdf81345_045Original::Import_Export_Import
assertion failed
- Expression: pageStyleName != "Standard"

https://ci.libreoffice.org/job/lo_daily_update_gandalf/lastBuild/console
Comment 12 Justin L 2019-08-28 05:37:29 UTC
(In reply to Luke from comment #11)
Can you attach a copy of the round-tripped file here?  Is the second page really using the default page style instead of converted1?
Comment 13 Justin L 2019-08-29 07:16:23 UTC
Created attachment 153728 [details]
tdf81345RTKDE.docx: round-tripped by KDE Neon 18.04 and LO 6.3.0.

(In reply to Justin L from comment #12)
> Is the second page really using the default page style instead of converted1?
Yes, in KDE neon, for some reason the first page spills over to the second (almost empty) page on a round-tripped file.

I modified the unit test to avoid checking this errant page 2. https://gerrit.libreoffice.org/78250

It seems like the problem comes from KDE export.  Exporting from regular Ubuntu is loaded fine in KDE.  Regular Ubuntu can also load the KDE round-tripped file though, so something odd is happening.
Comment 14 Justin L 2019-09-02 11:17:02 UTC
(In reply to Luke from comment #11)
The main culprit seems to come from the difference in system default fonts (fc-match), so a different fallback for "Franklin Gothic" is used. Neon uses "NotoSans-Regular.ttf: "Noto Sans" "Regular" while Ubuntu uses "DejaVuSans.ttf: "DejaVu Sans" "Book".

Since this is just a font issue (and the text size/spacing is significantly different) I'm not going to pursue this further.