Bug 144013 - FILESAVE ODT->DOCX: NotPlausibleSingleWordSection First Page header also on second page (only when document contains a table on the page which spreads across two pages)
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 4.1.0.4 release
Hardware: All All
Importance: lowest minor
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: bibisected, bisected
Depends on:
Blocks: DOCX-Header-Footer
Reported: 2021-08-23 02:29 UTC by Kevin Suo
Modified: 2023-06-14 18:58 UTC
CC List: 2 users

See Also:
Crash report or crash signature:


Attachments
test odt file (56.13 KB, application/vnd.oasis.opendocument.text)
2021-08-23 02:29 UTC, Kevin Suo
144013_exaggerate.odt: demo of how terrible the "working" hack really is. (60.25 KB, application/vnd.oasis.opendocument.text)
2021-10-13 10:52 UTC, Justin L

Description Kevin Suo 2021-08-23 02:29:28 UTC
Created attachment 174484
test odt file

Steps to Reproduce:

1. Open the attached odt file.
Observe that it has "same content on the first page" unchecked in page properties, so that the header logo is only shown on the first page but not on the 2nd page.
Also observe that there is a table which spreads across the first and the 2nd page.

2. Save as docx, then open the saved docx file with Writer or MS Word.
--> Bug: the header logo appears on both the 1st and the 2nd page. It should appear only on the first page.

3. Delete some table rows in the odt file so that the table fits entirely on the 1st page, then save as docx and reopen with Writer or MS Word.
--> OK, as expected.

Version: 7.3.0.0.alpha0+ / LibreOffice Community
Build ID: cb2827f5f65324f309fa0e3c30d0b19ad237410e
CPU threads: 8; OS: Linux 5.13; UI render: default; VCL: gtk3
Locale: zh-CN (zh_CN.UTF-8); UI: zh-CN
Build Platform: Fedora34@X64, Branch:master, bibisect-linux-64-7.3-CN
Calc: threaded

Also on the 7.2 branch.
Comment 1 Kevin Suo 2021-08-23 09:22:16 UTC
Bug already in
Version: 7.0.0.0.alpha1+
Build ID: 574c57090642347980d2395e1e183cc7b5c171ad
Comment 2 Dieter 2021-09-07 06:26:18 UTC
I confirm it with

Version: 7.3.0.0.alpha0+ (x64) / LibreOffice Community
Build ID: 74d35e143d557a7e65c4443f5b80cb9d406b1fa1
CPU threads: 4; OS: Windows 10.0 Build 19043; UI render: Skia/Raster; VCL: win
Locale: de-DE (de_DE); UI: en-GB
Calc: CL

The problem occurs because, after saving as DOCX in step 2, the option "same content on first page" is active.

Special case of bug 136472
Comment 3 Justin L 2021-10-07 18:47:51 UTC
This was working OK until LO 4.1.
Linux bibisect-41max bisect commit d817cf67bff357d8a2425dda57f12475fb0a9f68
    commit 532e25f8b0ef1daeca1f9f84c7084812b72841d5
    Author:     Luke Deller on Mon Feb 11 10:26:34 2013 +0000
          export different first page header/footer to doc/docx
Comment 4 Justin L 2021-10-08 13:10:23 UTC
Page styles are always complicated because MS has a completely different paradigm than LO, so everything is based on emulation. Any monkeying around here likely has some bad side effects.


In this case, LO ODT has a First-Page style that defines the same header for all pages. However, by definition it can only apply to the first page because it rolls over to another style - a concept that MS does not have at all. (So one way OP can work around this is to NOT use the First Page style, but just use the Default style with a different first header.)

A complication here is that the First Page style has different margins from the following pages, so IsPlausableSingleWordSection is false, and we can't combine these two styles into a single one. (If the margins were identical, it would have worked, so that is another way OP could get the desired result.)

[In DOC format, the export fudges this case by adding a section page break, but that also adds all kinds of issues (like breaking a single paragraph into two), so it is not a good idea to try to port that hack to DOCX. See for instance the (poorly titled) bug 132149.]

It seems there were a lot of page style changes in 4.0 and 4.1, so what may seem "correct" for this particular document is likely just a coincidence. For example, the margins on page 2 were not correct: they more closely matched the settings from the first page.

So, I'm removing the REGRESSION flag.
Comment 5 Justin L 2021-10-08 18:16:16 UTC
How to fix? I don't think the computer can do that.
1.) If the follow style doesn't have a separate first header/footer, then the first section's header/footer could be moved to the follow style - but that would lose any special margins on the first page.

2.) The two page styles (aka sections) could remain separate, but in that case we have no ability to know when to apply the second section (since there is no explicit page break). [I believe this is the current implementation.]

[One complication here is that DOCX import STILL uses First Page style instead of "different content on first" (bug 136472), so care is needed not to break the import side, because we don't know if the First Page usage comes from import or from user design.]


AFAIK, at present, the First Page settings will apply to the rest of the document. Whether that is appropriate depends on the document itself. [Note that "First Page" is simply an implementation of a generic first-follow, so unless the built-in "First Page" style is treated as a special edge case, we can't make assumptions about how the first-follow style should work. I suppose a case could be made that in these cases the FOLLOW style's settings should take priority. I believe at present the FIRST style's settings are the ones being applied.]


I think my personal take is that we should NOT try to improve the current experience. Because this is all guesswork, a poor result is best: it more clearly alerts the user that they need to play with the settings.

Nevertheless, here is a patch that does just that, implementing option 1: http://gerrit.libreoffice.org/c/core/+/123271 tdf#144013 ms export: prefer follow page-style over first
Comment 6 Justin L 2021-10-13 10:41:51 UTC
(In reply to Justin L from comment #4)
> [In DOC format, they fudge adding in a section page break in this case
Actually, it isn't just in DOC format. It also applies to DOCX, and WOULD apply in this case, except that IsInTable() is true at the point of the soft page break, so the split is skipped:

bool bNeedParaSplit = NeedTextNodeSplit( rNode, softBreakList ) && !IsInTable();

Of course, all of this was in OP's description, but I didn't make the connection.

This fudging is actually a horrible hack. It tends to look OK visually, but foundationally it is terrible, since it splits one paragraph into two. So you can easily have a paragraph end or start mid-sentence, and certain paragraph formatting can really showcase how bad it is. There is no point in trying to "extend" this hack to also work with tables.
Comment 7 Justin L 2021-10-13 10:52:58 UTC
Created attachment 175714
144013_exaggerate.odt: demo of how terrible the "working" hack really is.
Comment 8 Luke 2021-10-23 19:09:18 UTC
Justin,
I like your idea of warning the user when LO's features don't map one-to-one to MSO's. It's a good stop-gap until this is fixed. What do you think of adding a page style mode to LO's layout engine that matches MSO's? Should I file an enhancement request for either one or both?
Comment 9 Justin L 2021-10-23 19:13:44 UTC
I don't think there is any point in filing a bug about it. The scope is huge, so it will never get handled by a volunteer, only by a company. And in that case you don't need a bug report, you need money.