Bug 122823 - WRITER: Shows multiple footers and no page breaks for this file, unlike MS Word
Summary: WRITER: Shows multiple footers and no page breaks for this file, unlike MS Word
Status: RESOLVED NOTOURBUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 6.0.0.0.alpha1+
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: bibisected, bisected, regression
Depends on:
Blocks: DOCX-Page
Reported: 2019-01-20 00:19 UTC by M. A. Sridhar
Modified: 2021-03-19 11:54 UTC
CC List: 5 users

See Also:
Crash report or crash signature:


Attachments
The .docx file exhibiting the footer anomaly (5.71 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2019-01-20 00:19 UTC, M. A. Sridhar

Description M. A. Sridhar 2019-01-20 00:19:01 UTC
Created attachment 148445
The .docx file exhibiting the footer anomaly

The attached file is expected to have three pages and one header and footer per page. But it shows multiple footers and no page breaks when displayed under LibreOffice Writer 6.1. However, it shows three pages and one footer per page as expected under LibreOffice Writer version 5.1. Microsoft Word and Google Docs also correctly show the three pages.
Comment 1 Dieter 2019-01-20 13:13:30 UTC
I confirm this

3 pages in
Version: 5.4.7.2 (x64)
Build-ID: c838ef25c16710f8838b1faec480ebba495259d0
CPU-Threads: 4; OS: Windows 6.19; UI-Render: GL;
Locale: de-DE (de_DE); Calc: CL

1 page in
Version: 6.1.4.2 (x64)
Build-ID: 9d0f32d1f0b509096fd65e0d4bec26ddd1938fd3
CPU-Threads: 4; OS: Windows 10.0; UI-Render: Standard;
Locale: de-DE (de_DE); Calc: group threaded

and
Version: 6.3.0.0.alpha0+ (x64)
Build ID: 411f3a050ac2be598019d512f8ccfe041080c28f
CPU threads: 4; OS: Windows 10.0; UI render: default; VCL: win; 
TinderBox: Win-x86_64@42, Branch:master, Time: 2019-01-14_03:17:11
Locale: en-US (de_DE); UI-Language: en-US
Calc: threaded
Comment 2 raal 2019-01-20 21:35:21 UTC
This seems to have begun at the below commit.
Adding Cc: to Mike Kaganski; could you possibly take a look at this one?
Thanks
 31b05317cba85743ffabf8ae94622f7d298b37d2 is the first bad commit
commit 31b05317cba85743ffabf8ae94622f7d298b37d2
Author: Jenkins Build User <tdf@pollux.tdf>
Date:   Thu Jul 20 16:05:35 2017 +0200

    source 4b4cd502806cfc9c9cc9754b8aae18a2c2632cdc

author	Mike Kaganski <mike.kaganski@collabora.com>	2017-07-18 23:02:32 +0300
committer	Mike Kaganski <mike.kaganski@collabora.com>	2017-07-20 11:06:50 +0200
commit 4b4cd502806cfc9c9cc9754b8aae18a2c2632cdc (patch)
tree 8b65daf960cece6fea42867a324fb720f734fdaf
parent 44401915b89582ebc50c644c4db38466a841d457 (diff)
tdf#108849: allow out-of-order sectPr
Comment 3 Mike Kaganski 2019-01-21 08:07:34 UTC
Oh. I should have known that by making one invalid document bug-to-bug compatible with Word, I would break another invalid document. And this one does indeed violate the OOXML standard in a way similar to tdf#108849: it has several sectPr elements placed in invalid positions inside the document.

How was this document generated? Is it LibreOffice (as the generator field suggests)? Or is it possibly Google Docs, which is known to use LO for export to ODF, but does so incorrectly, generating invalid files right and left?

I will check if I can hack the hack from tdf#108849 to not break applications that generate hacked OOXMLs, but cannot promise it would be a high priority task.
Comment 4 Mike Kaganski 2019-01-21 08:08:35 UTC
But if it's LibreOffice that generates such files, then it would be another bug, with much higher priority! Please file the bug if it's so!
Comment 5 M. A. Sridhar 2019-01-21 16:15:08 UTC
I was the one that reported this bug. I generated this document using some code that we use for document generation using template expansion. I am not fully aware of the rules around correct placement of sectPr nodes, and I got to this point by trial-and-error experiments and reverse-engineering with Word template expansions. In general, my understanding is that there should be one sectPr node before each page break (w:br type=page) node. I took the stance of "if it works with Word, it should work with Libre Writer". Please point me to any documentation I can get that clearly articulates the rules for correct placement of sectPr nodes, and I will see if I can abide by the rules.

But more generally,  my biggest concern is that any generated files should be openable by Word, and I have seen cases where, if I don't insert the sectPr node the way I have done, Word will not open the document. And having the ability to use LibreOffice Writer to convert to PDF is crucial to us, so for now, we have fallen back to using version 5 until this issue is resolved.

If I can provide any more information or help in any way, please let me know.
Comment 6 Mike Kaganski 2019-01-21 16:38:52 UTC
(In reply to M. A. Sridhar from comment #5)
> I was the one that reported this bug. I generated this document using some
> code that we use for document generation using template expansion. I am not
> fully aware of the rules around correct placement of sectPr nodes, and I got
> to this point by trial-and-error experiments and reverse-engineering with
> Word template expansions. In general, my understanding is that there should
> be one sectPr node before each page break (w:br type=page) node.

Irony. OOXML was the most advertised open specification created by MS, yet the fact that the specification exists remains unknown :-) The normative reference is ISO/IEC 29500-1:2016, and it is freely available from the ISO site [1].

> I took the
> stance of "if it works with Word, it should work with Libre Writer".

Sigh. This is what makes us struggle not only with all the complexity of the standard (which is unavoidable and understandable), but also with implementing all of Word's *bugs* that people discover using "trial-and-error experiments and reverse-engineering", and then start producing documents that exploit those bugs... so of course, we must also have those bugs to be "compliant". Another ironic fact is that simply re-saving such a document from Word is enough to see the proper syntax.

> Please
> point me to any documentation I can get that clearly articulates the rules
> for correct placement of sectPr nodes, and I will see if I can abide by the
> rules.

Citing from tdf#108849 (which has already been mentioned here twice, which makes me wonder whether it is really that difficult to click a link and see where the problem originates!):
> According to ISO/IEC 29500-1:2016(E) 17.6.17 sectPr (Document Final
> Section Properties), the final <w:sectPr> must be the last child element
> of the body element. Also, this is enforced in schema for CT_Body complex
> type (Annex A. (normative) Schemas – W3C XML Schema, A.1 WordprocessingML,
> page 3866), where sectPr is a part of <xsd:sequence>, and thus *must* stay
> at specific place in sequence, namely being the last element, and be at
> most one instance.
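As a rough illustration of the CT_Body rule quoted above, the following Python sketch (standard library only) reports body-level w:sectPr elements that are duplicated or are not the last child of w:body. The function names `check_final_sectpr` and `check_docx` are hypothetical, and the helper assumes the main document part sits at the conventional name `word/document.xml` inside the .docx package (strictly, the part is located through the package relationships).

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML main namespace (transitional OOXML).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{%s}" % W_NS


def check_final_sectpr(document_xml: str | bytes) -> list[str]:
    """Report violations of the CT_Body rule: at most one body-level
    w:sectPr, and if present it must be the last child of w:body."""
    root = ET.fromstring(document_xml)
    body = root.find(W + "body")
    if body is None:
        return ["no w:body element found"]
    children = list(body)
    # Indexes of w:sectPr elements that are *direct* children of w:body.
    positions = [i for i, el in enumerate(children) if el.tag == W + "sectPr"]
    problems = []
    if len(positions) > 1:
        problems.append(f"{len(positions)} body-level w:sectPr elements (at most one allowed)")
    for i in positions:
        if i != len(children) - 1:
            problems.append(f"body-level w:sectPr at child index {i} is not the last child of w:body")
    return problems


def check_docx(path: str) -> list[str]:
    """Hypothetical helper: run the check on a .docx file, assuming the
    conventional main-part name word/document.xml."""
    with zipfile.ZipFile(path) as zf:
        return check_final_sectpr(zf.read("word/document.xml"))
```

A document laid out the way the reporter describes (one body-level sectPr before each page break) would produce several findings here, while paragraph-level sectPr elements nested under w:p are deliberately not counted.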


> But more generally,  my biggest concern is that any generated files should
> be openable by Word, and I have seen cases where, if I don't insert the
> sectPr node the way I have done, Word will not open the document. And having
> the ability to use LibreOffice Writer to convert to PDF is crucial to us, so
> for now, we have fallen back to using version 5 until this issue is resolved.
> 
> If I can provide any more information or help in any way, please let me know.

I hope the above helps clarify this.

[1] https://www.iso.org/standard/71691.html
Comment 7 M. A. Sridhar 2019-01-21 16:56:45 UTC
Sorry for being dense here. I did in fact look at the link you mentioned (108849) but did not get any sense that it applied to this situation, because I honestly don't know the innards of these things. And a second question for you, based on this quote:
> According to ISO/IEC 29500-1:2016(E) 17.6.17 sectPr (Document Final
> Section Properties), the final <w:sectPr> must be the last child element
> of the body element. Also, this is enforced in schema for CT_Body complex
> type (Annex A. (normative) Schemas – W3C XML Schema, A.1 WordprocessingML,
> page 3866), where sectPr is a part of <xsd:sequence>, and thus *must* stay
> at specific place in sequence, namely being the last element, and be at
> most one instance.
The document I submitted does in fact have the final sectPr as the last child of the body. So perhaps the violation is that it has multiple sectPr elements? If so, as I said earlier, removing the multiple sectPr elements will cause Word to fail to open it (which could well be one of the Word bugs you speak of, I don't know). Please suggest how to proceed. Thanks so much for all the help!
Comment 8 Mike Kaganski 2019-01-21 18:27:15 UTC
(In reply to M. A. Sridhar from comment #7)

Please download the standard, and compare these two consecutive sections:
> 17.6.17 sectPr (Document Final Section Properties)
> 17.6.18 sectPr (Section Properties)
Essentially, they define two different kinds of sectPr: one at the end of the document, at the same level as paragraphs; and an optional sectPr as a (sub)child of a paragraph! That is the only place where a non-final sectPr may properly be placed.

Word's bug is that it accepts those invalid top-level non-final sectPrs. If it rejected them as invalid, or at least ignored them completely, third-party developers would have no incentive to make that kind of mistake.
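To make the two kinds of sectPr concrete, here is a minimal Python sketch (standard-library ElementTree) that builds a paragraph ending a non-final section, with its w:sectPr placed under w:pPr rather than as a sibling of paragraphs in w:body. The helper name `section_break_paragraph` is hypothetical, and the only section property set is w:type with the value nextPage (which is also what applies when w:type is omitted); a real generator would normally also put page size, margins, and header/footer references into this sectPr.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{%s}" % W_NS
ET.register_namespace("w", W_NS)  # serialize with the usual w: prefix


def section_break_paragraph(text: str, section_type: str = "nextPage") -> ET.Element:
    """Build a <w:p> whose w:pPr carries a non-final w:sectPr, so the
    section ends with this paragraph (hypothetical helper)."""
    p = ET.Element(W + "p")
    ppr = ET.SubElement(p, W + "pPr")  # paragraph properties come before the runs
    sect = ET.SubElement(ppr, W + "sectPr")  # paragraph-level section properties
    ET.SubElement(sect, W + "type", {W + "val": section_type})
    run = ET.SubElement(p, W + "r")
    ET.SubElement(run, W + "t").text = text
    return p


# Usage: a two-section body; only the final sectPr is a direct child of w:body.
document = ET.Element(W + "document")
body = ET.SubElement(document, W + "body")
body.append(section_break_paragraph("Last paragraph of section 1"))
p2 = ET.SubElement(body, W + "p")
run2 = ET.SubElement(p2, W + "r")
ET.SubElement(run2, W + "t").text = "First paragraph of section 2"
ET.SubElement(body, W + "sectPr")  # final section properties: last child of w:body
print(ET.tostring(document, encoding="unicode"))
```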
Comment 9 Mike Kaganski 2019-01-21 18:45:15 UTC
And please, save your file with Word! You will see the proper structure if you do. It is important not only to create something that opens "correctly", but also to save the result and inspect the markup written by the reference application. In your case, you don't need multiple sectPrs at all: all you seem to need is page breaks, which are created using <w:p><w:r><w:br w:type="page"/></w:r></w:p>.
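The page-break markup suggested above can be generated with Python's standard-library ElementTree, as in this minimal sketch. The helper name `page_break_paragraph` is hypothetical, and the usage lays out three pages of content in a single section, so there is only one sectPr (the final one), which is where shared header/footer references would go.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{%s}" % W_NS
ET.register_namespace("w", W_NS)


def page_break_paragraph() -> ET.Element:
    """Return <w:p><w:r><w:br w:type="page"/></w:r></w:p> (hypothetical helper)."""
    p = ET.Element(W + "p")
    r = ET.SubElement(p, W + "r")
    ET.SubElement(r, W + "br", {W + "type": "page"})
    return p


# Usage: three pages in one section, separated by plain page breaks.
document = ET.Element(W + "document")
body = ET.SubElement(document, W + "body")
for i, text in enumerate(["Page 1", "Page 2", "Page 3"]):
    if i > 0:
        body.append(page_break_paragraph())
    para = ET.SubElement(body, W + "p")
    run = ET.SubElement(para, W + "r")
    ET.SubElement(run, W + "t").text = text
ET.SubElement(body, W + "sectPr")  # the only sectPr: final, last child of w:body
```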
Comment 10 M. A. Sridhar 2019-01-21 19:01:54 UTC
Thank you for all your help, Mike. I will try out the steps you indicated.

By the way, for what it's worth, I have tried saving similar test files in Word to understand what causes it to fail to open. But Word seems to add a lot of unrelated cruft into the XML, so it's really hard to compare the before-and-after versions to determine what exactly causes a given issue.

Thanks again! I will keep you posted.
Comment 11 M. A. Sridhar 2019-01-21 20:48:56 UTC
Mike, I've changed our code so that it inserts the sectPr as a child of a pPr element of the paragraph that contains the page break element. That seems to work fine in our tests so far, both in Word and in LibreOffice 6.1. We will be doing some more testing over the next week or two, and if anything else comes up, I will let you know.

Thank you again for your help!
Comment 12 Timur 2019-09-11 10:48:11 UTC
Is there a reason to keep this bug open? 
(In reply to Mike Kaganski from comment #3)
> I will check if I can hack the hack from tdf#108849 to not break
> applications that generate hacked OOXMLs, but cannot promise it would be a
> high priority task.
Because of this?
Comment 13 Justin L 2021-03-19 11:54:08 UTC
Code-generated, out-of-spec example document == NOTOURBUG.