Created attachment 148445
The .docx file exhibiting the footer anomaly

The attached file is expected to have three pages, with one header and one footer per page. In LibreOffice Writer 6.1, however, it is displayed with multiple footers and no page breaks. In LibreOffice Writer 5.1 it shows three pages and one footer per page, as expected. Microsoft Word and Google Docs also show the three pages correctly.
I confirm this.

3 pages in
Version: 5.4.7.2 (x64)
Build-ID: c838ef25c16710f8838b1faec480ebba495259d0
CPU-Threads: 4; BS: Windows 6.19; UI-Render: GL;
Gebietsschema: de-DE (de_DE); Calc: CL

1 page in
Version: 6.1.4.2 (x64)
Build-ID: 9d0f32d1f0b509096fd65e0d4bec26ddd1938fd3
CPU-Threads: 4; BS: Windows 10.0; UI-Render: Standard;
Gebietsschema: de-DE (de_DE); Calc: group threaded

and
Version: 6.3.0.0.alpha0+ (x64)
Build ID: 411f3a050ac2be598019d512f8ccfe041080c28f
CPU threads: 4; OS: Windows 10.0; UI render: default; VCL: win;
TinderBox: Win-x86_64@42, Branch:master, Time: 2019-01-14_03:17:11
Locale: en-US (de_DE); UI-Language: en-US
Calc: threaded
This seems to have begun at the commit below.
Adding Cc: to Mike Kaganski; could you possibly take a look at this one?
Thanks

31b05317cba85743ffabf8ae94622f7d298b37d2 is the first bad commit
commit 31b05317cba85743ffabf8ae94622f7d298b37d2
Author: Jenkins Build User <tdf@pollux.tdf>
Date: Thu Jul 20 16:05:35 2017 +0200

source 4b4cd502806cfc9c9cc9754b8aae18a2c2632cdc

author Mike Kaganski <mike.kaganski@collabora.com> 2017-07-18 23:02:32 +0300
committer Mike Kaganski <mike.kaganski@collabora.com> 2017-07-20 11:06:50 +0200
commit 4b4cd502806cfc9c9cc9754b8aae18a2c2632cdc (patch)
tree 8b65daf960cece6fea42867a324fb720f734fdaf
parent 44401915b89582ebc50c644c4db38466a841d457 (diff)
tdf#108849: allow out-of-order sectPr
Oh. I should have known that by fixing one invalid document to be bug-to-bug compatible with Word, I would break another invalid document. And this one does indeed violate the OOXML standard in a way similar to tdf#108849: it has several sectPr elements placed in invalid positions inside the document.

How was this document generated? Was it LibreOffice (as suggested by the generator)? Or was it possibly Google Docs, which is known to use LO for export to ODF, but does it incorrectly, generating invalid files right and left?

I will check whether I can hack the hack from tdf#108849 so that it does not break applications that generate hacked OOXML files, but I cannot promise it will be a high-priority task.
But if it's LibreOffice that generates such files, then it would be another bug, with much higher priority! Please file the bug if it's so!
I was the one that reported this bug. I generated this document using some code that we use for document generation using template expansion. I am not fully aware of the rules around correct placement of sectPr nodes, and I got to this point by trial-and-error experiments and reverse-engineering with Word template expansions. In general, my understanding is that there should be one sectPr node before each page break (w:br type=page) node. I took the stance of "if it works with Word, it should work with Libre Writer". Please point me to any documentation I can get that clearly articulates the rules for correct placement of sectPr nodes, and I will see if I can abide by the rules. But more generally, my biggest concern is that any generated files should be openable by Word, and I have seen cases where, if I don't insert the sectPr node the way I have done, Word will not open the document. And having the ability to use LibreOffice Writer to convert to PDF is crucial to us, so for now, we have fallen back to using version 5 until this issue is resolved. If I can provide any more information or help in any way, please let me know.
(In reply to M. A. Sridhar from comment #5)
> I was the one that reported this bug. I generated this document using some
> code that we use for document generation using template expansion. I am not
> fully aware of the rules around correct placement of sectPr nodes, and I got
> to this point by trial-and-error experiments and reverse-engineering with
> Word template expansions. In general, my understanding is that there should
> be one sectPr node before each page break (w:br type=page) node.

Irony. OOXML was the most advertised open specification created by MS, yet the fact that the specification exists stays unknown :-) The normative reference is ISO/IEC 29500-1:2016, and it is freely available from the ISO site [1].

> I took the
> stance of "if it works with Word, it should work with Libre Writer".

Sigh. This is what makes us not only struggle with all the standard's complexity (which is unavoidable and understandable), but also struggle to implement all of Word's *bugs* that people discover using "trial-and-error experiments and reverse-engineering", and then start producing documents exploiting those bugs... so of course, we also must have those bugs to be "compliant".

The other ironic fact is that it's enough to re-save such a document from Word to see the proper syntax.

> Please
> point me to any documentation I can get that clearly articulates the rules
> for correct placement of sectPr nodes, and I will see if I can abide by the
> rules.

Citing from tdf#108849 (which was mentioned here twice already, which makes me wonder if it's that difficult to click a link to see what's the origin of the problem!):

> According to ISO/IEC 29500-1:2016(E) 17.6.17 sectPr (Document Final
> Section Properties), the final <w:sectPr> must be the last child element
> of the body element. Also, this is enforced in schema for CT_Body complex
> type (Annex A. (normative) Schemas – W3C XML Schema, A.1 WordprocessingML,
> page 3866), where sectPr is a part of <xsd:sequence>, and thus *must* stay
> at specific place in sequence, namely being the last element, and be at
> most one instance.

> But more generally, my biggest concern is that any generated files should
> be openable by Word, and I have seen cases where, if I don't insert the
> sectPr node the way I have done, Word will not open the document. And having
> the ability to use LibreOffice Writer to convert to PDF is crucial to us, so
> for now, we have fallen back to using version 5 until this issue is resolved.
>
> If I can provide any more information or help in any way, please let me know.

Hope that the above helps in clarifying this.

[1] https://www.iso.org/standard/71691.html
Sorry for being dense here. I did in fact look at the link you mentioned (108849) but did not get any sense that it applied to this situation, because I honestly don't know the innards of these things.

And a second question for you, based on this quote:

> According to ISO/IEC 29500-1:2016(E) 17.6.17 sectPr (Document Final
> Section Properties), the final <w:sectPr> must be the last child element
> of the body element. Also, this is enforced in schema for CT_Body complex
> type (Annex A. (normative) Schemas – W3C XML Schema, A.1 WordprocessingML,
> page 3866), where sectPr is a part of <xsd:sequence>, and thus *must* stay
> at specific place in sequence, namely being the last element, and be at
> most one instance.

The document I submitted does in fact have the final sectPr as the last child of the body. So perhaps the violation is that it has multiple sectPr elements? If so, as I said earlier, removing the multiple sectPr elements will cause Word to fail to open it (which could well be one of the Word bugs you speak of, I don't know).

Please suggest how to proceed. Thanks so much for all the help!
(In reply to M. A. Sridhar from comment #7)

Please download the standard, and compare the two consecutive sections:

> 17.6.17 sectPr (Document Final Section Properties)
> 17.6.18 sectPr (Section Properties)

Essentially, they have defined two different kinds of sectPr: one at the end of the document, at the same level as the paragraphs; and an optional sectPr as a (sub)child of a paragraph! The latter is the only place where a non-final sectPr may properly be placed.

Word's bug is that it accepts those invalid top-level non-final sectPrs. If it rejected them as invalid, or at least ignored them completely, there would be no incentive for third-party developers to make this kind of mistake.
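The rule described above (a top-level sectPr is only allowed as the last child of w:body) can be checked mechanically. The following is a minimal sketch using only the Python standard library, not a full schema validation; the function names `misplaced_sectprs` and `check_docx` and the sample markup are illustrative assumptions, not part of LibreOffice or of the reporter's generator.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (transitional OOXML).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"


def misplaced_sectprs(document_xml):
    """Return 0-based indexes of w:body children that are w:sectPr
    but are not the last child of the body."""
    root = ET.fromstring(document_xml)
    body = root.find(f"{W}body")
    if body is None:
        return []
    children = list(body)
    last = len(children) - 1
    return [
        i for i, child in enumerate(children)
        if child.tag == f"{W}sectPr" and i != last
    ]


def check_docx(path):
    """Hypothetical helper: run the same check on word/document.xml
    inside a .docx package."""
    with zipfile.ZipFile(path) as package:
        return misplaced_sectprs(package.read("word/document.xml"))


# Illustrative body with one top-level non-final sectPr (index 1)
# followed by the legitimate final sectPr (index 3).
SAMPLE = f"""<w:document xmlns:w="{W_NS}">
  <w:body>
    <w:p><w:r><w:t>Page 1</w:t></w:r></w:p>
    <w:sectPr/>
    <w:p><w:r><w:t>Page 2</w:t></w:r></w:p>
    <w:sectPr/>
  </w:body>
</w:document>"""

print(misplaced_sectprs(SAMPLE))  # [1]
```

An empty result only means that no top-level sectPr is out of place; it says nothing about the rest of the document's validity.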
And please, save your file with Word! You will see the proper structure if you do. It is important not only to create something that opens "correctly", but also to save the result and inspect the code written by the reference application. In your case, you don't need multiple sectPrs at all - all you seem to need is page breaks, which are created using <w:p><w:r><w:br w:type="page"/></w:r></w:p>.
Thank you for all your help, Mike. I will try out the steps you indicated. By the way, for what it's worth, I have tried saving similar test files in Word to understand what causes it to fail to open. But Word seems to add a lot of unrelated cruft into the XML, so it's really hard to compare the before-and-after versions to determine what exactly causes a given issue. Thanks again! I will keep you posted.
Mike, I've changed our code so that it inserts the sectPr as a child of a pPr element of the paragraph that contains the page break element. That seems to work fine in our tests so far, both in Word and in LibreOffice 6.1. We will be doing some more testing over the next week or two, and if anything else comes up, I will let you know. Thank you again for your help!
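For readers generating documents in a similar way, the structure described in the comment above (a sectPr nested inside the pPr of the paragraph that carries the page break) might look like the following sketch. It uses Python's standard xml.etree.ElementTree; the helper names are hypothetical, and the empty sectPr is only a placeholder, since a real one would normally carry page size, margins and header/footer references.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (transitional OOXML).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def w(tag):
    """Qualify a local name with the WordprocessingML namespace."""
    return f"{{{W_NS}}}{tag}"


def section_ending_paragraph():
    """Build <w:p> with a <w:sectPr> inside <w:pPr>, followed by a run
    that contains a page break. The sectPr is left empty here as a
    placeholder (an assumption made for brevity)."""
    p = ET.Element(w("p"))
    ppr = ET.SubElement(p, w("pPr"))   # pPr must precede the runs
    ET.SubElement(ppr, w("sectPr"))    # non-final section properties
    run = ET.SubElement(p, w("r"))
    br = ET.SubElement(run, w("br"))
    br.set(w("type"), "page")          # <w:br w:type="page"/>
    return p


print(ET.tostring(section_ending_paragraph(), encoding="unicode"))
```

Whether a section break is needed at all depends on the document; as noted earlier in the thread, a plain page-break paragraph is enough when the page layout and footers do not change between pages.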
Is there a reason to keep this bug open?

(In reply to Mike Kaganski from comment #3)
> I will check if I can hack the hack from tdf#108849 to not break
> applications that generate hacked OOXMLs, but cannot promise it would be a
> high priority task.

Because of this?
Code-generated, out-of-spec example document == NOTOURBUG.