Created attachment 187237: This is how the .docx file looks (correctly) in MS Word. The following Microsoft Word .docx file is imported and displayed incorrectly in LibreOffice Writer: the layout is completely wrong and most of the text is missing. https://vergabe.niedersachsen.de/Satellite/public/company/project/CXTMYYDYRHF/de/documents/misc/VVB+236+-+Verplichtungserklaerung+anderer+Unternehmen+12-2017.docx System: Version: 7.5.2.2 (X86_64) / LibreOffice Community; Build ID: 50(Build:2); CPU threads: 16; OS: Linux 6.2; UI render: default; VCL: gtk3; Locale: fr-FR (fr_FR.UTF-8); UI: fr-FR; Ubuntu package version: 4:7.5.2-0ubuntu1; Calc: threaded
Created attachment 187246: File sample resaved with Word. After opening the file in Microsoft® Word for Microsoft 365 MSO (version 2304, build 16.0.16327.20200, 64-bit) and saving it, LibreOffice opens the file fine. So I am not sure whether this is our bug.
Created attachment 187248: Example file
Confirmed. Version: 7.6.0.0.alpha0+ (X86_64) / LibreOffice Community; Build ID: 066b23115c2a360507e306a88da572554daefab7; CPU threads: 8; OS: Mac OS X 12.6.3; UI render: Skia/Raster; VCL: osx; Locale: nl-NL (nl_NL.UTF-8); UI: en-US; Calc: threaded
Also in 4.4.7.2 and in Version 4.0.0.3 (Build ID: 7545bee9c2a0782548772a21bc84a9dcc583b89)
@Justin L: Some analysis would be nice to have, if you're interested of course.
FYI, I encountered the same problem with the following three .docx documents, too. Unfortunately, these are all official documents used in public tender/procurement procedures of the Federal Republic of Germany, VHB-Bund (required templates/forms of the Vergabe- und Vertragshandbuch des Bundes (VHB)), so users need to be able to work with them in public tender procedures. https://www.evergabe.nrw.de/VMPSatellite/public/company/project/73091/de/documents/filledByCompany/VVB+234+-+Erklaerung+Bieter-_Arbeitsgemeinschaft+12-2017.docx https://vergabe.niedersachsen.de/Satellite/public/company/project/CXS0YMTYY4J/de/documents/filledByCompany/VVB+124_LD+-+Eigenerklaerung+zur+Eignung+Liefer-_Dienstleistungen+07-2019.docx https://www.evergabe.nrw.de/VMPSatellite/public/company/project/CXS0Y6XYYPB/de/documents/filledByCompany/VVB+235+-+Verzeichnis+der+Leistungen_Kapazitaeten+anderer+Unternehmen+12-2017.docx
Tested with bibisect-releases; the problem is inherited from OOo. It is related to bookmarks in the original document that are removed when MS Word round-trips the file:
<w:fldChar w:fldCharType="begin">
  <w:ffData>
    <w:name w:val="Text2"/>
    <w:enabled/>
    <w:calcOnExit w:val="0"/>
    <w:textInput>
      <w:default w:val="${fn:format-date(fn:current-date(),'[D01].[M01].[Y0001]')}"/>
    </w:textInput>
  </w:ffData>
</w:fldChar>
</w:r>
<w:bookmarkStart w:id="1" w:name="Text2"/>
<w:r>
  <w:t>29.09.2022</w:t>
</w:r>
<w:bookmarkEnd w:id="1"/>
The fldChar is invalid because it is missing an end:
<w:r>
  <w:fldChar w:fldCharType="end"/>
</w:r>
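To check other documents for the same defect, a small script along these lines could count field begins that never get an end. This is only a sketch: `field_balance` and `SAMPLE` are names made up for illustration, and the sample is a cut-down stand-in for the markup quoted above, not the real file.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

# WordprocessingML main namespace (the "w:" prefix in document.xml).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def field_balance(document_xml: str) -> Tuple[int, int]:
    """Return (unclosed_begins, stray_ends) for fldChar markers in document order."""
    root = ET.fromstring(document_xml)
    open_fields = 0
    stray_ends = 0
    for fld_char in root.iter(f"{{{W_NS}}}fldChar"):
        kind = fld_char.get(f"{{{W_NS}}}fldCharType")
        if kind == "begin":
            open_fields += 1
        elif kind == "end":
            if open_fields:
                open_fields -= 1
            else:
                stray_ends += 1
        # "separate" markers do not change the nesting depth.
    return open_fields, stray_ends


# Hypothetical minimal document: a field begin with no matching end.
SAMPLE = f"""<w:document xmlns:w="{W_NS}"><w:body><w:p>
  <w:r><w:fldChar w:fldCharType="begin"/></w:r>
  <w:r><w:t>29.09.2022</w:t></w:r>
</w:p></w:body></w:document>"""

if __name__ == "__main__":
    print(field_balance(SAMPLE))
```

For a real file, the XML would come from the `word/document.xml` member of the .docx zip package, for example via `zipfile.ZipFile(path).read("word/document.xml")`.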
The first thing DomainMapper_Impl::finishParagraph does is return early if the field command is not finished. We can't simply assume that the end of a paragraph closes any started field commands, because of embedded fields, paragraphs in field shapes/tables, etc. This probably needs to be chalked up to corrupt documents, and is likely WONTFIX.
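A toy model of why the paragraph end cannot be used as a signal. This is not the writerfilter code; the event names and `open_fields_at_paragraph_ends` are invented for illustration. A valid field that spans several paragraphs and a broken field with no end look identical at the first paragraph end.

```python
from typing import Iterable, List


def open_fields_at_paragraph_ends(events: Iterable[str]) -> List[int]:
    """For each 'para_end' event, record how many fields are still open."""
    depth = 0
    report = []
    for event in events:
        if event == "begin":
            depth += 1
        elif event == "end":
            depth = max(depth - 1, 0)
        elif event == "para_end":
            report.append(depth)
    return report


# Hypothetical event streams: a valid field spanning two paragraphs,
# and a form field whose end marker is missing (as in this bug).
VALID_MULTI_PARAGRAPH = ["begin", "para_end", "para_end", "end", "para_end"]
BROKEN_NO_END = ["begin", "para_end", "para_end", "para_end"]

if __name__ == "__main__":
    print(open_fields_at_paragraph_ends(VALID_MULTI_PARAGRAPH))
    print(open_fields_at_paragraph_ends(BROKEN_NO_END))
```

In this model both streams report one open field at the first two paragraph ends, so a rule like "close all open fields at paragraph end" would also cut the valid field short. Only the end of the document tells the two cases apart.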
(In reply to Justin L from comment #9)
> This probably needs to be chalked up to corrupt documents, and likely
> WONTFIX.
This might be a corrupt document in the technical sense. However, WONTFIX is somewhat problematic from an end-user perspective:
* The documents are officially published government forms, which need to be used.
* There are probably a lot more of those (because they are government documents).
* Even though corrupt, they somehow work with MSO.
* The documents were apparently created by Microsoft Office Word 16.0000 (if app.xml delivers proper information).
So WONTFIX entails: use MSO (or some other alternative).
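For anyone who wants to check the app.xml claim on other forms, the producing application can be read from `docProps/app.xml` inside the package. A minimal sketch: `producing_application` and `demo_package` are hypothetical helper names, and the demo package is an in-memory stand-in that contains only the values quoted above, not one of the real documents.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Namespace of the extended properties part (docProps/app.xml).
EP_NS = "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties"


def producing_application(docx_bytes: bytes) -> Tuple[Optional[str], Optional[str]]:
    """Return (Application, AppVersion) as recorded in docProps/app.xml."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        root = ET.fromstring(package.read("docProps/app.xml"))
    application = root.findtext(f"{{{EP_NS}}}Application")
    app_version = root.findtext(f"{{{EP_NS}}}AppVersion")
    return application, app_version


def demo_package() -> bytes:
    """Build a stand-in zip holding only an app.xml part (not a full .docx)."""
    app_xml = (
        f'<Properties xmlns="{EP_NS}">'
        "<Application>Microsoft Office Word</Application>"
        "<AppVersion>16.0000</AppVersion>"
        "</Properties>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("docProps/app.xml", app_xml)
    return buffer.getvalue()


if __name__ == "__main__":
    print(producing_application(demo_package()))
```

Note that these values are plain metadata that any tool can write, so they only indicate, not prove, which application produced the file.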