Bug 164201 - FILESAVE DOCX compat15: document round-tripped as compat12 instead of compat15
Summary: FILESAVE DOCX compat15: document round-tripped as compat12 instead of compat15
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 7.6.0.0 alpha0+
Hardware: All All
Importance: medium normal
Assignee: Justin L
URL:
Whiteboard: target:25.8.0 target:25.2.0.2
Keywords: bibisected, bisected, regression
Depends on:
Blocks: DOCX-compatibilityMode-15
Reported: 2024-12-05 22:38 UTC by Justin L
Modified: 2025-01-02 12:23 UTC
CC List: 3 users

See Also:
Crash report or crash signature:


Attachments
forum-en-28018.docx: table shifts over margin after round-trip (61.56 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2024-12-05 22:38 UTC, Justin L
forum-en-28018.docx: table shifts over margin after round-trip (73.49 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2024-12-06 14:24 UTC, Justin L
forum-en-28018.docx_mso.pdf: how it looks in Word 2019 (742.63 KB, application/pdf)
2024-12-06 14:29 UTC, Justin L

Description Justin L 2024-12-05 22:38:39 UTC
Created attachment 197965 [details]
forum-en-28018.docx: table shifts over margin after round-trip

Export shifts the table over the margin - as seen in both MS Word and LO.

Steps to reproduce:
1.) open forum-en-28018.docx. Notice that the table border is lined up with the page margin.
2.) save and reload.

The table border is no longer lined up. Instead, the text in the table is lined up with the margin.

Bisected to LO 7.6 commit e66ddcd4b66923bc835bd7c5f5c784a809a420a2
Author: Michael Stahl on Fri Feb 17 12:10:38 2023 +0100
    tdf#137883 sw: DOCX export: compatibilityMode=12 for ECMA 376 1st ed.

which was somewhat surprising because this document is compat14.
Prior to mstahl's patch, it was saving as compat15 (why not 14?). Now it is saving as compat12. Why is this filter considered to be 2007 only?
Comment 1 Justin L 2024-12-06 14:24:14 UTC
Created attachment 197980 [details]
forum-en-28018.docx: table shifts over margin after round-trip

Apparently I have completely butchered this bug report.
I must have over-written my example document multiple times.

The original is in fact compat15 (so I must have overwritten with my Word 2010).
The one I first uploaded was in fact compat12 (so I must have also round-tripped in LO).
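
For anyone wanting to check which compat level a given copy of the file really has: the value lives in word/settings.xml inside the DOCX zip, as a w:compatSetting element named compatibilityMode. The following is only a minimal sketch of reading it. The function name is made up for illustration, it assumes settings.xml has already been extracted into a string, and it assumes w:name comes before w:val in the element; a real tool should use an XML parser rather than a regex.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper (not LibreOffice code): returns the w:val of the
// compatibilityMode compatSetting found in the text of word/settings.xml,
// or std::nullopt when no such setting is present.
// Assumption: the w:name attribute precedes w:val inside the element.
std::optional<int> compatibilityMode(const std::string& settingsXml)
{
    static const std::regex re(
        R"re(<w:compatSetting\b[^>]*w:name="compatibilityMode"[^>]*w:val="(\d+)")re");
    std::smatch match;
    if (!std::regex_search(settingsXml, match, re))
        return std::nullopt; // setting not written at all
    return std::stoi(match[1].str());
}
```

Running this over settings.xml from each attachment is one way to tell the compat12 and compat15 copies apart.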
Comment 2 Justin L 2024-12-06 14:29:06 UTC
Created attachment 197983 [details]
forum-en-28018.docx_mso.pdf: how it looks in Word 2019
Comment 3 Justin L 2024-12-06 18:44:50 UTC
This happens during import: FilterDetectDocHandler::getFilterNameFromContentType
returns writer_MS_Word_2007 instead of writer_OOXML.

FilterDetectDocHandler::parseRelationship has found the relationship type
http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties
and so sets maOOXMLVariant = OOXMLVariant::ECMA_Transitional
instead of OOXMLVariant::ISO_Transitional.

As a result, TypeDetection::impl_checkResultsAndAddBestFilter returns "MS Word 2007 XML" instead of "Office Open XML Text".
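
The chain described in this comment can be modelled roughly as follows. This is a simplified stand-in, not the actual oox code: the function and struct names are hypothetical, and it assumes that any relationship type other than the one quoted above leaves the variant at ISO_Transitional. Only the URL, the two variant names, the two type names and the two filter names are taken from this report.

```cpp
#include <string_view>

enum class OOXMLVariant { ECMA_Transitional, ISO_Transitional };

// Relationship type quoted in this comment.
constexpr std::string_view kPackageCoreProperties =
    "http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties";

// Hypothetical model of the decision in parseRelationship.
// Assumption: every other relationship type keeps the current variant.
OOXMLVariant variantFromRelationship(
    std::string_view type, OOXMLVariant current = OOXMLVariant::ISO_Transitional)
{
    return type == kPackageCoreProperties ? OOXMLVariant::ECMA_Transitional : current;
}

struct WriterFilter
{
    std::string_view typeName;   // what filter detection reports
    std::string_view filterName; // what type detection ends up choosing
};

// Hypothetical lookup: which Writer type/filter pair follows from a variant.
WriterFilter writerFilterFor(OOXMLVariant variant)
{
    if (variant == OOXMLVariant::ECMA_Transitional)
        return { "writer_MS_Word_2007", "MS Word 2007 XML" };
    return { "writer_OOXML", "Office Open XML Text" };
}
```

In this model, a single core-properties relationship of the package/2006 kind is enough to send a compat15 document down the Word 2007 path.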
Comment 4 Justin L 2024-12-06 22:40:58 UTC
I don't see any way that allows updating the filter name.

A reasonable place to try to change it could be in ApplySettingsTable().
    comphelper::SequenceAsHashMap aMap(m_xTextDocument->getArgs());
    aMap[u"FilterName"_ustr] >>=  sFilterName;

but any attempt to setArgs throws an exception. Of course, setArgs could always be extended to accept FilterName, but that sounds extremely hacky.

Not sure how it would be possible to proceed here.
Comment 5 Justin L 2024-12-07 13:31:20 UTC
A lot of shenanigans here. 
GetFilter().getVersion() == oox::core::ECMA_376_1ST_EDITION

The version comes from FileFormatVersion, but that is not one of the getArgs values. It is defined in oox/core/filterbase.hxx as
    enum OoxmlVersion
    {
        ECMA_376_1ST_EDITION,
        ISOIEC_29500_2008
    };

The mapping between the FilterName and FileFormatVersion comes from
filter/source/config/fragments/filters/OOXML_Text.xcu and MS_Word_2007_XML.xcu
    <node oor:name="Office Open XML Text" oor:op="replace">
        <!-- ISO/IEC 29500:2008 -->
        <prop oor:name="FileFormatVersion"><value>1</value></prop>
        <prop oor:name="Type"><value>writer_OOXML</value></prop>

So it is sufficient to just call setArgs with "FilterName" set to "Office Open XML Text".
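
To make the idea concrete, here is a small self-contained sketch of the rule "compat14+ cannot be ECMA_376_1ST_EDITION". It is not the patch itself. Both function names are hypothetical, the threshold of 14 is read off the eventual commit title, and treating every filter other than "Office Open XML Text" as ECMA_376_1ST_EDITION is an assumption made only to keep the example short.

```cpp
#include <string>

// Same enumerator order as quoted from oox/core/filterbase.hxx above.
enum OoxmlVersion
{
    ECMA_376_1ST_EDITION,
    ISOIEC_29500_2008
};

// FileFormatVersion 1 in OOXML_Text.xcu lines up with the second enumerator.
static_assert(ISOIEC_29500_2008 == 1, "FileFormatVersion 1 means ISO/IEC 29500:2008");

// Hypothetical lookup standing in for the .xcu configuration.
// Assumption: any filter other than "Office Open XML Text" is ECMA 1st edition.
OoxmlVersion versionForFilter(const std::string& filterName)
{
    return filterName == "Office Open XML Text" ? ISOIEC_29500_2008
                                                : ECMA_376_1ST_EDITION;
}

// Hypothetical helper: choose the filter name to remember after import.
// A document with compatibilityMode 14 or higher is moved off the Word 2007 filter.
std::string correctedFilterName(const std::string& detected, int compatibilityMode)
{
    if (detected == "MS Word 2007 XML" && compatibilityMode >= 14)
        return "Office Open XML Text";
    return detected;
}
```

With this rule a compat15 document detected as "MS Word 2007 XML" would be re-labelled "Office Open XML Text", and the export version would follow from the filter configuration, while a genuine compat12 file keeps its 2007 filter.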
Comment 6 Justin L 2024-12-08 00:32:34 UTC
Mitigated hack proposed at https://gerrit.libreoffice.org/c/core/+/178048
Comment 7 Commit Notification 2024-12-09 23:08:11 UTC
Justin Luth committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/d97085cc6cd2bdc3b6723d1960d0ec5fa0a48165

tdf#164201 docx import: compat14+ cannot be ECMA_376_1ST_EDITION

It will be available in 25.8.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 8 Commit Notification 2025-01-01 17:11:06 UTC
Justin Luth committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/c90c1d4435a64531e1d9b41d0a8dc0b91ba236cd

tdf#164201 docx convert-to: prefer Word 2010–365 Document, not 2007

It will be available in 25.8.0.

Comment 9 Commit Notification 2025-01-02 08:44:30 UTC
Justin Luth committed a patch related to this issue.
It has been pushed to "libreoffice-25-2":

https://git.libreoffice.org/core/commit/858c452e5e6b7acad6932df7d1cd2f0950d47f33

tdf#164201 docx convert-to: prefer Word 2010–365 Document, not 2007

It will be available in 25.2.0.2.

Comment 10 Justin L 2025-01-02 12:23:30 UTC
The patch in comment 7 fixes the reported problem from an interactive user perspective.

Comment 8's patch handles the case for command-line conversion. I was surprised that in my testing suite, I was still getting a Word 2007/compat12 output. It turned out that running a "--convert-to docx" from the command line does not attempt to "keep the same compatibility level". Instead, it just uses the "preferred" filter for DOCX output. In this case LO had Word 2007 as the "most preferred" choice. Comment 8 changes that to writer_OOXML.

I suppose someone could complain that LO should preserve the compat level. However, I don't see much point in running a DOCX file through a --convert-to DOCX UNLESS you want to make SOME kind of change.