Bug 165207 - FILESAVE DOCX: save date field zero'd out (in MS Word) after round-trip
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): unspecified
Hardware: All All
Importance: medium normal
Assignee: Justin L
URL:
Whiteboard: target:25.8.0
Keywords: bibisected, bisected
Depends on:
Blocks:
 
Reported: 2025-02-11 20:39 UTC by Justin L
Modified: 2025-02-22 17:54 UTC (History)
CC List: 2 users

See Also:
Crash report or crash signature:


Attachments
forum-mso-de-138781.docx: example document with date in footer (170.81 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2025-02-11 20:39 UTC, Justin L
165207_modifiedDate.odt: simple date field content (13.88 KB, application/vnd.oasis.opendocument.text)
2025-02-11 22:10 UTC, Justin L

Description Justin L 2025-02-11 20:39:14 UTC
Created attachment 199158
forum-mso-de-138781.docx: example document with date in footer

Visually, the problem is the date in the footer was previously round-tripped unchanged (as seen by MS Word). Now, MS Word reports the save date as all zeros.

More fundamentally, this probably has to do with the document properties. With the change below, the last modified date is no longer seen in MS Word. (However, looking at core.xml, I'm not noticing any significant differences.)

This started (while saving interactively - convert-to backported to 25.2) with
25.8 commit d97085cc6cd2bdc3b6723d1960d0ec5fa0a48165
Author: Justin Luth on Sat Dec 7 11:42:39 2024 -0500
    tdf#164201 docx import: compat14+ cannot be ECMA_376_1ST_EDITION

Steps to reproduce:
1.) Open and resave forum-mso-de-138781.docx.
2.) Open the round-tripped file in MS Word (probably 2010 or higher)

Notice that the date field in the footer (on page 2) is 00-00-0000. It should be the date that you saved it...

Found by Collabora's mso-test
Comment 1 Justin L 2025-02-11 22:10:21 UTC
Created attachment 199159
165207_modifiedDate.odt: simple date field content

In order to take any previous DOCX-isms out of the picture, I tested using ODT->DOCX. That led me to 24.2's
commit ed0476b0625c4361df5ff040a6661a9634588cea
Author: Michael Stahl on Fri Feb 17 12:25:30 2023 +0100
    tdf#137883 filter: rename DOCX filters to be less confusing
    
    Rename misleading "Word 2007–365" filter which corresponds to the slightly
    incompatible first pre-ISO version of OOXML (ECMA-376 1st edition) and
    is actually very specifically for Word 2007.
    
    Stop confusing users with standardese like "Office Open XML Text
    Document (Transitional)" and instead use the name of the application
    that the format is intended for, "Word 2010-365".
    
    Hopefully users will now pick the latter filter over the former.

And I'm getting the same results all the way back to 3.6 when saving to "Office Open XML Text Document (Transitional)". (Even "Word 2007 DOCX" only seems to round-trip the date, and not update it).

So somehow this has never worked for this format.
Comment 2 Justin L 2025-02-11 22:35:40 UTC
The problem appears to be 
"officedocument/2006/relationships/metadata/core-properties"
instead of 
"package/2006/relationships/metadata/core-properties"
although that is exactly what we use to distinguish Word 2007 on import...
Comment 3 Justin L 2025-02-11 23:24:50 UTC
See oox/source/core/xmlfilterbase.cxx WriteCoreProperties.
So now the question is, when should we be following the spec?
    // The lowercase "officedocument" is intentional and according to the spec
    // (although most other places are written "officeDocument")
    sValue = "http://schemas.openxmlformats.org/officedocument/2006/relationships/metadata/core-properties";

The Internet says:
2.1.30 Part 1 Section 15.2.12.1, Core File Properties Part
a.   The standard specifies a source relationship for the Core File Properties part as follows: http://schemas.openxmlformats.org/officedocument/2006/relationships/metadata/core-properties.

    Office uses the following source relationship for the Core File Properties part: http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties.

I take that to mean that Office ignores the spec and does something else. We on the other hand have always coded as if MS is following their own spec.
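
To check which of the two relationship types a given DOCX actually carries, the package-level _rels/.rels part can be inspected directly. The following is only a minimal sketch, assuming the core-properties relationship lives in _rels/.rels; the function names core_properties_relationship and describe are hypothetical helpers for illustration, not anything from LibreOffice.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the elements inside a .rels part.
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"

# The two relationship types quoted in this report.
STANDARD_URI = ("http://schemas.openxmlformats.org/officedocument/2006/"
                "relationships/metadata/core-properties")
OFFICE_URI = ("http://schemas.openxmlformats.org/package/2006/"
              "relationships/metadata/core-properties")


def core_properties_relationship(docx):
    """Return (type_uri, target) for the core-properties relationship.

    `docx` is a path or a binary file-like object. Returns None when
    _rels/.rels has no such relationship. (Hypothetical helper.)
    """
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("_rels/.rels"))
    for rel in root.findall(f"{{{RELS_NS}}}Relationship"):
        rel_type = rel.get("Type", "")
        if rel_type.endswith("/metadata/core-properties"):
            return rel_type, rel.get("Target")
    return None


def describe(type_uri):
    """Label a relationship type using the wording from the quoted note."""
    if type_uri == OFFICE_URI:
        return "package (what Office uses)"
    if type_uri == STANDARD_URI:
        return "officedocument (what the standard specifies)"
    return "unknown"
```

Running core_properties_relationship on the round-tripped file and on a file saved by Word should make the difference described in comment 2 visible without unzipping by hand.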
Comment 4 Justin L 2025-02-11 23:30:03 UTC
https://gerrit.libreoffice.org/c/core/+/178048 has a comment
> perhaps the "best" way to do this would be to parse the settings.xml
> in the type detection to get the version number?
which is probably the route we should go, and then just forget that the spec even exists...
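
As a rough illustration of that idea (not the actual type-detection code), the version number could be read from the compatibilityMode compatSetting in word/settings.xml. This sketch assumes the transitional WordprocessingML namespace and, as comment 7 describes, falls back to 12 (Word 2007) when no such entry exists; the function name compatibility_mode is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# Transitional WordprocessingML namespace (assumption: Strict files,
# which use a different namespace, are not handled by this sketch).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def compatibility_mode(docx, default=12):
    """Return the compatibilityMode value from word/settings.xml.

    `docx` is a path or a binary file-like object. Falls back to
    `default` when settings.xml or the compatSetting entry is missing.
    (Hypothetical helper for illustration.)
    """
    with zipfile.ZipFile(docx) as zf:
        try:
            data = zf.read("word/settings.xml")
        except KeyError:
            return default
    root = ET.fromstring(data)
    for setting in root.iter(f"{{{W_NS}}}compatSetting"):
        if setting.get(f"{{{W_NS}}}name") == "compatibilityMode":
            value = setting.get(f"{{{W_NS}}}val", "")
            if value.isdigit():
                return int(value)
    return default
```

A result of 12 would correspond to Word 2007, while 14 and 15 correspond to Word 2010 and Word 2013 or later.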
Comment 5 Justin L 2025-02-11 23:46:22 UTC
Something strange, though: in my personal testing (Word 2010 and Word 2019) I always get zeroed-out dates, but in mso-test, Word 2019 makes a PDF with the correct date. Oh, I just answered my own question. While the SCREEN displays zeros and xxx's, the PDF contains the date (because Word "modifies" the document before creating the PDF and thereby also updates the SCREEN to that moment in time).
Comment 6 Justin L 2025-02-12 14:12:17 UTC
(In reply to Justin L from comment #1)
> (Even "Word 2007 DOCX" only seems to round-trip the date, and not update it).
That is because we don't treat a simple save as a "modification". For us, it is a modified field, not a save date.

Of course, the date is changed if there is actually a modification to the document instead of a simple round-trip or convert-to.
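
To see whether a given save actually changed the stored value, the dcterms:modified element in docProps/core.xml can be compared before and after. The helper below is only a sketch with a hypothetical name (modified_timestamp); it returns the raw text rather than parsing it, to avoid assumptions about the exact timestamp format.

```python
import zipfile
import xml.etree.ElementTree as ET

# Dublin Core terms namespace used by dcterms:modified in core.xml.
DCTERMS_NS = "http://purl.org/dc/terms/"


def modified_timestamp(docx):
    """Return the text of dcterms:modified from docProps/core.xml.

    `docx` is a path or a binary file-like object. Returns None when
    the part or the element is missing. (Hypothetical helper.)
    """
    with zipfile.ZipFile(docx) as zf:
        try:
            data = zf.read("docProps/core.xml")
        except KeyError:
            return None
    element = ET.fromstring(data).find(f"{{{DCTERMS_NS}}}modified")
    return element.text if element is not None else None
```

Comparing the value from the original file with the value from the round-tripped file shows directly whether the save was treated as a modification.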
Comment 7 Justin L 2025-02-12 19:17:21 UTC
Let me document this here, because it almost certainly will come back to bite me.

My fix affects uiwriter4.cxx's testTdf72942.
The main file is fdo72942.docx (has settings.xml but no compatSetting entries - thus is treated as compat12 / Word 2007).
testTdf72942 does an Insert - Text from file with fdo72942-insert.docx (which is compat15 / Word 2013).

So, previously the logic told it to treat both as Word 2007 formats, and thus in SwView::InsertMedium, StartConvertFrom found a pRead which ends up calling SwDOCXReader::Read (from 5.4/6.0 bug 112025) - which first adds an empty paragraph and then imports the contents.

Now, the logic correctly inserts the file as "Office Open XML Text" which is a different filter and thus no pRead and so it follows a different code path, which does NOT first add an empty paragraph before inserting (like what happened prior to 5.4).
Comment 8 Justin L 2025-02-12 19:58:42 UTC
(In reply to Justin L from comment #7)
> So, previously the logic told it to treat both as Word 2007 formats, and
> thus in SwView::InsertMedium, StartConvertFrom found a pRead which ends up
Nah, it has nothing to do with "the same filter".  MS_WORD_2007_XML.xcu has 
    <prop oor:name="UserData"><value>OXML</value></prop>
while Word 2010 has no value specified.
Comment 9 Commit Notification 2025-02-22 17:43:01 UTC
Justin Luth committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/e4b629c1eecf8cd46007fb064179d765d55fd26b

tdf#165207 tdf#164201 docx: always use errata uri in docProps/core.xml

It will be available in 25.8.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.