Bug 148952 - Filesave DOCX: Alt Text field of image is lost on open in LO and in MSO
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 7.4.0.0 alpha0+
Hardware: All All
Importance: medium normal
Assignee: Michael Stahl (allotropia)
URL:
Whiteboard: target:7.6.0 target:7.5.1 target:24.2.0
Keywords: filter:docx
Depends on:
Blocks: a11y, Accessibility DOCX-Images
Reported: 2022-05-05 12:28 UTC by Timur
Modified: 2023-10-20 09:00 UTC

See Also:
Crash report or crash signature:


Attachments
Sample ODT with picture text fields (11.65 KB, application/vnd.oasis.opendocument.text)
2022-05-05 12:28 UTC, Timur
The original file in Writer and the exported one in Word 2016 (108.90 KB, image/png)
2023-02-16 22:43 UTC, Gabor Kelemen (allotropia)
The example file saved to DOCX with current master (5.27 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2023-02-16 22:43 UTC, Gabor Kelemen (allotropia)

Description Timur 2022-05-05 12:28:18 UTC
Created attachment 179945
Sample ODT with picture text fields

Create an ODT with a picture, or use the attached one.
In the picture's Properties - Options, add text to the Name, Alternative (Text only) and Description fields.
Save as DOCX and reopen in LO and MSO.
The Alt Text is lost on open in LO (where it should stay as Alt Text) and in MSO (where it should appear in the Title field).

Note: this doesn't apply to DOC.
Comment 1 Dieter 2022-05-26 04:52:43 UTC
I confirm it with

Version: 7.3.4.1 (x64) / LibreOffice Community
Build ID: 13668373362b52f6e3ebcaaecb031bd59a3ac66b
CPU threads: 4; OS: Windows 10.0 Build 19044; UI render: Skia/Raster; VCL: win
Locale: de-DE (de_DE); UI: en-GB
Calc: CL
Comment 2 Luke 2022-11-30 01:06:35 UTC
Still in 7.5

Looked at the OOXML. The exporter is generating:
<wp:docPr descr="ImDesc" name="ImName" id="1"/>

but the title= attribute is missing; it should be:
<wp:docPr title="ImAltText" name="ImName" id="1" descr="ImDesc"/>

The title value comes from the ODF XML:
<svg:title>ImAltText</svg:title>
Comment 3 Michael Stahl (allotropia) 2023-01-31 18:06:12 UTC
we find this code in DocxAttributeOutput::FlyFrameGraphic()

    if( GetExport().GetFilter().getVersion( ) != oox::core::ECMA_DIALECT )
        docPrattrList->add( XML_title, OUStringToOString( pGrfNode ? pGrfNode->GetTitle() : pOLEFrameFormat->GetObjTitle(), RTL_TEXTENCODING_UTF8 ));

it turns out that in ISO 29500:2008, in ECMA-376 2nd edition, 3rd edition, presumably up to 5th edition (yes that was released in 2021), the wp:docPr may have a "title" attribute.

but in ECMA-376 1st edition, the wp:docPr does not have a "title" attribute.

what exactly does our "OoxmlVersion::ECMA_DIALECT" correspond to, if there are 5 editions to choose from?

saving the attached with "Office Open XML Text (Transitional)" results in:
            <wp:docPr id="1" name="ImName" descr="ImDesc" title="ImAltText"/>

saving the attached with "Word 2007-365" results in:
            <wp:docPr id="1" name="ImName" descr="ImDesc"/>

... and i've always thought the "2007-365" supports more features/extensions?

is it supposed to mean "ECMA-376 1st edition" specifically? but why is it named "365" then?
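
To make the difference between the two outputs quoted above concrete, here is a minimal, standalone C++ sketch that builds a wp:docPr element with or without the title attribute. The names PictureText, escapeAttr and buildDocPr are hypothetical and exist only for illustration; this is not the LibreOffice exporter code, which builds the element through its own attribute list.

```cpp
#include <cassert>
#include <string>

// Hypothetical container for the three text fields of a picture.
struct PictureText
{
    std::string name;        // "Name" field
    std::string title;       // "Alternative (Text only)" field, svg:title in ODF
    std::string description; // "Description" field
};

// Escape the characters that must not appear verbatim in an XML attribute value.
std::string escapeAttr(const std::string& in)
{
    std::string out;
    for (char c : in)
    {
        switch (c)
        {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '"': out += "&quot;"; break;
            default:  out += c;
        }
    }
    return out;
}

// Build a wp:docPr element; the title attribute is written only when the
// target format allows it (an assumption passed in by the caller).
std::string buildDocPr(const PictureText& pic, int id, bool titleAllowed)
{
    std::string xml = "<wp:docPr id=\"" + std::to_string(id) + "\""
        + " name=\"" + escapeAttr(pic.name) + "\""
        + " descr=\"" + escapeAttr(pic.description) + "\"";
    if (titleAllowed)
        xml += " title=\"" + escapeAttr(pic.title) + "\"";
    xml += "/>";
    return xml;
}

// Usage check with the values from the attached sample.
void exampleDocPr()
{
    const PictureText pic{"ImName", "ImAltText", "ImDesc"};
    assert(buildDocPr(pic, 1, true)
           == "<wp:docPr id=\"1\" name=\"ImName\" descr=\"ImDesc\" title=\"ImAltText\"/>");
    assert(buildDocPr(pic, 1, false)
           == "<wp:docPr id=\"1\" name=\"ImName\" descr=\"ImDesc\"/>");
}
```

With titleAllowed set to false, the Alt Text simply never reaches the file, which matches the loss described in this report.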
Comment 4 Miklos Vajna 2023-02-01 08:10:40 UTC
> i've always thought the "2007-365" supports more features/extensions?

Yes, that's the idea. The "ecma" case is meant to please Word 2007, which understands e.g. "left" and "right" but not "start" and "end".

So by default we do workarounds to please Word 2007, and the other mode is meant to conform to the schema so officetron can point out schema-non-conformance.

My expectation would be that these days we could flip the default to the non-ECMA one, since Word 2007 was EOL in 2017.
Comment 5 Commit Notification 2023-02-01 14:21:00 UTC
Michael Stahl committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/d09996a0d66c51908103afef9c56679b891570d8

tdf#148952 sw: DOCX export: ECMA-376 1st ed does not allow title

It will be available in 7.6.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 6 Michael Stahl (allotropia) 2023-02-01 15:44:23 UTC
so it's intentional that there isn't a "title" attribute.

but the situation can be improved so that when "title" attribute is not allowed
("Word 2007-365" filter) the title can be written into the "descr" attribute so it is not lost completely.

see also bug 137883 for the confusing filter names.
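
The fallback described in this comment can be sketched roughly as follows. This is a standalone illustration, not the committed patch: the names DocPrText and resolveDocPrText are hypothetical, and the newline used to join title and description is an assumption of this sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical result type: the text values to write into wp:docPr.
struct DocPrText
{
    std::string descr;
    std::string title; // only meaningful when hasTitle is true
    bool hasTitle;
};

// When the target format has no title attribute, fold the title into descr
// so the text is not lost completely.
// ASSUMPTION: "\n" as separator is chosen for this sketch only.
DocPrText resolveDocPrText(const std::string& title,
                           const std::string& descr,
                           bool titleAllowed)
{
    if (titleAllowed)
        return { descr, title, true };   // keep both fields separate
    if (title.empty())
        return { descr, "", false };     // nothing to merge
    if (descr.empty())
        return { title, "", false };     // title takes the place of descr
    return { title + "\n" + descr, "", false };
}

// Usage check with the values from the attached sample.
void exampleMerge()
{
    const DocPrText merged = resolveDocPrText("ImAltText", "ImDesc", false);
    assert(!merged.hasTitle);
    assert(merged.descr == "ImAltText\nImDesc");
}
```

The trade-off is that the two fields cannot be told apart again on import, so this only makes sense for the format that really cannot carry a title.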
Comment 7 Gabor Kelemen (allotropia) 2023-02-01 21:32:55 UTC
> My expectation would be that these days we could flip the default to the
> non-ECMA one, since Word 2007 was EOL in 2017.

That has been happening since 7.0 for new documents, see:

https://cgit.freedesktop.org/libreoffice/core/commit/?id=f25985c55541cbbc9a4fc79e660592d3d0485196

so in my view we should NOT do workarounds to please Word 2007. Those situations are bugs.
Comment 8 Michael Stahl (allotropia) 2023-02-02 08:51:00 UTC
ah yes that's another aspect: it doesn't make sense to export ECMA-376 1st edition for Word 2007 and then set compatibilityMode=15. if we use the old format for Word 2007 then it should be compatibilityMode=12, and if we use the "current" format then compatibilityMode=15.
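
As a sketch of the pairing proposed here, the compatibilityMode value could be derived from the export target instead of being fixed. The enum DocxTarget and both functions are hypothetical names for illustration, not existing LibreOffice code; the element layout follows the w:compatSetting entry that Writer puts into settings.xml.

```cpp
#include <cassert>
#include <string>

// Hypothetical enum for the two DOCX export targets discussed in this bug.
enum class DocxTarget { Word2007, Current };

// Pick the compatibilityMode that matches the exported format:
// 12 corresponds to Word 2007, 15 to Word 2013 and later.
int compatibilityModeFor(DocxTarget target)
{
    return target == DocxTarget::Word2007 ? 12 : 15;
}

// Build the matching w:compatSetting element for settings.xml.
std::string compatSettingXml(DocxTarget target)
{
    return "<w:compatSetting w:name=\"compatibilityMode\""
           " w:uri=\"http://schemas.microsoft.com/office/word\""
           " w:val=\"" + std::to_string(compatibilityModeFor(target)) + "\"/>";
}

// Usage check: the old format gets 12, the current one gets 15.
void exampleCompat()
{
    assert(compatibilityModeFor(DocxTarget::Word2007) == 12);
    assert(compatibilityModeFor(DocxTarget::Current) == 15);
}
```

Keeping both decisions behind one target value avoids the mismatch of an old-format file that claims a newer compatibility mode.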
Comment 9 Commit Notification 2023-02-02 08:58:48 UTC
Michael Stahl committed a patch related to this issue.
It has been pushed to "libreoffice-7-5":

https://git.libreoffice.org/core/commit/77f7f28df730d4720188a59e43ea58fc47880975

tdf#148952 sw: DOCX export: ECMA-376 1st ed does not allow title

It will be available in 7.5.1.

Comment 10 Luke 2023-02-07 04:31:28 UTC
@Michael Stahl
I added myself because I ran into an issue with LO stripping out the alt tag info read by screen readers. This solution still strips it out and doesn't solve that issue. 

Why not properly export it in 1 of the 2 .docx export formats? Office Open XML Transitional or Word 2007-365? Or is it time to support a more modern .docx format?
Comment 11 Michael Stahl (allotropia) 2023-02-07 10:04:30 UTC
(In reply to Luke from comment #10)
> Why not properly export it in 1 of the 2 .docx export formats? Office Open
> XML Transitional or Word 2007-365? Or is it time to support a more modern
> .docx format?

yes, we have already been doing that for many years.
Comment 12 Gabor Kelemen (allotropia) 2023-02-16 22:43:28 UTC
Created attachment 185415
The original file in Writer and the exported one in Word 2016

By default, when starting from a fresh ODF, Writer writes out:

<w:compatSetting w:name="compatibilityMode" w:uri="http://schemas.microsoft.com/office/word" w:val="15"/> in settings.xml

and yet the "merge two fields into one" hack for W2007 is applied.

This is not right.

Version: 7.6.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: c3bd52f81bf733a0b9b0560794a54b2ac1e0f444
CPU threads: 14; OS: Windows 10.0 Build 19045; UI render: default; VCL: win
Locale: en-US (hu_HU); UI: en-US
Calc: threaded
Comment 13 Gabor Kelemen (allotropia) 2023-02-16 22:43:59 UTC
Created attachment 185416
The example file saved to DOCX with current master
Comment 14 Gabor Kelemen (allotropia) 2023-02-16 22:45:40 UTC
Reopening. The W2007 hack should be used only if the initial file is from W2007; otherwise it's fine to have separate fields saved.
Comment 15 Michael Stahl (allotropia) 2023-02-20 12:25:12 UTC
okay i've done some UI tweaks on bug 137883 that should hopefully make clear what is going on.

(at least on master, not for 7.5 because UI freeze.)

one filter is now named "Word 2007" and doesn't export a title attribute, the other is named "Word 2010-365" and is now the *first* DOCX filter in the list and does export a title attribute.

so hopefully most users will pick the best of the DOCX filters now.
Comment 16 Gabor Kelemen (allotropia) 2023-04-13 00:39:45 UTC
Verified in

Version: 7.6.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: fc6806c4be8585ce0d35a6b581bf8b3dbf858500
CPU threads: 14; OS: Windows 10.0 Build 19045; UI render: default; VCL: win
Locale: hu-HU (hu_HU); UI: hu-HU
Calc: threaded

Selecting the "Word 2010-365" export filter now saves all Alt field values; they appear correctly in Word 2010 too.
Comment 17 Commit Notification 2023-10-20 09:00:04 UTC
Venetia committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/d1ba7d0ae6d10c72a1503f88e9b9b32f54cade3b

tdf#148952 sw: Add unit test

It will be available in 24.2.0.
