Bug 132683 - Fileopen DOCX: Alt Text field of image opens as Description, Alternative remains empty (comment 13)
Status: RESOLVED WORKSFORME
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Printing and PDF export
Version: 6.0.7.3 release (earliest affected)
Hardware: All All
Importance: low minor
Assignee: Not Assigned
URL: https://help.libreoffice.org/7.0/en-U...
Whiteboard:
Keywords: accessibility, filter:docx
Depends on:
Blocks: a11y, Accessibility
 
Reported: 2020-05-04 14:55 UTC by Rhys Young
Modified: 2022-05-05 14:13 UTC

See Also:
Crash report or crash signature:
Regression By:


Attachments
DOCX with alt text (62.19 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2020-05-04 14:55 UTC, Rhys Young
pdf no alt text (58.85 KB, application/pdf)
2020-05-04 14:56 UTC, Rhys Young
DOCX exported to PDF from MSO (68.12 KB, application/pdf)
2020-05-20 10:30 UTC, Timur
DOCX exported to PDF from LO 6.0 beta 2 (52.19 KB, application/pdf)
2020-05-21 06:54 UTC, Timur

Description Rhys Young 2020-05-04 14:55:12 UTC
Description:
After converting a DOCX file to PDF, the image no longer has its alt text attached.

Steps to Reproduce:
1. Convert file to PDF using 'soffice --headless --nolockcheck --nodefault --nofirststartwizard --nologo --norestore --convert-to pdf --outdir /tmp /tmp/test.docx'
2. Open PDF using a viewer
3. Observe the pdf has no alt text attached to the image
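
The conversion in step 1 can also be scripted. The following is only a minimal sketch, assuming `soffice` is on the PATH; the helper names `build_convert_command` and `convert_to_pdf` are hypothetical, and the flags are copied unchanged from the command in the report.

```python
import subprocess
from pathlib import Path


def build_convert_command(docx_path, out_dir, soffice="soffice"):
    """Return the argv list for the headless DOCX-to-PDF conversion.

    The flags mirror the command quoted in the report; nothing is added.
    """
    return [
        soffice,
        "--headless",
        "--nolockcheck",
        "--nodefault",
        "--nofirststartwizard",
        "--nologo",
        "--norestore",
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(docx_path),
    ]


def convert_to_pdf(docx_path, out_dir, soffice="soffice"):
    """Run the conversion and return the path where the PDF is expected."""
    subprocess.run(build_convert_command(docx_path, out_dir, soffice), check=True)
    # Assumption: soffice names the output after the input file's stem.
    return Path(out_dir) / (Path(docx_path).stem + ".pdf")
```

Calling `convert_to_pdf("/tmp/test.docx", "/tmp")` would run the same command as in step 1.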

Actual Results:
The image does not have alt text.

Expected Results:
The image should have alt text.


Reproducible: Always


User Profile Reset: No



Additional Info:
None
Comment 1 Rhys Young 2020-05-04 14:55:36 UTC
Created attachment 160341
DOCX with alt text
Comment 2 Rhys Young 2020-05-04 14:56:50 UTC
Created attachment 160342
pdf no alt text
Comment 3 Rhys Young 2020-05-04 14:57:13 UTC
LibreOffice seems to import the alt text from Word as Description text instead of as Alternative text.
Comment 4 Timur 2020-05-20 09:10:44 UTC
There are 2 issues here: 

1. Fileopen DOCX opens Alt Text from MSO as Description in LO, not as Alternative.
Help doesn't explain Description and for "Alternative text" says "Enter the text to display in a web browser when the selected item is unavailable."

2. Export as PDF doesn't export alt text from either field.  
Must be tested with PDF reader that supports alt text. 
While not clear how Description should be exported, Alternative doesn't work.
This is a regression: LO 5.0 didn't export it, LO 6.0 exported "Alternative text" properly (typed in manually, not imported from DOCX, as explained in 1.), and LO 6.1 again doesn't.

Let's start from 2. so I remove docx from the title.  
There are other bugs with alt text interop issues for other objects and formats.
During testing with master 7.0+, on one occasion the image wasn't exported at all when Alternative was set in Image - Options; I can't say why.  

Note: headless mode is not needed to reproduce this, and it should only be mentioned in a report if the problem occurs only in headless mode, which is not the case here.
Comment 5 Timur 2020-05-20 10:30:49 UTC
Created attachment 161034
DOCX exported to PDF from MSO

In PDF from MSO, Alt Text appears in Adobe Reader in Windows.
Comment 6 Timur 2020-05-20 10:49:18 UTC
LO 6.0 beta used to export properly "Alternative text" and LO 6.0.7 doesn't again, per test in Windows.

I tried bibisecting with the 6.0 repo on Linux but couldn't see the alt text, only the missing-image bug, which was fixed with: 
 b1008b030246939187e5c30ba750d6abb397161d is the first fixed commit
commit b1008b030246939187e5c30ba750d6abb397161d
Author: Jenkins Build User <tdf@pollux.tdf>
Date:   Thu Jun 22 02:10:02 2017 +0200

    source 77da7b934d782153be9271605691ceee6c66233a
    
    source 77da7b934d782153be9271605691ceee6c66233a
    source 48da675a67a2bfd2eadfd6d4c6dba0dee74b5326
    source 9b68ce7b0f2326ec540717ec5c8207825403774e
    source d2e4aeb929b346acd0d1a2eaeee7237b89b99474
    source 08792a4b332d907c72d1fc7301133f5b306ec8dd
    source d7824bf16898d8cb776420e0c2bff82e6df61b86
    source f05d0d05829dd51cb9d8071ac97cc219779ee40a
    source 266bcae306a1dd6e0d9df80ba30ade7311385c28
    source 08316e5edfc36ed75a4e8dc5b6aa7eea3af4eea9
    source 136ce64b18283acf9db5d130f8ac9108591dd4ee
    source b29bae1064c9f980cc50a667e8b96c5e370326d7
    
    Previous source c0ce1ec3736be861a2ed58827fadb25269ab0117

I hope bibisect may be done in Windows.
Comment 7 Buovjaga 2020-05-20 15:07:47 UTC
(In reply to Timur from comment #6)
> LO 6.0 beta used to export properly "Alternative text" and LO 6.0.7 doesn't
> again, per test in Windows.

Can you give me the exact commit when it worked in 6.0?

I tried Win 6.0 repo and was unable to find a commit where it worked. Tried oldest, master, then git checkout HEAD~500 from master, twice.
Comment 8 Timur 2020-05-21 06:54:57 UTC
Created attachment 161059
DOCX exported to PDF from LO 6.0 beta 2

Here is where it works, as shown in the attached.

Version: 6.0.0.0.beta2
Build ID: 13edaaa12f25de343fce136064e27da66c1c4fa4
CPU threads: 8; OS: Windows 6.1; UI render: GL; 
Locale: bs-BA (bs_BA); Calc: CL

Please note that you must type in the alt text manually, or (for headless) use an ODT saved with it; the original DOCX will not work.
Comment 9 Timur 2020-05-21 07:21:33 UTC Comment hidden (obsolete)
Comment 10 Buovjaga 2020-05-21 13:33:04 UTC
(In reply to Timur from comment #9)
> I don't know if there's a better way to translate
> https://git.libreoffice.org/core/+log/
> 13edaaa12f25de343fce136064e27da66c1c4fa4 to bibisect commit, but I took 
> source and found ae1bb1166afa8ea6abdb656cbd9a7e6075db9313. 
> Linux doesn't export AltText headless or GUI, or this is something even more
> strange.

Ok, taking 3rd commit from the top and doing

git log --all --grep='2e368c5946ba1e608ff263e5892b10d02c90275b' 

in win 6.0 repo gave me the bibisect commit hash c1ac2cc1993f3955491bb8eb99e2b9146aaec4be

I still don't see the alt text in the pdf.

I created an ODT from the DOCX, right-clicked image - Properties - Options, added stuff to the Alternative (Text Only field).
After exporting PDF, I opened it in Acrobat Reader and hovered my mouse over the image. Nothing was shown. With your attachment 161034 I can see the text in Acrobat Reader.

Can someone please tell me the valid steps to test this?? I don't care which PDF reader, Win/Linux, I have a shared folder between Linux and Win VM, just tell me the steps to confirm the alt text was saved.
Comment 11 Timur 2020-05-21 14:07:33 UTC Comment hidden (obsolete)
Comment 12 Buovjaga 2020-05-21 14:31:23 UTC
(In reply to Timur from comment #11)
> Well, that's it. Except if there's another bug in headless, if you used that
> please try GUI to be sure. 
> Otherwise, we may regrettably mark NotBibisectable.

GUI always. Let's allow someone else to try as well.
Comment 13 Timur 2020-07-27 19:37:31 UTC
Seems I misdirected this, so let's start again.
There are 2 issues here (headless is not needed; both also occur in the regular GUI): 

1. Fileopen DOCX opens Alt Text from MSO as Description in LO, not as Alternative.
Help doesn't explain Description and for "Alternative text" says "Enter the text to display in a web browser when the selected item is unavailable."
I don't know why it happens, but let this bug be about this, unless someone explains why it is not a bug. 

2. Export as PDF doesn't export alt text from either field.  
Must be tested with PDF reader that supports alt text (like Adobe Reader). 
This works from LO 7.0 per bug 45636.
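
To check what a DOCX actually stores before LibreOffice opens it, the file can be inspected directly. This is only a sketch: in WordprocessingML, the `wp:docPr` element of a picture carries the `title` and `descr` attributes, with Word's Alt Text stored in `descr`. The function name `read_docx_image_alt_text` is hypothetical, and only the main document part is read (headers, footers and other parts are ignored).

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the wp: prefix used for drawing anchors in word/document.xml.
WP_NS = "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"


def read_docx_image_alt_text(docx_file):
    """Return one (name, title, descr) tuple per wp:docPr element.

    docx_file may be a path or a binary file-like object.
    Missing attributes are reported as empty strings.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    results = []
    for doc_pr in root.iter("{%s}docPr" % WP_NS):
        results.append(
            (doc_pr.get("name", ""), doc_pr.get("title", ""), doc_pr.get("descr", ""))
        )
    return results
```

If the tuple for an image has an empty `title` and a filled `descr`, that would match the behaviour in issue 1: the text has nowhere to go but the Description field.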
Comment 14 Christophe Strobbe 2022-05-04 21:32:54 UTC
I retested this with LibreOffice 7.1.4.2 on Windows 10:
1. I created a DOCX document using Word 2016, inserted the string 'Title' in the title field and 'Description' in the description field.
2. I opened the DOCX file in LibreOffice 7.1.4.2. I found the string 'Title' in the field "Alternative (Text only)" and the string 'Description' in the Description field.
3. I exported the document to PDF (from LibreOffice) and opened the resulting PDF file in Adobe Acrobat Pro. In Acrobat, I found the "Alternate text" as follows "Title - Description", i.e. LibreOffice concatenates the title and the description with just " - " between them.

In Rhys Young's original DOCX file, the Description field was filled in and the Title field was empty. In LibreOffice 7.1.4.2, the description ends up in the Description field. After exporting to PDF, that description ends up in the Alternate Text field.
The PDF attached as "pdf no alt text" is not even Tagged PDF, which means it was not exported with the appropriate settings to generate a text alternative in the first place.
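
Whether a PDF is tagged can be roughly checked without a full PDF library. The sketch below is a heuristic only: it looks for a `/StructTreeRoot` entry and a `/MarkInfo` dictionary with `/Marked true` in the raw bytes. It assumes the document catalog is not stored in a compressed object stream, so it may report false negatives; the function names are hypothetical.

```python
import re

# Matches e.g. "/MarkInfo<</Marked true>>" with optional whitespace.
_MARKED_RE = re.compile(rb"/MarkInfo\s*<<[^>]*/Marked\s+true")


def looks_tagged(pdf_bytes):
    """Heuristically decide whether raw PDF bytes describe a Tagged PDF."""
    has_struct_tree = b"/StructTreeRoot" in pdf_bytes
    is_marked = _MARKED_RE.search(pdf_bytes) is not None
    return has_struct_tree and is_marked


def file_looks_tagged(path):
    """Convenience wrapper that reads the file at path and checks it."""
    with open(path, "rb") as handle:
        return looks_tagged(handle.read())
```

A dedicated checker such as Acrobat's tag panel remains the more reliable test; this only helps to rule out files that were clearly exported without tags.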

With the appropriate export options, the image does actually get a text alternative in the PDF file, as described above (cf. concatenation / combination of Title and Description).
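
The combination observed in this test can be modelled as follows. This is a sketch of the observed result only, not LibreOffice's actual code: the function name `combine_alt_text` is hypothetical, and the title-only case is an assumption, since only "both filled" and "description only" were tested here.

```python
def combine_alt_text(title, description, separator=" - "):
    """Join the non-empty parts the way the exported PDF was observed to.

    Both filled      -> "Title - Description" (observed)
    Description only -> "Description"         (observed)
    Title only       -> "Title"               (assumption, not tested in this thread)
    """
    parts = [part.strip() for part in (title, description) if part and part.strip()]
    return separator.join(parts)
```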

Does that mean the original issue has been fixed?
Comment 15 Timur 2022-05-05 12:22:18 UTC
Thanks Christophe for an extensive test. 
Seems like we can close this, as no info is lost on fileopen. 

A bug still exists for filesave to DOCX: Alt Text is lost when the saved file is opened in LO (where it should stay as Alt Text) and in MSO (where it appears as Title).
I'll open a new bug for that and add it to See Also.