Bug 155190 - FILESAVE Acrobat Accessibility checker reports "Associated with content" error in alternative text check
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Printing and PDF export
Version: 7.4.4.2 release (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Michael Stahl (allotropia)
URL:
Whiteboard: target:7.6.0 target:7.5.5
Keywords: accessibility, bibisected, bisected, regression
Depends on:
Blocks: PDF-Accessibility
Reported: 2023-05-08 13:29 UTC by devseppala
Modified: 2023-05-31 23:59 UTC

See Also:
Crash report or crash signature:


Attachments
Example document with picture and associated alternative text. (52.38 KB, application/vnd.oasis.opendocument.text)
2023-05-08 13:29 UTC, devseppala
PDF export of example doc using LO 7.4.3 (27.98 KB, application/pdf)
2023-05-08 13:30 UTC, devseppala
Acrobat accessibility check result of LO 7.4.3 PDF export (Passed) (39.85 KB, image/gif)
2023-05-08 13:31 UTC, devseppala
PDF export of example doc using LO 7.4.4 (28.06 KB, application/pdf)
2023-05-08 13:32 UTC, devseppala
Acrobat accessibility check result of LO 7.4.4 PDF export (Failed) (40.60 KB, image/gif)
2023-05-08 13:32 UTC, devseppala

Description devseppala 2023-05-08 13:29:01 UTC
Created attachment 187140
Example document with picture and associated alternative text.

When exporting documents that have images with alternative text to PDF, the Adobe Acrobat accessibility checker reports "Associated with content" errors in the alternative text check section. These errors started to appear after updating to LO 7.4.4, so one of the accessibility changes in 7.4.4 RC1 must have triggered it.

https://wiki.documentfoundation.org/Releases/7.4.4/RC1

This does not happen if the images are tagged with the new "Decorative" option that was introduced in the LO 7.5 series.

The following page suggests that "Associated with content" errors indicate that the alternative text is attached to the wrong element:

https://community.adobe.com/t5/acrobat-discussions/quot-associated-with-content-quot-failed-in-adobe-accessibility-checker/td-p/10427592

I will provide an example file and Acrobat screenshots of the error message, although these are probably not very informative.
Comment 1 devseppala 2023-05-08 13:30:31 UTC
Created attachment 187141
PDF export of example doc using LO 7.4.3
Comment 2 devseppala 2023-05-08 13:31:36 UTC
Created attachment 187142
Acrobat accessibility check result of LO 7.4.3 PDF export (Passed)
Comment 3 devseppala 2023-05-08 13:32:12 UTC
Created attachment 187143
PDF export of example doc using LO 7.4.4
Comment 4 devseppala 2023-05-08 13:32:53 UTC
Created attachment 187144
Acrobat accessibility check result of LO 7.4.4 PDF export (Failed)
Comment 5 devseppala 2023-05-12 09:32:09 UTC
FYI, as far as I can tell, the only alternative text changes in LO 7.4.4 were:

tdf#57423 PDF: "Description" and "Title/Text Alternative" is only PDF-exported for Images, but not for Shapes, Formula, Frames, and OLE Objects [Michael Stahl]

tdf#141386 Alternative text from screenshots is only recognised by the PDF Accessibility Checker (PAC 3) after closing and reopening when exporting to a PDF file. [Michael Stahl]
Comment 6 devseppala 2023-05-15 09:36:13 UTC
I did some more digging and realized that the Acrobat tag structure viewer shows a small difference in the tag structure of the exported PDF files.

LO 7.4.3 PDF export

<Document>
   <P>
   <P>
      <Div>
         <Caption>
            Text: Figure 
            Text: 1
            Text: : Picture of apples
            <Figure>
               Image (10): w:226 h:226  <--- Missing after upgrade LO 7.4.4 !!!

The LO 7.4.4 PDF export shows the same structure, except that the last line (the Image entry under <Figure>) is missing.
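
For anyone who wants to check this without eyeballing the tag viewer, here is a minimal Python sketch. It assumes the tag tree has been pasted as indented plain text in the same shape as the outline above. The function name figures_without_content and the two sample strings are illustrative only and are not part of any Acrobat or LibreOffice API. It flags every <Figure> line that has no more deeply indented line directly after it, which is the pattern seen in the 7.4.4 export.

```python
def figures_without_content(outline):
    """Return 0-based indices (among non-blank lines) of <Figure> entries
    that have no child line, i.e. no more deeply indented line after them."""
    lines = [l for l in outline.splitlines() if l.strip()]
    empty = []
    for i, line in enumerate(lines):
        if line.strip() != "<Figure>":
            continue
        indent = len(line) - len(line.lstrip())
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        nxt_indent = len(nxt) - len(nxt.lstrip())
        # No following line, or the next line is not nested deeper:
        # the Figure element has no content attached to it.
        if not nxt or nxt_indent <= indent:
            empty.append(i)
    return empty


# Hand-copied from the outline above (assumed paste format).
OUTLINE_743 = """\
<Document>
   <P>
   <P>
      <Div>
         <Caption>
            Text: Figure
            Text: 1
            Text: : Picture of apples
            <Figure>
               Image (10): w:226 h:226
"""

# Same outline with the Image line removed, as seen with LO 7.4.4.
OUTLINE_744 = OUTLINE_743.rsplit("\n", 2)[0] + "\n"

if __name__ == "__main__":
    print("7.4.3 empty figures:", figures_without_content(OUTLINE_743))
    print("7.4.4 empty figures:", figures_without_content(OUTLINE_744))
```

An empty list for the 7.4.3 outline and a non-empty list for the 7.4.4 outline would match what the tag viewer shows.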
Comment 7 devseppala 2023-05-15 09:56:09 UTC
I also tried the example files with a tool called ngPDF (https://ngpdf.com/loadFile), which can transform tagged PDF documents to HTML. This tool correctly transformed the LO 7.4.3 exported PDF file to HTML. However, when I tried it with the LO 7.4.4 exported PDF file, the resulting HTML file was missing the picture. I think this implies that there is indeed a problem in how the picture is attached to the tag structure, and that it is not just a problem with Acrobat.
Comment 8 Stéphane Guillou (stragu) 2023-05-24 15:24:00 UTC
Thanks devseppala.

Reproduced with a recent master build:

Version: 7.6.0.0.alpha1+ (X86_64) / LibreOffice Community
Build ID: f4c24da1e7f11664e0d2f688d2531f068e4a3bc0
CPU threads: 8; OS: Linux 5.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Calc: threaded

Bibisected with linux-64-7.4 repo to first bad commit cbb4a517265c62727578128728b61a12b9c8da97 which points to core commit 7b506aaafa7a982d19ec1cba2909f3ccfe29b130 which is a cherrypick of:

commit 81ef84648515965bf67afaced946227d0f63a71e
author	Michael Stahl <michael.stahl@allotropia.de>	Wed Nov 30 16:40:27 2022 +0100
committer	Michael Stahl <michael.stahl@allotropia.de>	Thu Dec 01 17:04:58 2022 +0100
(related: tdf#135192) svx: PDF/UA export: tag background as Artifact
 ISO 14289-1:2014, Clause: 7.1, Test number: 3
 Content shall be marked as Artifact or tagged as real content
Reviewed-on: https://gerrit.libreoffice.org/c/core/+/143502

Note that I have tested the PDFs with the Firefox accessibility inspector. Since that commit, I can't see the element with role "figure".

Michael, can you please have a look?
Comment 9 Michael Stahl (allotropia) 2023-05-26 14:34:57 UTC
I think this is the problem, and it is visible even without Acrobat:
a new NonStruct is now inserted inside Figure, which is wrong.

% beginStructureElement 16: Figure aliased as "Figure"
% beginStructureElement 17: NonStruct
/Artifact BMC
% drawJPGBitmap
q 212.9 595.889 169.5 169.5 re
W* n
q 169.5 0 0 169.5 212.9 595.939 cm
/Im9 Do Q
EMC
% endStructureElement 17: NonStruct
% endStructureElement 16: Figure aliased as "Figure"
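
A rough way to spot the same pattern in other exports, sketched in Python. It assumes a content-stream dump annotated with "% beginStructureElement" / "% endStructureElement" comment lines exactly as in the excerpt above; whether a given build emits those comments is not something this sketch can guarantee. The function name artifacts_inside_figures is made up for illustration. It tracks the open structure elements on a stack and reports the id of the enclosing Figure whenever an /Artifact marked-content sequence starts inside one.

```python
import re

# Patterns for the annotated dump format shown above (assumed format).
BEGIN = re.compile(r"^% beginStructureElement (\d+): (\w+)")
END = re.compile(r"^% endStructureElement (\d+):")
ARTIFACT = re.compile(r"^/Artifact\b.*\b(?:BMC|BDC)$")


def artifacts_inside_figures(dump):
    """Return ids of Figure elements that enclose an /Artifact sequence."""
    stack = []  # (id, type) of currently open structure elements
    hits = []
    for raw in dump.splitlines():
        line = raw.strip()
        m = BEGIN.match(line)
        if m:
            stack.append((int(m.group(1)), m.group(2)))
            continue
        if END.match(line):
            if stack:
                stack.pop()
            continue
        if ARTIFACT.match(line):
            figures = [elem_id for elem_id, kind in stack if kind == "Figure"]
            if figures:
                hits.append(figures[-1])  # innermost enclosing Figure
    return hits


# The excerpt from this comment, copied verbatim.
SAMPLE = """\
% beginStructureElement 16: Figure aliased as "Figure"
% beginStructureElement 17: NonStruct
/Artifact BMC
% drawJPGBitmap
q 212.9 595.889 169.5 169.5 re
W* n
q 169.5 0 0 169.5 212.9 595.939 cm
/Im9 Do Q
EMC
% endStructureElement 17: NonStruct
% endStructureElement 16: Figure aliased as "Figure"
"""

if __name__ == "__main__":
    print(artifacts_inside_figures(SAMPLE))
```

On the excerpt above this reports Figure 16, because the image is drawn inside an Artifact sequence nested under that Figure. Artifact content outside any Figure, such as a page background, is not reported.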
Comment 10 Commit Notification 2023-05-26 17:19:48 UTC
Michael Stahl committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/370533da3f07169791c0a17013ca55c57df2f3c9

tdf#155190 svx,sw: PDF export: don't tag SwNoTextFrame as Artifact

It will be available in 7.6.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 11 Michael Stahl (allotropia) 2023-05-26 17:20:48 UTC
please check with Acrobat if this is really fixed...
Comment 12 devseppala 2023-05-29 09:16:31 UTC
(In reply to Michael Stahl (allotropia) from comment #11)
> please check with Acrobat if this is really fixed...

The bug is fixed, I just checked it with Acrobat Pro. No more "Associated with content" errors. Thank you so much for looking into this issue and solving it so fast.

Is there any chance of getting this fix into the 7.5 series?
Comment 13 Commit Notification 2023-05-31 08:16:45 UTC
Michael Stahl committed a patch related to this issue.
It has been pushed to "libreoffice-7-5":

https://git.libreoffice.org/core/commit/9654214d0d19a7b4b2b33e62bd5083b59ccadd26

tdf#155190 svx,sw: PDF export: don't tag SwNoTextFrame as Artifact

It will be available in 7.5.5.
Comment 14 Stéphane Guillou (stragu) 2023-05-31 23:59:19 UTC
Marking as verified per comment 12. Thank you both!