Bug 142356 - [accessibility] filter save as HTML places caption inside image, export to XHTML drops the caption neither is good AT support
Summary: [accessibility] filter save as HTML places caption inside image, export to XH...
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 6.2.5.2 release (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: accessibility
Depends on:
Blocks: a11y, Accessibility (X)HTML-Export
Reported: 2021-05-18 14:11 UTC by Stéphane Guillou (stragu)
Modified: 2022-05-16 17:27 UTC (History)

See Also:
Crash report or crash signature:


Attachments
ODT document with image and caption (14.30 KB, application/vnd.oasis.opendocument.text)
2021-05-19 05:09 UTC, Stéphane Guillou (stragu)
example document saved as HTML (966 bytes, text/html)
2021-05-19 05:11 UTC, Stéphane Guillou (stragu)
image with caption saved alongside HTML file (10.55 KB, image/gif)
2021-05-19 05:12 UTC, Stéphane Guillou (stragu)

Description Stéphane Guillou (stragu) 2021-05-18 14:11:12 UTC
Description:
In this document, the caption of a picture is rendered into the exported image itself, and is therefore not readable by a screen reader for sight-impaired users.
This is a very concerning accessibility shortcoming.

Steps to Reproduce:
1. Open attached ODT (same document as in bug 109334)
2. File > Save as > HTML

Actual Results:
The resulting HTML file contains a picture with the caption integrated into it. No screen reader can pick this up; an OCR tool would be needed instead.

Expected Results:
The caption is exported as a proper caption, using for example the <figcaption> tag, as described here: https://www.w3schools.com/TAGS/tag_figcaption.asp


Reproducible: Always


User Profile Reset: No



Additional Info:
Interestingly, this does not happen with a freshly created document. Also, an XHTML export has the caption as text.

Version: 7.2.0.0.alpha0+ / LibreOffice Community
Build ID: 6b09276d157abada74e1a4989700139167207778
CPU threads: 8; OS: Linux 4.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
TinderBox: Linux-rpm_deb-x86_64@86-TDF, Branch:master, Time: 2021-05-14_04:32:30
Calc: threaded
Comment 1 Harshita Nag 2021-05-19 03:03:18 UTC
Can't reproduce this. 
Version: 7.0.4.2
Build ID: dcf040e67528d9187c66b2379df5ea4407429775
CPU threads: 8; OS: Linux 5.3; UI render: default; VCL: gtk3
Locale: en-IN (en_IN); UI: en-US
Calc: threaded
Comment 2 Stéphane Guillou (stragu) 2021-05-19 05:09:57 UTC
Created attachment 172158
ODT document with image and caption

Forgot to attach the problematic document.

Open, save as HTML, see gif file created along with HTML document.

Confirmed with 7.0.4.2, 7.1.3 and 7.2 alpha0+
Comment 3 Stéphane Guillou (stragu) 2021-05-19 05:11:45 UTC
Created attachment 172159
example document saved as HTML
Comment 4 Stéphane Guillou (stragu) 2021-05-19 05:12:21 UTC
Created attachment 172160
image with caption saved alongside HTML file
Comment 5 V Stuart Foote 2021-05-25 12:49:27 UTC
Does the XSL based XHTML filter do better with an Export to XHTML? Or is that a problem as well?
Comment 6 Stéphane Guillou (stragu) 2021-05-25 13:17:25 UTC
Hi Stuart

By "XSL-based XHTML filter", do you mean the one used when exporting as XHTML with "File > Export as... > XHTML"? If so, I said in the description that XHTML export saves the caption as text.

Or do you mean a different filter?

Also, do you confirm the behaviour when saving as HTML?
Comment 7 V Stuart Foote 2021-05-25 14:53:53 UTC
(In reply to stragu from comment #6)
> Hi Stuart
> 
> By "XSL-based XHTML filter", do you mean the one used when exporting as
> XHTML with "File > Export as... > XHTML"? If so, I said in the description
> that XHTML export saves the caption as text.
> 

Yes sorry, I missed your 
"Additional Info:
Interestingly, this does not happen with a freshly created document. Also, an XHTML export has the caption as text." as I skimmed the ticket.

> 
> Also, do you confirm the behaviour when saving as HTML?

Yes, the ancient HTML filter simply converts the image frame and its caption to a GIF. So the caption "Illustration 1: my caption text" is embedded into the bitmap and not available to AT.

For XSL based 'Export' to XHTML, just the image is embedded as PNG in base64. The frame caption is completely dropped.  And the alternative text is picked up as normal text (not linked to the image). So an AT fail there as well.
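
A quick way to check whether an exported file exposes the caption as real text (rather than as pixels in a bitmap) might look like the sketch below. It uses only Python's standard html.parser module; the helper name caption_is_text and the two sample markup strings are hypothetical and are not taken from the attachments. Attribute values such as alt are deliberately not counted, because only text nodes are collected.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the non-empty text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Only text nodes arrive here; attribute values (e.g. alt) do not.
        if data.strip():
            self.chunks.append(data.strip())


def caption_is_text(html, caption):
    """Return True if `caption` appears as text in `html` (hypothetical helper)."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return caption in " ".join(collector.chunks)


# Hypothetical markup for illustration only, not the real export output.
bitmap_only = '<p><img src="example.gif" alt=""></p>'
with_figcaption = (
    '<figure><img src="example.gif" alt="">'
    "<figcaption>Illustration 1: my caption text</figcaption></figure>"
)

print(caption_is_text(bitmap_only, "Illustration 1: my caption text"))
print(caption_is_text(with_figcaption, "Illustration 1: my caption text"))
```

This only tells whether the caption text is present somewhere in the markup; it does not verify that the text is programmatically associated with the image.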
Comment 8 Stéphane Guillou (stragu) 2021-07-08 06:11:03 UTC
Reproduced in 6.2.5 as well.

Version: 6.2.5.2
Build ID: 1ec314fa52f458adc18c4f025c545a4e8b22c159
CPU threads: 4; OS: Linux 5.4; UI render: default; VCL: gtk3; 
Locale: en-AU (en_AU.UTF-8); UI-Language: en-US
Calc: threaded
Comment 9 Christophe Strobbe 2022-05-16 17:27:04 UTC
I confirm that this bug applies to LibreOffice 7.1.4.2:

Version: 7.1.4.2 / LibreOffice Community
Build ID: 10(Build:2)
CPU threads: 8; OS: Linux 5.5; UI render: default; VCL: kf5
Locale: en-GB (en_GB.utf8); UI: en-GB
Calc: threaded

The difficult part is deciding how to deal with this, i.e. what HTML code should be exported.

For ODF images that have a caption, it makes sense to generate the following code with the "Save as HTML" function (some code exported by LibreOffice omitted for the sake of simplicity):

<figure>
  <img src="..." alt="..." />
  <figcaption>Illustration 1: my Caption Text</figcaption>
</figure>

If the caption is above the image in the ODF file, the figcaption element should end up above the img element in the HTML code.
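
The ordering rule above can be sketched in a few lines of Python. This is only an illustration of the proposed output, not how the LibreOffice filter is implemented; the function name figure_html and its caption_above parameter are hypothetical.

```python
from html import escape


def figure_html(src, alt, caption, caption_above=False):
    """Build HTML5 <figure> markup for a captioned image (hypothetical helper)."""
    img = '  <img src="{}" alt="{}" />'.format(
        escape(src, quote=True), escape(alt, quote=True)
    )
    cap = "  <figcaption>{}</figcaption>".format(escape(caption))
    # Keep the caption on the same side of the image as in the ODF file.
    body = [cap, img] if caption_above else [img, cap]
    return "\n".join(["<figure>"] + body + ["</figure>"])


print(figure_html("example.gif", "Sample image", "Illustration 1: my caption text"))
print(figure_html("example.gif", "Sample image", "Illustration 1: my caption text",
                  caption_above=True))
```

Either way the caption stays a child of the figure element, so it remains associated with the image regardless of position.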

It is less obvious what to do with the XHTML export, beyond simply adding a span or a p element below the img element. figure and figcaption were first introduced in HTML 5, whereas LibreOffice's "Export to XHTML" function exports to XHTML 1.1 + MathML 2.0, which is an outdated specification. XHTML 1.1 was superseded in 2018: https://www.w3.org/TR/xhtml11/ .

The XHTML code might end up looking as follows:

<p class="Illustration"><img alt="..." src=" ..." /></p>
<p class="caption">Illustration ... </p>

Note:
1. The content of the alt attribute (the contents of ODF's <svg:title> element) should not be repeated after the img element (which is a different bug).
2. I have replaced <div class="Illustration"> with a p element, which is more appropriate. The value of the class attribute may depend on the caption category in the ODF file.
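
As a rough sketch of the XHTML variant suggested above, the following Python snippet builds the two p elements with the standard xml.etree.ElementTree module, which guarantees well-formed output. The function name xhtml_image_with_caption and the category parameter are hypothetical; the real export is done by an XSLT filter, so this only illustrates the target markup.

```python
import xml.etree.ElementTree as ET


def xhtml_image_with_caption(src, alt, caption, category="Illustration"):
    """Return an image paragraph followed by a caption paragraph (hypothetical helper)."""
    # The class value is assumed to follow the ODF caption category.
    img_p = ET.Element("p", {"class": category})
    ET.SubElement(img_p, "img", {"alt": alt, "src": src})

    cap_p = ET.Element("p", {"class": "caption"})
    cap_p.text = caption

    return "\n".join(ET.tostring(el, encoding="unicode") for el in (img_p, cap_p))


print(xhtml_image_with_caption("example.png", "Sample image",
                               "Illustration 1: my caption text"))
```

Note that, unlike figure and figcaption, this markup only places the caption next to the image; nothing in XHTML 1.1 ties the two together programmatically.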