Bug 39795 - ACCESSIBILITY: Writer XHTML export loses language information [accessibility]
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): unspecified
Hardware: Other All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: accessibility
Depends on: 39937
Blocks: a11y, Accessibility (X)HTML-Export
Reported: 2011-08-03 03:18 UTC by Christophe Strobbe
Modified: 2023-08-16 20:34 UTC

Description Christophe Strobbe 2011-08-03 03:18:38 UTC
When an OpenDocument Text file is exported to XHTML, the exported markup contains no lang attributes identifying the document's default language or the language changes inside the document.

Steps to reproduce the issue:
1. Create a new Writer document and insert some text in English.
2. Add a paragraph in French (e.g. copy something from fr.wikipedia.org).
3. Go to File > Export and choose XHTML.
4. Inspect the exported XHTML file in a source code editor and search for 'lang="'.
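
Step 4 can also be scripted. The following is a minimal sketch, not part of LibreOffice, that uses Python's standard html.parser module to list every lang and xml:lang attribute in the exported markup. The names LangAttributeCollector and find_lang_attributes are hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser


class LangAttributeCollector(HTMLParser):
    """Record every lang / xml:lang attribute seen while parsing."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (tag, attribute name, value) triples

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before this call.
        # Self-closing tags such as <br /> are also routed here by default.
        for name, value in attrs:
            if name in ("lang", "xml:lang"):
                self.found.append((tag, name, value))


def find_lang_attributes(xhtml_text):
    """Return all (tag, attribute, value) triples for language attributes."""
    collector = LangAttributeCollector()
    collector.feed(xhtml_text)
    collector.close()
    return collector.found
```

Reading the exported file into a string and passing it to find_lang_attributes returns an empty list when the export carries no language information at all, which is the symptom described in this report.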

What the XHTML *should* have is:
1. lang="en" (possibly lang="en-US" or lang="en-GB", depending on the language specified for the Writer document) on the HTML element;
2. lang="..." on elements where the language changes compared to the immediate context (i.e. nearest ancestor).
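
The two expectations above can be illustrated with a small sketch. It is not the export filter's actual XSLT; it only builds a toy XHTML tree with Python's xml.etree.ElementTree, assuming the document is a flat list of paragraphs that each carry a language tag. The function name build_expected_xhtml and the (text, lang) input format are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_expected_xhtml(default_lang, paragraphs):
    """Build a tiny XHTML document from (text, lang) pairs.

    The document language goes on the html element. A paragraph gets its
    own lang attribute only when its language differs from the inherited
    one, which here is always default_lang because every paragraph sits
    directly under body.
    """
    html = ET.Element(
        "html",
        {"xmlns": "http://www.w3.org/1999/xhtml", "lang": default_lang},
    )
    body = ET.SubElement(html, "body")
    for text, lang in paragraphs:
        p = ET.SubElement(body, "p")
        p.text = text
        if lang != default_lang:
            # Language change relative to the nearest ancestor: mark it.
            p.set("lang", lang)
    return ET.tostring(html, encoding="unicode")
```

For example, build_expected_xhtml("en-US", [("Hello", "en-US"), ("Bonjour", "fr")]) produces an html element with lang="en-US", an English paragraph with no lang attribute, and a French paragraph with lang="fr".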

Notes:
* xml:lang is also in use, but it is not supported by screen readers or by software for dyslexics. Blind users rely on screen readers to convert content to synthetic speech and/or Braille, and correct language identification is essential for both.
* Dublin Core metadata (e.g. <meta name="DCTERMS.language" content="en-US"...) specifies the expected audience language, but not the text-processing language.

Background:
* W3C tutorial, "Declaring the text-processing language": <http://www.w3.org/International/tutorials/language-decl/#Slide0140>
* WCAG 2.0 technique H57: Using language attributes on the html element: <http://www.w3.org/TR/2010/NOTE-WCAG20-TECHS-20101014/H57>
* WCAG 2.0 technique H58: Using language attributes to identify changes in the human language: <http://www.w3.org/TR/2010/NOTE-WCAG20-TECHS-20101014/H58.html>
Comment 1 Christophe Strobbe 2011-08-08 10:33:32 UTC
Added dependency on Bug 39937 because the XSLT for XHTML export assumes that a dc:language element exists.
Comment 2 Björn Michaelsen 2011-12-23 12:28:24 UTC Comment hidden (obsolete)
Comment 3 sasha.libreoffice 2012-01-08 21:33:47 UTC
Reproduced in LibreOffice 3.5.0 beta 1.
Comment 4 Stéphane Guillou (stragu) 2021-05-18 06:29:36 UTC
Confirmed in 7.2 Alpha, although the situation seems to have slightly improved:

There is a lang attribute on the html element at the top, but nothing on the body or on specific paragraphs.

Note that simply copying and pasting from FR Wikipedia did not assign French as the paragraph's language in LibreOffice: I had to select the text manually and set it to French using the language menu in the status bar.

I can also confirm that "Save As > HTML" does emit lang attributes for both the body and specific paragraphs.

Version: 7.2.0.0.alpha0+ / LibreOffice Community
Build ID: 6b09276d157abada74e1a4989700139167207778
CPU threads: 8; OS: Linux 4.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
TinderBox: Linux-rpm_deb-x86_64@86-TDF, Branch:master, Time: 2021-05-14_04:32:30
Calc: threaded