When an OpenDocument Text (ODT) file is exported to XHTML, the exported markup does not contain lang attributes that identify the document's default language or language changes within the document.
Steps to reproduce the issue:
1. Create a new Writer document and insert some text in English.
2. Add a paragraph in French (e.g. copy something from fr.wikipedia.org).
3. Go to File > Export and choose XHTML as the file format.
4. Inspect the exported XHTML file in a source code editor and search for 'lang="'.
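Step 4 can also be scripted. The following is a minimal Python sketch, not part of LibreOffice; the function name find_lang_attributes and the sample markup are made up for illustration. It assumes the exported file is well-formed XML that does not use undefined named entities (such as &nbsp;), since ElementTree would reject those.

```python
import xml.etree.ElementTree as ET

# Namespace that the predeclared "xml:" prefix is bound to.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def find_lang_attributes(xhtml_text):
    """Return (tag, attribute, value) for every lang or xml:lang attribute."""
    root = ET.fromstring(xhtml_text)
    found = []
    for elem in root.iter():
        tag = elem.tag.split("}")[-1]  # drop the "{namespace}" part of the tag
        if "lang" in elem.attrib:
            found.append((tag, "lang", elem.attrib["lang"]))
        xml_lang = elem.attrib.get("{%s}lang" % XML_NS)
        if xml_lang is not None:
            found.append((tag, "xml:lang", xml_lang))
    return found


# Hypothetical sample that only uses xml:lang, with no plain lang attribute.
sample = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head>'
    '<body><p>Hello</p><p xml:lang="fr">Bonjour</p></body></html>'
)
print(find_lang_attributes(sample))
```

An empty result, or a result that lists only xml:lang entries, would correspond to the problem described in this report.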
What the XHTML *should* have is:
1. lang="en" (possibly lang="en-US" or lang="en-GB", depending on the language specified for the Writer document) on the HTML element;
2. lang="..." on elements where the language changes compared to the immediate context (i.e. nearest ancestor).
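To make the two expectations concrete, here is a small Python sketch. The markup in EXPECTED_XHTML is a hand-written illustration of the desired result, not actual LibreOffice output, and language_changes is a hypothetical helper. It walks the tree and reports each element whose lang value differs from the one inherited from its nearest ancestor.

```python
import xml.etree.ElementTree as ET

# Hand-written illustration of the desired export (not real LibreOffice output).
EXPECTED_XHTML = """<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US">
  <head><title>Example</title></head>
  <body>
    <p>This paragraph is in the document's default language.</p>
    <p lang="fr">Ce paragraphe est en français.</p>
  </body>
</html>"""


def language_changes(xhtml_text):
    """Return (tag, lang) for each element where the language changes."""
    root = ET.fromstring(xhtml_text)
    changes = []

    def walk(elem, inherited):
        # An element without its own lang attribute inherits from its ancestor.
        lang = elem.attrib.get("lang", inherited)
        if lang != inherited:
            changes.append((elem.tag.split("}")[-1], lang))
        for child in elem:
            walk(child, lang)

    walk(root, None)
    return changes


print(language_changes(EXPECTED_XHTML))
```

For the sample above, the html element sets the default language and the second paragraph marks the switch to French, which matches items 1 and 2.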
* xml:lang is also in use, but it is not supported by screen readers or by software for people with dyslexia. Screen readers are used by blind users to convert content to synthetic speech and/or Braille, and correct language identification is essential for both.
* Using Dublin Core metadata (e.g. <meta name="DCTERMS.language" content="en-US"...) specifies the expected audience language, but not the text processing language.
* <http://www.w3.org/International/tutorials/language-decl/#Slide0140>: "Declaring the text-processing language" (in W3C tutorial);
* WCAG 2.0 technique H57: Using language attributes on the html element: <http://www.w3.org/TR/2010/NOTE-WCAG20-TECHS-20101014/H57>
* WCAG 2.0 technique H58: Using language attributes to identify changes in the human language: <http://www.w3.org/TR/2010/NOTE-WCAG20-TECHS-20101014/H58.html>
Added dependency on Bug 39937 because the XSLT for XHTML export assumes that a dc:language element exists.
[This is an automated message.]
This bug was filed before the changes to Bugzilla on 2011-10-16. Thus it
started right out as NEW without ever being explicitly confirmed. The bug is
changed to state NEEDINFO for this reason. To move this bug from NEEDINFO back
to NEW please check if the bug still persists with the 3.5.0 beta1 or beta2 prereleases.
Details on how to test the 3.5.0 beta1 can be found at:
more detail on this bulk operation: http://nabble.documentfoundation.org/RFC-Operation-Spamzilla-tp3607474p3607474.html
Reproduced in LibO 3.5.0 beta 1.
Confirmed in 7.2 Alpha, although the situation seems to have slightly improved:
There is a lang attribute on the html element at the top, but there is nothing on the body or on specific paragraphs.
Note that simply copying and pasting from FR Wikipedia did not attribute the French language to the paragraph in LibreOffice: I had to manually select the text and make it French by using the language menu in the status bar.
I can also confirm that using "Save As > HTML" does write lang attributes on both the body and specific paragraphs.
Version: 7.2.0.0.alpha0+ / LibreOffice Community
Build ID: 6b09276d157abada74e1a4989700139167207778
CPU threads: 8; OS: Linux 4.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
TinderBox: Linux-rpm_deb-x86_64@86-TDF, Branch:master, Time: 2021-05-14_04:32:30