102849 – Semantic XHTML conversion: heading root mismatch

Summary: Semantic XHTML conversion: heading root mismatch

Status:	RESOLVED WONTFIX

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	filters and storage
Version: (earliest affected)	5.1.4.1 rc
Hardware:	All All

Importance:	medium normal
Assignee:	Samuel Mehrbrodt

URL:
Whiteboard:
Keywords:	accessibility

Depends on:
Blocks:

Reported:	2016-09-30 09:28 UTC by Laurent Godard
Modified:	2017-08-08 16:38 UTC
CC List:	4 users

See Also:
Crash report or crash signature:

Attachments
odt source file (8.53 KB, application/vnd.oasis.opendocument.text) 2016-09-30 09:29 UTC, Laurent Godard
(x)html output (3.17 KB, text/html) 2016-09-30 09:29 UTC, Laurent Godard

Description Laurent Godard 2016-09-30 09:28:24 UTC

The current LibreOffice behavior produces XHTML exports with an HTML
heading hierarchy not strictly and semantically corresponding to the
heading hierarchy of the ODF document.

Concretely, LibreOffice produces XHTML exports with many <h1> headings,
whereas a semantically correct (X)HTML document must have one and only
one <h1>. The heading hierarchy should then follow with <h2>, <h3>,
etc., down to <h6> at most.

The current LibreOffice behavior produces the following XHTML export of
the given ODT file:

    <p class="Title">Titre principal</p>

    <h1 class="Heading_20_1"><a id="a__Titre_1"><span/></a>Titre 1</h1>
    <p class="P1">para</p>

    <h2 class="Heading_20_2"><a id="a__Titre_2"><span/></a>Titre 2</h2>
    <p class="P1">para</p>

    <h3 class="Heading_20_3"><a id="a__Titre_3"><span/></a>Titre 3</h3>
    <p class="P1">para</p>

while we want to have the following:

    <h1>Titre principal</h1>

    <h2 class="whatever"><a id="a__Titre_1"><span/></a>Titre 1</h2>
    <p class="P1">para</p>

    <h3 class="whatever_else"><a id="a__Titre_2"><span/></a>Titre 2</h3>
    <p class="P1">para</p>

    <h4 class="whatever_again"><a id="a__Titre_3"><span/></a>Titre 3</h4>
    <p class="P1">para</p>


Technical explanation of the current LibreOffice bug:

Title has Heading1 as its child, Heading1 has Heading2 as its child,
Heading2 has Heading3 as its child, and so on, with Title as the root of
the ODF heading hierarchy. This gives:
Title→Heading1→Heading2→Heading3→etc.

As h1 is the root of the (X)HTML hierarchy, in XHTML the heading
hierarchy is:
h1→h2→h3→h4→etc.

The current LibreOffice behavior is the following:
Title→Heading1→Heading2→Heading3→etc.
=>
p→h1→h2→h3→etc.

while it should logically and semantically be:
h1→h2→h3→h4→etc.

This is a heading root mismatch between ODF and the (X)HTML export.

This behaviour could be changed through a filter option that postprocesses the current output.
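
A minimal sketch of what such a postprocessing step might look like, written in Python purely for illustration. The function name `shift_headings` and the `title_class` parameter are hypothetical and not part of LibreOffice; the sketch assumes the export is well-formed XML and that the document title is the first `<p>` carrying `class="Title"`, as in the attached output.

```python
import re
import xml.etree.ElementTree as ET

_HEADING = re.compile(r"h([1-6])")


def _split_tag(tag):
    """Split '{namespace}local' into ('{namespace}', 'local')."""
    if tag.startswith("{"):
        ns, local = tag[1:].split("}", 1)
        return "{" + ns + "}", local
    return "", tag


def shift_headings(xhtml, title_class="Title"):
    """Promote the first Title paragraph to <h1> and shift h1..h6 down one level.

    Hypothetical helper: levels are clamped at h6, so an existing h5 and h6
    both end up as h6. Input without a Title paragraph is returned unchanged.
    """
    root = ET.fromstring(xhtml)
    elements = list(root.iter())
    title = next(
        (el for el in elements
         if _split_tag(el.tag)[1] == "p" and el.get("class") == title_class),
        None,
    )
    if title is None:
        return xhtml
    # Each element is visited once, so no heading is shifted twice.
    for el in elements:
        ns, local = _split_tag(el.tag)
        match = _HEADING.fullmatch(local)
        if match:
            el.tag = f"{ns}h{min(int(match.group(1)) + 1, 6)}"
    ns, _ = _split_tag(title.tag)
    title.tag = ns + "h1"
    return ET.tostring(root, encoding="unicode")
```

Applied to the export shown above, the Title paragraph would become the single `<h1>` and "Titre 1", "Titre 2", "Titre 3" would move to `<h2>`, `<h3>`, `<h4>`. Class attributes are left untouched here, and namespaced input is serialized by ElementTree with a generated prefix unless a namespace is registered first.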

Comment 1 Laurent Godard 2016-09-30 09:29:16 UTC

Created attachment 127740
odt source file

Comment 2 Laurent Godard 2016-09-30 09:29:57 UTC

Created attachment 127741
(x)html output

Comment 3 Julien Nabet 2016-10-01 22:32:10 UTC

Code pointer:
filter/source/xslt/odf2xhtml/export/xhtml/body.xsl

Some testing shows that "Titre principal" goes through this part:
http://opengrok.libreoffice.org/xref/core/filter/source/xslt/odf2xhtml/export/xhtml/body.xsl#2803

The proper headings go through this one:
http://opengrok.libreoffice.org/xref/core/filter/source/xslt/odf2xhtml/export/xhtml/body.xsl#1204

Headings are created when matching text:h (line 1174 of body.xsl):
    <xsl:template match="text:h">

Unzipping the odt and reformatting its content, we get this:
      <text:p text:style-name="Title">Titre principal</text:p>
      <text:p text:style-name="P1" />
      <text:h text:style-name="Heading_20_1" text:outline-level="1">Titre 1</text:h>
      <text:p text:style-name="P1" />
      <text:p text:style-name="P1">para</text:p>
      <text:p text:style-name="P1" />
      <text:h text:style-name="Heading_20_2" text:outline-level="2">Titre 2</text:h>
      <text:p text:style-name="P1" />
      <text:p text:style-name="P1">para</text:p>
...
Notice "text:p" for "Titre principal"
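
To make the observation easy to reproduce, here is a small Python sketch (not LibreOffice code) that lists the elements relevant to the heading hierarchy in a content.xml string. The function name `heading_candidates` is hypothetical, and the sketch assumes the title is a `text:p` whose `text:style-name` is exactly "Title", as in the excerpt above.

```python
import xml.etree.ElementTree as ET

# ODF text namespace, as declared in content.xml.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def heading_candidates(content_xml):
    """Return (element, outline level, text) for Title paragraphs and headings.

    Hypothetical helper: Title paragraphs are reported with level 0 because
    they are plain text:p elements and carry no text:outline-level.
    """
    root = ET.fromstring(content_xml)
    found = []
    for el in root.iter():
        if el.tag == f"{{{TEXT_NS}}}h":
            level = int(el.get(f"{{{TEXT_NS}}}outline-level", "1"))
            found.append(("text:h", level, "".join(el.itertext())))
        elif (el.tag == f"{{{TEXT_NS}}}p"
              and el.get(f"{{{TEXT_NS}}}style-name") == "Title"):
            found.append(("text:p", 0, "".join(el.itertext())))
    return found
```

On the excerpt above, "Titre principal" would be reported as a `text:p`, which is why a template matching only `text:h` never sees it.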

Just my 2 cents, because I know too little about XSL.
Anyway, I confirm this on a Debian x86-64 PC with master sources updated today.

Comment 4 Marc-Aurèle DARCHE 2016-10-03 19:25:39 UTC

The desired behavior (improved semantics and accessibility) presented by Laurent Godard is backed by the following W3C document:
"Using h1-h6 to identify headings"
https://www.w3.org/TR/WCAG20-TECHS/H42.html

Comment 5 Thorsten Behrens (allotropia) 2017-08-08 16:38:40 UTC

Superseded by:

Bug 111492 and Bug 111493