Bug 102849 - Semantic XHTML conversion: heading root mismatch
Summary: Semantic XHTML conversion: heading root mismatch
Status: RESOLVED WONTFIX
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage
Version (earliest affected): 5.1.4.1 rc
Hardware: All All
Importance: medium normal
Assignee: Samuel Mehrbrodt (allotropia)
URL:
Whiteboard:
Keywords: accessibility
Depends on:
Blocks:
 
Reported: 2016-09-30 09:28 UTC by Laurent Godard
Modified: 2017-08-08 16:38 UTC

See Also:
Crash report or crash signature:


Attachments
odt source file (8.53 KB, application/vnd.oasis.opendocument.text)
2016-09-30 09:29 UTC, Laurent Godard
(x)html output (3.17 KB, text/html)
2016-09-30 09:29 UTC, Laurent Godard

Description Laurent Godard 2016-09-30 09:28:24 UTC
The current LibreOffice behavior produces XHTML exports with an HTML
heading hierarchy not strictly and semantically corresponding to the
heading hierarchy of the ODF document.

Concretely, LibreOffice produces XHTML exports with many <h1> headings.
For an (X)HTML document to be semantically correct, however, there must
be one and only one <h1> per document, and the rest of the heading
hierarchy should follow from it: <h2>, <h3>, etc., down to <h6> at most.

The current LibreOffice behavior produces the following XHTML export of
the given ODT file:

    <p class="Title">Titre principal</p>

    <h1 class="Heading_20_1"><a id="a__Titre_1"><span/></a>Titre 1</h1>
    <p class="P1">para</p>

    <h2 class="Heading_20_2"><a id="a__Titre_2"><span/></a>Titre 2</h2>
    <p class="P1">para</p>

    <h3 class="Heading_20_3"><a id="a__Titre_3"><span/></a>Titre 3</h3>
    <p class="P1">para</p>

while we want to have the following:

    <h1>Titre principal</h1>

    <h2 class="whatever"><a id="a__Titre_1"><span/></a>Titre 1</h2>
    <p class="P1">para</p>

    <h3 class="whatever_else"><a id="a__Titre_2"><span/></a>Titre 2</h3>
    <p class="P1">para</p>

    <h4 class="whatever_again"><a id="a__Titre_3"><span/></a>Titre 3</h4>
    <p class="P1">para</p>


Technical explanation of the current LibreOffice bug:

Title has child Heading1, Heading1 has child Heading2, Heading2 has
child Heading3, and so on, with Title as the root of the ODF heading
hierarchy. This gives:
Title→Heading1→Heading2→Heading3→etc.

As h1 is the root of the (X)HTML hierarchy, in XHTML the heading
hierarchy is:
h1→h2→h3→h4→etc.

The current LibreOffice behavior is the following:
Title→Heading1→Heading2→Heading3→etc.
=>
p→h1→h2→h3→etc.

while it should logically and semantically be:
h1→h2→h3→h4→etc.

This is a heading root mismatch between ODF and the (X)HTML export.

Changing this behaviour could be implemented as a filterOption that would postprocess the current output.
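
To illustrate the postprocessing idea, here is a minimal Python sketch that operates on the exported XHTML as a string. It is not part of LibreOffice: the function name shift_heading_root is hypothetical, and it assumes the title is exported exactly as <p class="Title">…</p> and that headings appear as plain <h1> to <h6> tags, as in the output shown above. Headings already at <h6> are left at <h6>, since (X)HTML has no deeper level.

```python
import re

# Assumption: headings are exported as <h1>..<h6>, and the document title
# as <p class="Title">, as in the XHTML output quoted in this report.
_HEADING_TAG = re.compile(r'<(/?)h([1-6])\b')
_TITLE_PARA = re.compile(r'<p class="Title">(.*?)</p>', re.DOTALL)


def shift_heading_root(xhtml: str) -> str:
    """Promote the Title paragraph to <h1> and demote each heading one level."""
    if not _TITLE_PARA.search(xhtml):
        # No Title paragraph: leave the hierarchy untouched.
        return xhtml

    def demote(match):
        slash, level = match.group(1), int(match.group(2))
        # Clamp at h6, the deepest heading level available.
        return f"<{slash}h{min(level + 1, 6)}"

    # Demote existing headings first so the new <h1> is not demoted as well.
    shifted = _HEADING_TAG.sub(demote, xhtml)
    return _TITLE_PARA.sub(r'<h1>\1</h1>', shifted, count=1)
```

A real filter option would more likely change the XSLT export itself; the string-based approach above only shows the intended mapping p.Title→h1, h1→h2, h2→h3, etc.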
Comment 1 Laurent Godard 2016-09-30 09:29:16 UTC
Created attachment 127740 [details]
odt source file
Comment 2 Laurent Godard 2016-09-30 09:29:57 UTC
Created attachment 127741 [details]
(x)html output
Comment 3 Julien Nabet 2016-10-01 22:32:10 UTC
Code pointer:
filter/source/xslt/odf2xhtml/export/xhtml/body.xsl

From some tests, "Titre principal" is handled by this part:
http://opengrok.libreoffice.org/xref/core/filter/source/xslt/odf2xhtml/export/xhtml/body.xsl#2803

The proper headings are handled by this one:
http://opengrok.libreoffice.org/xref/core/filter/source/xslt/odf2xhtml/export/xhtml/body.xsl#1204

Headings are created by the template matching text:h (line 1174 of body.xsl):
        <xsl:template match="text:h">

Unzipping the ODT and reformatting it gives this:
      <text:p text:style-name="Title">Titre principal</text:p>
      <text:p text:style-name="P1" />
      <text:h text:style-name="Heading_20_1" text:outline-level="1">Titre 1</text:h>
      <text:p text:style-name="P1" />
      <text:p text:style-name="P1">para</text:p>
      <text:p text:style-name="P1" />
      <text:h text:style-name="Heading_20_2" text:outline-level="2">Titre 2</text:h>
      <text:p text:style-name="P1" />
      <text:p text:style-name="P1">para</text:p>
...
Notice that "Titre principal" is a "text:p" element, not a "text:h" one.
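
As a rough illustration of the mapping requested in the description, the following Python sketch decides the target XHTML tag for each ODF element in a fragment like the one above. The helper name target_tag and the wrapping <root> element are hypothetical, and the actual export is done in XSLT (body.xsl), not Python; this only spells out the rule "Title paragraph→h1, text:h at outline level N→h(N+1)".

```python
import xml.etree.ElementTree as ET

# ODF text namespace, as declared in content.xml.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def target_tag(element):
    """Return the XHTML tag name an ODF text element would map to."""
    if element.tag == f"{{{TEXT_NS}}}h":
        level = int(element.get(f"{{{TEXT_NS}}}outline-level", "1"))
        # Shift by one because the Title takes <h1>; clamp at <h6>.
        return f"h{min(level + 1, 6)}"
    style = element.get(f"{{{TEXT_NS}}}style-name")
    if element.tag == f"{{{TEXT_NS}}}p" and style == "Title":
        return "h1"
    return "p"


# Hypothetical wrapper element so the fragment parses on its own.
sample = f"""<root xmlns:text="{TEXT_NS}">
  <text:p text:style-name="Title">Titre principal</text:p>
  <text:h text:style-name="Heading_20_1" text:outline-level="1">Titre 1</text:h>
  <text:p text:style-name="P1">para</text:p>
  <text:h text:style-name="Heading_20_2" text:outline-level="2">Titre 2</text:h>
</root>"""

tags = [target_tag(child) for child in ET.fromstring(sample)]
```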

Just my 2 cents, because I know too little about XSL.
Anyway, I confirm this on a PC running Debian x86-64 with master sources updated today.
Comment 4 Marc-Aurèle DARCHE 2016-10-03 19:25:39 UTC
The desired behavior (improved semantics and accessibility) presented by Laurent Godard is backed by the following W3C document:
"Using h1-h6 to identify headings"
https://www.w3.org/TR/WCAG20-TECHS/H42.html
Comment 5 Thorsten Behrens (allotropia) 2017-08-08 16:38:40 UTC
Superseded by:

Bug 111492 and Bug 111493