Bug 121449 - Headings not exported as <h1-10> elements in EPUB
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
(earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
Depends on:
Blocks: EPUB-Export
Reported: 2018-11-15 19:34 UTC by lbartolome
Modified: 2022-03-29 15:15 UTC

See Also:
Crash report or crash signature:
Regression By:

Printscreens from epub and doc (2.57 MB, application/x-zip-compressed)
2018-11-15 19:38 UTC, lbartolome
Example ODT with headings (8.99 KB, application/vnd.oasis.opendocument.text)
2020-05-15 17:36 UTC, Buovjaga

Description lbartolome 2018-11-15 19:34:19 UTC
I use Writer to create both PDFs for print and EPUBs. My current LO is 5.3.7 and I'm testing (x64).
Since there's no option to export to PDF with trim marks and color bars, I created a PDF with them in another program, from a page with a logo on it. I covered the logo with a white square before saving it and importing it as a bitmap in the page style.
LO 5 imported it correctly, without the logo showing, but LO 6.2 did not, and it also made the lines fuzzy.

Regarding the EPUB export, I am using the Writer2Latex 1.6 extension (writer2html) in LO 5 and the default EPUB export filter in LO 6.

Neither of them created a cover for the book even though I selected an image in the LO 6.2 filter menu. 

The LO 6.2 EPUB export filter only lets you split into a new file at either a heading or a page break. It should let you choose one or both, and at which heading level, even if it does not add the further options that W2L offers.

LO 6.2 epub filter didn't export chapter numbers, which W2L handled correctly.

The LO 6.2 EPUB filter added an F in front of each footnote number and a line between notes, which looks pretty weird in ADE.

LO 6.2 is also not exporting style formatting correctly: it creates a lot of garbage code in the EPUB instead of a clean export.
This is the output from the LO 6.2 EPUB export filter:
<p class="para0"><span class="span0">Rom</span><span class="span0">a</span><span class="span0">. </span><span class="span0">(41158 palabras)</span></p>
<p class="para1">&#160;</p>
<p class="para1"><span class="span1">Nuestros lectores </span><span class="span1">desearán</span><span class="span1"> —al menos esa es nuestra esperanza</span><span class="span1">—</span><span class="span1"> </span><span class="span1">posponer</span><span class="span1"> por unos instantes la explicación que va a tener lugar entre P</span><span class="span1">è</span><span class="span1">trus y Regin</span><span class="span1">e</span><span class="span1">, a fin de seguir </span><span class="span1">en su</span><span class="span1"> peregrina</span><span class="span1">ción a</span><span class="span1"> uno de los héroes de esta historia, héroe abandonado </span><span class="span1">durante mucho</span><span class="span1"> tiempo y </span><span class="span1">en el</span><span class="span1"> que nos parece que </span><span class="span1">querían</span><span class="span1"> inter</span><span class="span1">e</span><span class="span1">s</span><span class="span1">arse</span><span class="span1">. </span></p>

Compare the same lines from W2L export
    <h1 id="toc0"><span class="SectionNumber">CAPÍTULO I  </span>Roma. (41158 palabras)</h1>
    <p class="Standard" lang="fr-FR" xml:lang="fr-FR">&#160;</p>
    <p class="Standard">Nuestros lectores desearán —al menos esa es nuestra esperanza— posponer por unos instantes la explicación que va a tener lugar entre Pètrus y Regine, a fin de seguir en su peregrinación a uno de los héroes de esta historia, héroe abandonado durante mucho tiempo y en el que nos parece que querían interesarse. </p>

You can see that the LO EPUB export filter does not export language information, which EPUB readers use to pick the right dictionary for the words.
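
To make the missing language information easy to verify, here is a minimal Python sketch that collects the `lang` and `xml:lang` values found in an XHTML fragment. It is not part of LibreOffice or Writer2Latex; the function name `languages_declared` is made up for illustration, and the fragment is assumed to be well-formed XML.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def languages_declared(xhtml_fragment: str) -> set[str]:
    """Return every lang / xml:lang value found in a well-formed fragment."""
    # Wrap the fragment so that several sibling elements parse as one tree.
    root = ET.fromstring(f"<root>{xhtml_fragment}</root>")
    found = set()
    for element in root.iter():
        for key in ("lang", XML_LANG):
            value = element.get(key)
            if value:
                found.add(value)
    return found
```

Run against the two outputs above, the LO 6.2 lines should yield an empty set, while the W2L lines report `fr-FR` for the paragraph that carries it.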

Actual Results:
Messy epub
Fuzzy lines in bitmap

Expected Results:
Output should be at least as good as in the previous LO version:
1. PDFs imported as bitmaps should keep their quality.
2. The EPUB filter should have been built on the best existing filter around (Writer2Latex's writer2html) and then extended.
3. If LO could produce trim/crop marks and color bars, I would not have needed another program to create the bitmap.

Reproducible: Always

User Profile Reset: No

Additional Info:
Comment 1 lbartolome 2018-11-15 19:38:02 UTC
Created attachment 146672
Printscreens from epub and doc
Comment 2 Julien Nabet 2018-11-16 08:13:48 UTC
Mark/Miklos: thought you might be interested in this one since it concerns the EPUB export.
Comment 3 lbartolome 2018-11-16 18:49:31 UTC
FWIW, the output from a document whose text was pasted as plain text and then formatted is cleaner than the output from a working text with deletions, additions and so on.

See example below:
Exported with LO 6.2 export as epub.
<p class="para8"><span class="span8">Alas de sangre</span></p>
<p class="para1"><span class="span7">&#160;</span><span class="span7">Elías Saavedra</span></p>
<p class="para9"><span class="span4">Una ira interior luchaba por salir sin que pudiera controlarla y él estaba dispuesto a hacer cualquier cosa con tal de hallar qué era lo que la originaba. Una sensación de rabia que no alcanzaba a comprender y que le reconcomía por dentro.</span></p>
<p class="para0"><span class="span4">Branadel había llegado hasta la ciudad

Exported with Writer2html (Writer2Latex)
    <h1 dir="ltr" id="toc1"><a id="RefHeadingToc3066392408007"></a>Alas de sangre</h1>
    <p class="Autor" dir="ltr">&#160;Elías Saavedra</p>
    <p class="Primerparrafo" dir="ltr">Una ira interior luchaba por salir sin que pudiera controlarla y él estaba dispuesto a hacer cualquier cosa con tal de hallar qué era lo que la originaba. Una sensación de rabia que no alcanzaba a comprender y que le reconcomía por dentro.</p>
    <p class="Standard" dir="ltr">Branadel había llegado hasta la ciudad

Please note that the class names in the second case are inherited from the source document's styles, while in the first case they are assigned by the export filter, even for headings.

Version: (x64)
Build ID: ff46ad24d1d3cbcea45895520483ed1fd4ff488b
CPU threads: 2; OS: Windows 10.0; UI render: default; VCL: win; 
Locale: es-ES (es_ES); Calc: threaded
Comment 4 Xisco Faulí 2019-11-08 12:49:05 UTC
Hello Lbartolome.
A new major release of LibreOffice is available since this bug was reported.
Could you please try to reproduce it with the latest version of LibreOffice
from https://www.libreoffice.org/download/libreoffice-fresh/ ?
I have set the bug's status to 'NEEDINFO'. Please change it back to
'UNCONFIRMED' if the bug is still present in the latest version.
Comment 5 lbartolome 2019-11-12 18:45:59 UTC
Import of a PDF as a bitmap in the page style is now correct in version 6.3, and transparencies are not showing.

EPUB output is as messy as before and does not keep styles. Headings are exported as <p> plus a span class instead of <h>, and spans are not being closed when changing class or closing tag (h, p...), increasing file size by up to 10%. Copying and pasting as plain text and restyling before exporting is a workaround to get rid of the extra spans, but it is not viable for texts with hundreds of notes in addition to the regular italics and headings.

<p class="para3"><span class="span2">En el cual el autor descorre el telón del teatro en que va a representarse su drama.</span></p>
<p class="para2">&#160;</p>
<p class="para2"><span class="span1">Si el lector quiere emprender conmigo una peregrinación hacia los días de mi juventud y retroceder a la mitad del curso de mi </span><span class="span1">vida</span><span class="span1">, haremos alto al principio del año de gracia de 1827 y diremos a las generaciones que datan de esta época lo que era París, física y moralmente considerado, en los </span><span class="span1">últimos</span><span class="span1"> años de la Restauración.</span></p>
<p class="para2"><span class="span1">Empezaremos por el aspecto físico de la moderna Babilonia. De Este al Oeste, pasando por el Sur, París en 1827 era poco </span><span class="span1">mas</span><span class="span1"> ó menos lo que es en 1854. El París de la ribera izquierda es naturalmente estacionario y tiende mas bien a despoblarse que a poblarse; al contrario de la </span><span class="span1">civilización</span><span class="span1"> </span><span class="span1">que</span><span class="span1"> camina de Oriente </span><span class="span1">a</span><span class="span1"> Occidente, París, esta </span><span class="span1">capital</span><span class="span1"> del mundo civilizado, marcha del Sur al Norte: Montrouge invade </span><span class="span1">a </span><span class="span1">Mont</span><span class="span1">m</span><span class="span1">artre.</span></p>
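
As a stopgap for the redundant spans, the exported XHTML could be post-processed after export. The following is only a rough Python sketch, not how the export filter works or should work: `merge_adjacent_spans` is a made-up name, and the regular expression assumes flat, non-nested spans whose only attribute is `class`, as in the output above. A real fix belongs in the filter itself.

```python
import re

# A span that is immediately followed by another span with the same class.
_ADJACENT = re.compile(r'<span class="([^"]+)">([^<]*)</span><span class="\1">')


def merge_adjacent_spans(xhtml: str) -> str:
    """Collapse runs of sibling <span> elements that share one class."""
    previous = None
    # One pass merges non-overlapping pairs only, so repeat until stable.
    while previous != xhtml:
        previous = xhtml
        xhtml = _ADJACENT.sub(r'<span class="\1">\2', xhtml)
    return xhtml
```

Spans with different classes are left alone, so italics and other character formatting would survive; only the splits between identical neighbours are removed.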
Comment 6 Buovjaga 2020-05-15 17:36:01 UTC
Created attachment 160875
Example ODT with headings

Export to EPUB and unzip.

section0001.xhtml has
<p class="para0"><span class="span0">Heading 1</span></p><p class="para1"><span class="span1">Heading 2</span></p><p class="para2"><span class="span2">Heading 3</span></p>
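
The same check can be scripted instead of unzipping by hand. This is a small Python sketch using only the standard library; the function name `count_heading_tags` and the file name in the usage line are assumptions for illustration. It counts `<h1>` to `<h6>` start tags, the heading elements XHTML defines, across the content documents in the package.

```python
import re
import zipfile
from collections import Counter

# Start tags h1..h6; the trailing class keeps <head>, <hr> and <html> out.
HEADING_TAG = re.compile(r"<(h[1-6])[\s>]")


def count_heading_tags(epub) -> Counter:
    """Count heading start tags in every (X)HTML file of an EPUB.

    `epub` may be a path or a binary file-like object.
    """
    counts = Counter()
    with zipfile.ZipFile(epub) as archive:
        for name in archive.namelist():
            if name.endswith((".xhtml", ".html")):
                text = archive.read(name).decode("utf-8", errors="replace")
                counts.update(HEADING_TAG.findall(text))
    return counts
```

For example, `print(count_heading_tags("example.epub"))` on an export of the attached ODT would be expected to print an empty counter while this bug is present.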

lbartolome: please do not mix several issues into a single report. This will now deal with the headings. If you are still worried about other issues, please create new reports for them.

Arch Linux 64-bit
Build ID: bdc8cd060dca8a97ef7970d1c0ab30694930beea
CPU threads: 8; OS: Linux 5.6; UI render: default; VCL: kf5; 
Locale: en-US (fi_FI.UTF-8); UI: en-US
Calc: threaded
Built on 14 May 2020
Comment 7 stragu 2021-05-19 11:38:43 UTC
Reproducible in LO 7.2 alpha1 as described in Comment 6

Version: / LibreOffice Community
Build ID: b1c0734ffe0f395757b6e0cea7830d820231afeb
CPU threads: 8; OS: Linux 4.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
TinderBox: Linux-rpm_deb-x86_64@86-TDF, Branch:master, Time: 2021-05-18_03:16:20
Calc: threaded