Bug 163857 - Export to EPUB results in truncated headings in EPUB's outline
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage
Version (earliest affected): 24.8.1.2 release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: EPUB-Export
Reported: 2024-11-12 09:50 UTC by mharel50
Modified: 2024-11-12 18:58 UTC

See Also:
Crash report or crash signature:


Attachments
Erroneous EPUB (327.39 KB, application/epub)
2024-11-12 09:56 UTC, mharel50

Description mharel50 2024-11-12 09:50:01 UTC
Description:
I noticed that LibreOffice 24.8 and earlier versions don't export well to EPUB when a document has more than one level of headings. I wrote a document with two levels and had to divide it into chapters by page breaks instead of headings because of this. The division itself worked well, but the generated table of contents for the EPUB was mangled, with headings cut off in the middle.
Checking further, I found that the text is saved and exported with too many HTML tags, most of them fully redundant. It looks like whenever I correct a typo or change even a single character, the editor adds </span><span> around the change, although the style and everything else stay the same. This inflates the file without adding any useful data and also gets in the way of the EPUB formatting.
I suspect that removing the redundant tags could improve the program's responsiveness and also help create smaller, more efficient EPUB files.
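
To make the redundancy concrete, here is a minimal Python sketch of collapsing back-to-back spans that carry identical attributes. It is an illustration only, not LibreOffice code: the function name merge_adjacent_spans and the class name T1 are made up for the example, and the regex only handles spans that contain plain text, not nested markup.

```python
import re

# A span containing only text, immediately followed by a span that opens
# with exactly the same attribute string (captured as group 1).
_ADJACENT = re.compile(r'<span([^>]*)>([^<]*)</span><span\1>')


def merge_adjacent_spans(xhtml: str) -> str:
    """Collapse runs of adjacent spans with identical attributes (hypothetical helper)."""
    previous = None
    # Each pass merges non-overlapping pairs, so repeat until nothing changes.
    while previous != xhtml:
        previous = xhtml
        xhtml = _ADJACENT.sub(r'<span\1>\2', xhtml)
    return xhtml


fragmented = '<h1><span class="T1">Chap</span><span class="T1">ter One</span></h1>'
print(merge_adjacent_spans(fragmented))
```

Spans whose attributes differ are left untouched, so real formatting changes would survive such a cleanup.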

Steps to Reproduce:
1. Create a document with two or more levels of headers, each header preceded by a page break
2. Export to EPUB using 'divide by page breaks'
3. Open in an EPUB viewer or editor and check the generated table of contents
4. Use an EPUB editor to see the HTML in the text files

Actual Results:
Some headings (not all) are truncated in a large document (more than 100 pages)

Expected Results:
I expected all headers to show fully, even if flattened.


Reproducible: Always


User Profile Reset: No

Additional Info:
Can send actual files if needed
Noticed it on earlier versions but have no data.

Version: 24.8.2.1 (X86_64) / LibreOffice Community
Build ID: 0f794b6e29741098670a3b95d60478a65d05ef13
CPU threads: 8; OS: Windows 11 X86_64 (10.0 build 22631); UI render: Skia/Raster; VCL: win
Locale: en-GB (he_IL); UI: en-US
Calc: CL threaded
Comment 1 mharel50 2024-11-12 09:56:48 UTC
Created attachment 197561
Erroneous EPUB
Comment 2 Stéphane Guillou (stragu) 2024-11-12 13:14:15 UTC
Thank you for the report.

- The issue with headings other than h1 is tracked in bug 114164.
- The issue with truncated headings is mentioned in bug 121146 comment 7, but that bug is focused on export of a Table of Contents.
- The issue with messy HTML is tracked in bug 141187.

I suggest focusing this report on the truncated outline headings, confirmed by bug 121146 comment 7.
Comment 3 Phil Stracchino 2024-11-12 18:58:06 UTC
So if I am understanding correctly what you're saying here, fundamentally LibreOffice generates fragmented/truncated entries for EPUB embedded tables of contents because it cannot correctly read its own mangled XHTML.  TOC generation is broken by the very same redundant SPAN tags that it fills the generated XHTML text with.

My prediction based on this assumption would be that it works for tables of contents built from headings that you have never edited, but the instant you edit a heading, boom, that TOC entry will now be broken.
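
The failure mode predicted here can be illustrated with a small Python sketch. This is purely hypothetical and is not taken from the exporter's source: it assumes a title extractor that reads only the first text run of a heading, and contrasts it with one that joins every run. The function names first_run_title and full_title and the class name T1 are invented for the example.

```python
import xml.etree.ElementTree as ET


def first_run_title(heading_xhtml: str) -> str:
    """Hypothetical buggy extractor: keeps only the first span's text."""
    heading = ET.fromstring(heading_xhtml)
    first = next(heading.iter("span"), None)
    if first is None:
        return heading.text or ""
    return first.text or ""


def full_title(heading_xhtml: str) -> str:
    """Extractor that joins all text runs inside the heading."""
    return "".join(ET.fromstring(heading_xhtml).itertext())


untouched = '<h1><span class="T1">Chapter One</span></h1>'
edited = '<h1><span class="T1">Chap</span><span class="T1">ter One</span></h1>'

for sample in (untouched, edited):
    print(repr(first_run_title(sample)), repr(full_title(sample)))
```

Under that assumption, the untouched heading yields the same title either way, while the edited heading is cut off at the first span boundary, which would match the "works until you edit it" prediction.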