Bug 161367 - Excessive generation of <SPAN> tags in EPUB export
Status: RESOLVED DUPLICATE of bug 141187
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): unspecified
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
Reported: 2024-05-31 19:32 UTC by Phil Stracchino
Modified: 2024-06-01 07:42 UTC

Attachments
Sample paragraph, ODT version (54.66 KB, application/vnd.oasis.opendocument.text)
2024-06-01 01:07 UTC, Phil Stracchino
EPUB export of same sample (2.60 KB, application/epub+zip)
2024-06-01 01:08 UTC, Phil Stracchino

Description Phil Stracchino 2024-05-31 19:32:05 UTC
Exporting from .ODT to .EPUB format generates unnecessarily voluminous XHTML that is difficult to read and edit, because it contains vast numbers of redundant <span>s with identical properties.

For example, the following paragraph, which contains no formatting except for the rendering of 82nd:

“One in particular, a Ranger, uh, force known as the 82nd Airborne, had a particular nickname, and a specific song that they took as their own, that invoked that nickname.  The rhyme of the song does not work in Saamen, but the part I can remember of the song goes like this:

Results in the following XHTML:

  <p class="para12"><span class="span15">“One in particular, </span><span class="span15">a Ranger, uh, force </span><span class="span15">known as </span><span class="span15">the 82</span><span class="span41">nd</span><span class="span15"> Airborne, had a particular nickname, and a specific song that they took as their own, that invoked that nickname. </span><span class="span15"> </span><span class="span15">The rhyme of the song does not work in Saamen, but the </span><span class="span15">part I can remember of the </span><span class="span15">song goes like this:</span></p>

When what it SHOULD produce is this:

  <p class="para12"><span class="span15">“One in particular, a Ranger, uh, force known as the 82</span><span class="span41">nd</span><span class="span15"> Airborne, had a particular nickname, and a specific song that they took as their own, that invoked that nickname.  The rhyme of the song does not work in Saamen, but the part I can remember of the song goes like this:</span></p>

No less than SEVEN TIMES in that one paragraph, LibreOffice *closes* a span of class span15 only to immediately begin a new span *also* of class span15.  I can find no clear reason why it is generating so many redundant spans.  My hypothesis would be that it is because the source .ODT document ITSELF contains many such redundant and unnecessary duplicated formatting codes.

This is wasteful and unnecessary, and results in XHTML documents much larger than they need to be, that probably also take much longer to *render* than they need to.  It should probably be considered malformed.

LibreOffice should automatically collapse adjacent spans (and its own formatting regions) of the same type.  Currently I have to have a custom Perl script to perform this cleanup.  The resulting reduction in the uncompressed size of the XHTML files within the epub is as much as 30%.
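
For illustration only, the kind of post-export cleanup described above can be sketched in Python. This is not the Perl script mentioned in this report and not LibreOffice code; the function name collapse_adjacent_spans is made up for the example. It assumes flat, non-nested spans that carry only a class attribute, as in the sample paragraph, and would need a real XML parser for anything more complex.

```python
import re

# One flat span (no nested tags) followed immediately by the opening tag of
# another span with the same class value; \1 is a backreference to that class.
# Assumption: spans have exactly the form <span class="...">text</span>.
_ADJACENT = re.compile(r'<span class="([^"]+)">([^<]*)</span><span class="\1">')


def collapse_adjacent_spans(xhtml: str) -> str:
    """Merge runs of adjacent spans that share the same class (hypothetical helper)."""
    while True:
        # Each pass drops a redundant '</span><span class="X">' boundary;
        # repeat until a pass finds nothing left to merge.
        xhtml, merged = _ADJACENT.subn(r'<span class="\1">\2', xhtml)
        if merged == 0:
            return xhtml


if __name__ == "__main__":
    sample = (
        '<p class="para12"><span class="span15">known as </span>'
        '<span class="span15">the 82</span><span class="span41">nd</span>'
        '<span class="span15"> Airborne</span></p>'
    )
    print(collapse_adjacent_spans(sample))
```

The loop is needed because a single substitution pass only merges non-overlapping pairs; a run of three or more identical spans takes more than one pass to collapse fully.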
Comment 1 m_a_riosv 2024-06-01 00:43:41 UTC
Please attach a sample file, reduced in size as much as possible and free of private information, and paste the information from Menu/Help/About LibreOffice (there is a copy icon).
Comment 2 Phil Stracchino 2024-06-01 01:07:39 UTC
Created attachment 194494
Sample paragraph, ODT version

Version: 7.6.4.1 (X86_64) / LibreOffice Community
Build ID: 60(Build:1)
CPU threads: 12; OS: Linux 6.8; UI render: default; VCL: gtk3
Locale: en-US (en_US.UTF-8); UI: en-US
Gentoo official package
Calc: threaded
Comment 3 Phil Stracchino 2024-06-01 01:08:20 UTC
Created attachment 194495
EPUB export of same sample
Comment 4 Buovjaga 2024-06-01 07:42:57 UTC

*** This bug has been marked as a duplicate of bug 141187 ***