Bug 147392 - eBooks: improve html output (e.g. minimize number of tags and style declarations)
Status: RESOLVED DUPLICATE of bug 141187
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): unspecified
Hardware: All All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: EPUB-Export
Reported: 2022-02-12 17:23 UTC by R. Green
Modified: 2022-03-13 15:45 UTC (History)

See Also:
Crash report or crash signature:


Attachments
Writer file used to generate an epub doc. (24.49 KB, application/vnd.oasis.opendocument.text)
2022-02-12 17:23 UTC, R. Green

Description R. Green 2022-02-12 17:23:08 UTC
Created attachment 178240
Writer file used to generate an epub doc.

Version: 7.1.5.2 / LibreOffice Community
Build ID: 85f04e9f809797b8199d13c421bd8a2b025d52b5
CPU threads: 2; OS: Linux 5.4; UI render: default; VCL: gtk3
Locale: en-GB (en_GB.UTF-8); UI: en-GB
Calc: threaded

ISSUE

Although LO Writer is capable of producing epub books, the html code seems to be inefficient: multiple classes are used for the same type of paragraph, headings are changed to paragraphs, and span tags are used to indicate italics or emphasis in text (rather than the appropriate <i></i> or <em></em> tags).

TO DEMONSTRATE

1. Open the attached Writer file. Create an epub.
2. Open the epub with the CALIBRE Editor and inspect the html / css.
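
For anyone who prefers a scripted check to a manual inspection, the steps above can be approximated with a short Python sketch. It relies only on the fact that an EPUB is a ZIP archive containing (X)HTML files; the function name `count_tags_in_epub` and the file path in the usage note are hypothetical, and the script only counts tags, it does not judge whether they are appropriate.

```python
import zipfile
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts every start tag seen in one document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def count_tags_in_epub(epub):
    """Return a Counter of start tags across all (X)HTML files in an EPUB.

    `epub` may be a path or a binary file-like object.
    """
    totals = Counter()
    with zipfile.ZipFile(epub) as archive:
        for name in archive.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                text = archive.read(name).decode("utf-8", errors="replace")
                parser = TagCounter()  # fresh parser per file
                parser.feed(text)
                parser.close()
                totals.update(parser.counts)
    return totals
```

Calling `count_tags_in_epub("book.epub")` (path assumed) and comparing the totals for `span`, `p`, `i`, `em` and `h1` to `h6` gives a rough picture of how headings and emphasis were exported.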

Points to note:

1. Headings are translated to paragraph styles. Why not simply translate a heading into the html equivalent: h1, h2, h3 etc.?
2. The text is littered with unnecessary span tags. Why not simply use <i></i> and <em></em> tags?
3. There is an empty span style in the style sheet!
4. There are two paragraph styles in the css where only one exists in the Writer file.
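
Points 3 and 4 can also be checked mechanically. The following is a minimal sketch, assuming a flat stylesheet with no nested rules or at-rules: it uses a simple regular expression rather than a real CSS parser, and the function name `audit_css` and the class names in the usage note are hypothetical, not taken from the actual export.

```python
import re
from collections import defaultdict

# Matches "selector { declarations }" in a flat stylesheet (no nesting).
RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")


def normalise(body):
    """Reduce a declaration block to a sorted tuple of 'property:value' strings."""
    decls = []
    for part in body.split(";"):
        if ":" in part:
            prop, value = part.split(":", 1)
            decls.append(f"{prop.strip().lower()}:{value.strip()}")
    return tuple(sorted(decls))


def audit_css(css):
    """Return (empty_selectors, duplicate_groups) for a flat stylesheet."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)  # drop comments
    by_body = defaultdict(list)
    empty = []
    for selector, body in RULE.findall(css):
        selector = selector.strip()
        decls = normalise(body)
        if decls:
            by_body[decls].append(selector)
        else:
            empty.append(selector)  # rule with no declarations (point 3)
    # Selectors sharing identical declarations (point 4)
    duplicates = [sels for sels in by_body.values() if len(sels) > 1]
    return empty, duplicates
```

Given a stylesheet such as `.para0 { margin: 0; } .para1 { margin: 0; } .span0 { }`, the function reports `.span0` as empty and groups `.para0` with `.para1` as duplicates.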

Could the html be simplified?
Comment 1 Dieter 2022-03-11 08:44:09 UTC
Thank you for reporting the bug. It seems you're using an old version of LibreOffice. Could you please try to reproduce it with the latest version of LibreOffice from https://www.libreoffice.org/download/libreoffice-fresh/ ? I have set the bug's status to 'NEEDINFO'. Please change it back to 'UNCONFIRMED' if the bug is still present in the latest version. Change to RESOLVED WORKSFORME, if the problem went away.
Comment 2 R. Green 2022-03-11 11:02:14 UTC
Version: 7.3.1.3 / LibreOffice Community
Build ID: a69ca51ded25f3eefd52d7bf9a5fad8c90b87951
CPU threads: 2; OS: Linux 5.4; UI render: default; VCL: gtk3
Locale: en-GB (en_GB.UTF-8); UI: en-GB
Calc: threaded

No. Still the same with the latest fresh version.
Comment 3 Dieter 2022-03-13 10:27:34 UTC
I'm not an expert in html, but please check whether we can treat your report as a duplicate of bug 141187 or bug 115377.
Comment 4 R. Green 2022-03-13 14:01:27 UTC
Yes, I'd agree; Bug 141187 (LO produces messy HTML in EPUB export) is basically covering the same ground: prevent unnecessary duplication of html tags and CSS classes.

IMV, Bug 115377 could be merged into Bug 141187 as it's in the same general area.
Comment 5 Dieter 2022-03-13 15:45:00 UTC

*** This bug has been marked as a duplicate of bug 141187 ***