Bug 163196 - EPUB export mis-handles double spaces
Summary: EPUB export mis-handles double spaces
Status: UNCONFIRMED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): unspecified
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: EPUB-Export
Reported: 2024-09-28 18:26 UTC by Phil Stracchino
Modified: 2025-02-15 02:43 UTC (History)
CC List: 3 users

See Also:
Crash report or crash signature:

Description Phil Stracchino 2024-09-28 18:26:56 UTC
Let's elide for now the entire argument over whether or not it is correct to put two spaces after a period, and just accept that a large number of people (myself included), as well as many academic studies, find that it improves readability.

Now, that said, when you use two spaces after a period and then export to EPUB, which is fundamentally containerized XHTML, those two spaces would be collapsed to one if left unmodified.  LibreOffice's EPUB export does ALMOST the right thing here, replacing <space><space> with <space><non-breaking-space>.

The key word here is ALMOST.  When a double space occurs at the end of a line, this replacement results in an erroneous single-character indent at the beginning of the next line.

The correct replacement would be <non-breaking-space><space>.  With that order, the WORST case is that the extra space appears at the end of a line, where it is much less visible than at the beginning (for example, in Sigil's XHTML preview mode).  In the BEST case (for example, the Amazon Kindle ebook reader), the ebook reader collapses it entirely and flows the text as intended.
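
To illustrate the ordering argument above, here is a minimal sketch in Python.  It is not taken from LibreOffice or libepubgen; the function names `preserve_double_spaces` and `break_at_spaces` are hypothetical, and the line-wrapping model is a deliberate simplification (every regular space is treated as a break point that is dropped when the break is taken, and a non-breaking space never breaks).

```python
import re

NBSP = "\u00a0"  # non-breaking space


def preserve_double_spaces(text):
    """Proposed order: for a run of n >= 2 spaces, emit n-1 NBSPs, then one space."""
    return re.sub(r" {2,}", lambda m: NBSP * (len(m.group()) - 1) + " ", text)


def break_at_spaces(text):
    """Crude wrapping model (an assumption, not a real renderer): break at
    every regular space and drop it; NBSP is never a break point."""
    return text.split(" ")


# Order described in this report as the current output: space, then NBSP.
current = "One." + " " + NBSP + "Two."
# Proposed order: NBSP, then space.
proposed = preserve_double_spaces("One.  Two.")

# repr() makes the NBSP visible as '\xa0' at the start or end of each fragment.
print([repr(part) for part in break_at_spaces(current)])
print([repr(part) for part in break_at_spaces(proposed)])
```

In this model, the space-then-NBSP order leaves the non-breaking space at the start of the second fragment (the visible indent), while the NBSP-then-space order leaves it at the end of the first fragment.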

This surely cannot be a difficult change.  (In fact, I'm going to look at the code myself, and see if I can FIND the relevant code and offer a patch.)
Comment 1 BogdanB 2024-12-28 13:42:41 UTC
I tested this just now. You can press Ctrl+Shift+Space twice to insert a double space, and when the document is exported as EPUB it remains a double space.

Please confirm it worked.
Comment 2 Phil Stracchino 2024-12-28 18:26:23 UTC
What actual character does Ctrl+Shift+Space place?  Is that a forced non-breaking space?  I'm uncertain whether that's a good solution, or even a good workaround.  It puts the onus upon the user to work around something that the application should be doing correctly itself in the first place.  Making the user manually insert a double non-breaking space is the wrong way to solve the problem.  LibreOffice's current action of using a space plus a non-breaking space is the correct thing to do; it's just putting them in the wrong order.

If someone can tell me what source file I should be looking at, I'd be glad to look at it myself and offer a patch.
Comment 3 BogdanB 2024-12-28 20:07:15 UTC
Ok, let's wait for developers to evaluate this.
Comment 4 raal 2025-01-04 06:56:48 UTC
(In reply to Phil Stracchino from comment #2)

> If someone can tell me what source file I should be looking at, I'd be glad
> to look at it myself and offer a patch.

Hello, you can ask on the developers' mailing list: https://wiki.documentfoundation.org/Development/Mailing_List

CC to Ilmari.
Comment 5 Buovjaga 2025-01-04 08:44:52 UTC
Looks like the source is in writerperfect/source/writer, but it also uses a library found in external/libepubgen. Library site: https://sourceforge.net/projects/libepubgen/
Comment 6 Phil Stracchino 2025-02-15 02:43:57 UTC
OK, first up:
Mishandling of double spaces is actually a VERY SMALL PART of everything that LibreOffice's EPUB export (or rather, I should say, libepubgen) fouls up.  I operate a small publishing imprint, and it is actually LESS WORK to blow away ALL of the formatting in a LibreOffice-exported EPUB and reapply it all by hand than it is to attempt to clean up the exported EPUB's formatting.
(WHY does the exported EPUB need to be cleaned up post-export?  Because so much of it is non-compliant and wrong, and because it is a hideous un-editable mess.)


ELEPHANT IN THE MIDDLE OF THE ROOM:
libepubgen is unfinished v0.1.1 alpha-quality abandonware that hasn't seen a code commit in nearly three years now.


SUGGESTED SOLUTION:
Pandoc is pretty solid, although currently its ODT reader is buggy and loses formatting.  I suggest that LibreOffice abandon libepubgen and instead switch to using pandoc to generate EPUB exports.  The bugs in pandoc's ODT reader can be worked around, until they are fixed, by exporting a copy as DOCX and then converting that DOCX to EPUB.
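
For anyone who wants to try the DOCX detour today, here is a rough sketch of it in Python.  It assumes `soffice` and `pandoc` are both installed and on the PATH; the helper names `docx_workaround_commands` and `convert` are hypothetical, and this is only one way to chain the two tools, not an official workflow.

```python
import subprocess
from pathlib import Path


def docx_workaround_commands(odt_path):
    """Build the two commands: ODT -> DOCX via LibreOffice, then DOCX -> EPUB via pandoc."""
    src = Path(odt_path)
    docx = src.with_suffix(".docx")
    epub = src.with_suffix(".epub")
    return [
        # LibreOffice headless conversion; writes <name>.docx into --outdir.
        ["soffice", "--headless", "--convert-to", "docx",
         "--outdir", str(src.parent), str(src)],
        # pandoc infers the input and output formats from the file extensions.
        ["pandoc", str(docx), "-o", str(epub)],
    ]


def convert(odt_path):
    """Run both steps in order, stopping at the first failure."""
    for cmd in docx_workaround_commands(odt_path):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands rather than running them, so this is safe to try.
    for cmd in docx_workaround_commands("novel.odt"):
        print(" ".join(cmd))
```

Calling `convert("novel.odt")` would run both steps and leave `novel.docx` and `novel.epub` next to the source file.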