Bug 130357 - Justification doesn't work with certain whitespace characters
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: medium minor
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Font-Rendering
Reported: 2020-02-02 00:17 UTC by João Paulo
Modified: 2023-12-26 15:34 UTC

See Also:
Crash report or crash signature:


Attachments
Sample document showing the error (16.57 KB, application/vnd.oasis.opendocument.text)
2020-02-02 00:18 UTC, João Paulo
Sample PDF document showing the error (11.96 KB, application/pdf)
2020-02-02 00:20 UTC, João Paulo
Rendering of the sample ODT document on PDF X (124.98 KB, image/png)
2020-02-04 01:33 UTC, João Paulo
Rendering of the sample ODT document on ONLYOFFICE Desktop Editors (105.42 KB, image/png)
2020-02-04 01:50 UTC, João Paulo

Description João Paulo 2020-02-02 00:17:14 UTC
Description:
With the normal SPACE character (U+0020), justification works correctly. For example, when a line that is not the last line of a justified paragraph ends with a normal SPACE, the width of that trailing SPACE is discarded, the gaps made by the other SPACE characters are widened, and the line breaks.

However, other whitespace characters, such as EN SPACE (U+2002) or EM SPACE (U+2003), behave differently: if such a character happens to be the last character on a justified line, its width is not discarded and there is a “hole” at the end of the line.

It happens like this:

This is a sentence  ending with a period  followed by a EM SPACE (U+2003).  This is a sentence
ending with a period followed by a EM SPACE (U+2003).  This is a sentence ending with a period
followed  by a EM SPACE (U+2003).  This is a sentence  ending  with a period  followed by a EM
SPACE (U+2003).  This is a sentence ending with a period followed by a EM SPACE (U+2003).     
This is a sentence  ending with a period  followed by a EM SPACE (U+2003).  This is a sentence
ending with a period followed by a EM SPACE (U+2003).  This is a sentence ending with a period
followed  by a EM SPACE (U+2003).  This is a sentence  ending  with a period  followed by a EM
SPACE (U+2003).  

When it should happen like this:

This is a sentence  ending with a period  followed by a EM SPACE (U+2003).  This is a sentence
ending with a period followed by a EM SPACE (U+2003).  This is a sentence ending with a period
followed  by a EM SPACE (U+2003).  This is a sentence  ending  with a period  followed by a EM
SPACE (U+2003).  This is a sentence ending with a period followed by a EM SPACE (U+2003). This
is a sentence ending with a period followed by a EM SPACE (U+2003).  This is a sentence ending
with a period  followed  by a EM  SPACE (U+2003).   This is a  sentence  ending  with a period
followed  by a EM SPACE (U+2003).  This is a sentence  ending  with a period  followed by a EM
SPACE (U+2003).  
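
The expected behavior can be sketched in a few lines of Python. This is only an illustration on a monospace grid (one cell per character), not LibreOffice code: the names `BREAKABLE_SPACES`, `GAP` and `justify` are hypothetical, and the choice of which characters count as discardable is an assumption. The point it shows is that the trailing whitespace is dropped regardless of which kind of space it is, and only then is the remaining slack spread over the gaps.

```python
import re

# Assumption: U+0020 plus the breakable spaces of the General Punctuation block.
# FIGURE SPACE (U+2007) is left out because Unicode treats it as non-breaking.
BREAKABLE_SPACES = (
    "\u0020\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2008\u2009\u200A"
)
GAP = re.compile("([" + re.escape(BREAKABLE_SPACES) + "]+)")


def justify(line: str, width: int) -> str:
    """Pad `line` to `width` cells, one cell per character (monospace grid)."""
    # Discard trailing whitespace of any breakable kind, not only U+0020.
    body = line.rstrip(BREAKABLE_SPACES)
    parts = GAP.split(body)  # words at even indices, gaps at odd indices
    gap_count = len(parts) // 2
    slack = width - len(body)
    if gap_count == 0 or slack <= 0:
        return body
    share, remainder = divmod(slack, gap_count)
    out = []
    for index, part in enumerate(parts):
        out.append(part)
        if index % 2 == 1:  # this part is a gap: widen it with plain padding
            gap_number = index // 2
            out.append(" " * (share + (1 if gap_number < remainder else 0)))
    return "".join(out)
```

With this sketch, `justify("one.\u2003two.\u2003", 12)` drops the trailing EM SPACE and pads the single remaining gap, so the line ends on the final period instead of on a “hole”.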

There is a list of different whitespace characters at https://en.wikipedia.org/wiki/Template:Whitespace_(Unicode).

I think those different whitespace characters deserve a special case of justification:

-- All spaces that aren't wider than the normal SPACE (U+0020) or should have a fixed size (such as FIGURE SPACE and PUNCTUATION SPACE) shouldn't be widened on a justified paragraph, which means that:
-- **Only** the EM SPACE and EN SPACE (and its equivalents EM QUAD and EN QUAD) should be widened **proportionally wider than normal spaces are widened** on justified paragraphs.
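
As a rough sketch of what this proposal would mean in practice, the extra width of a justified line could be shared among the gaps in proportion to a nominal width per character. Everything here is illustrative: `NOMINAL_WIDTH_EM` and `distribute_slack` are hypothetical names, the 0.25 em width for the normal SPACE is an assumption (the real value depends on the font), and spaces not listed in the table are treated as fixed-width.

```python
# Nominal widths in ems. The U+0020 value is an assumption; it is font-dependent.
NOMINAL_WIDTH_EM = {
    "\u0020": 0.25,  # SPACE (assumed quarter em)
    "\u2000": 0.5,   # EN QUAD
    "\u2001": 1.0,   # EM QUAD
    "\u2002": 0.5,   # EN SPACE
    "\u2003": 1.0,   # EM SPACE
}


def distribute_slack(gaps: str, slack: float) -> list:
    """Return the extra width for each gap character, proportional to its size.

    Characters missing from NOMINAL_WIDTH_EM (for example THIN SPACE or
    FIGURE SPACE) get a weight of zero, so they keep their fixed width.
    """
    weights = [NOMINAL_WIDTH_EM.get(ch, 0.0) for ch in gaps]
    total = sum(weights)
    if total == 0:
        return [0.0] * len(gaps)
    return [slack * weight / total for weight in weights]
```

For a line whose gaps are a SPACE, an EM SPACE and a THIN SPACE, `distribute_slack(" \u2003\u2009", 2.5)` gives the EM SPACE four times the extra width of the SPACE and leaves the THIN SPACE untouched.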


Steps to Reproduce:
1. Format a paragraph as justified.
2. Type several sentences ending in periods.
3. After every period, instead of using a normal SPACE (U+0020) character, use another whitespace character, such as EM SPACE (U+2003) or EN SPACE (U+2002). One easy way to enter them is to type the Unicode code point number and press Alt+X right after it.
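
For anyone who prefers to generate the test text instead of typing it, the following Python sketch builds the same kind of paragraph and reports which non-standard space separators it contains. The helper names `build_sample` and `describe` are hypothetical; only the standard `unicodedata` module is used. The output can be pasted into a justified paragraph.

```python
import unicodedata

SENTENCE = "This is a sentence ending with a period followed by a EM SPACE (U+2003)."


def build_sample(separator: str = "\u2003", repeats: int = 8) -> str:
    """Repeat the sentence, following each one with `separator`."""
    return "".join(SENTENCE + separator for _ in range(repeats))


def describe(text: str) -> list:
    """List (code point, Unicode name) for each distinct space separator
    (general category Zs) in `text`, ignoring the normal SPACE (U+0020)."""
    seen = []
    for ch in text:
        if unicodedata.category(ch) == "Zs" and ch != " " and ch not in seen:
            seen.append(ch)
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in seen]


if __name__ == "__main__":
    sample = build_sample()
    print(describe(sample))
    print(sample)
```

Calling `describe` on pasted text is also a quick way to confirm that the intended character, and not a normal SPACE, ended up after each period.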

Actual Results:
If a whitespace character other than the normal SPACE (U+0020) is at the end of a justified line, it remains there as a “hole” in the text flow instead of being replaced by the line break.

Also, LibreOffice doesn't widen those whitespace characters proportionally to their size in relation to the normal SPACE character.

Expected Results:
LibreOffice shouldn't allow the other whitespace characters to be “shown” as a hole at the end of a line.

Also, LibreOffice should widen those whitespace characters proportionally to their size in relation to the normal SPACE character.


Reproducible: Always


User Profile Reset: No



Additional Info:
I attached an .ODT and a .PDF file as examples.
Comment 1 João Paulo 2020-02-02 00:18:20 UTC
Created attachment 157585 [details]
Sample document showing the error
Comment 2 João Paulo 2020-02-02 00:20:14 UTC
Created attachment 157587 [details]
Sample PDF document showing the error
Comment 3 V Stuart Foote 2020-02-02 16:52:06 UTC
@Khaled, is this all edit engine, or do the HarfBuzz libs manage space redistribution during justification?  Are we losing, or maybe generalizing, the width of the em, en, and quad spaces?

Also, I assume strange things would happen in any case when working with a font that lacks coverage of the spaces in the Unicode 'General Punctuation' block, since the result is then subject to the vagaries of fallback handling.
Comment 4 João Paulo 2020-02-04 01:22:12 UTC
(In reply to V Stuart Foote from comment #3)
> @Khaled, is this all edit engine, or do Harfbuzz libs manage space
> redistribution during justification?  Are we losing, or maybe generalizing,
> the width of the em, en, quad spaces?
> 
> And, assume strange things would happen in any case if working with a font
> that has missing coverage of spaces for the Unicode 'General Punctuation'
> block, so subject to vagaries of fallback handling.

I'm no developer, just a power user and system administrator who can do some advanced scripting in PowerShell, but the HarfBuzz documentation at https://harfbuzz.github.io/what-harfbuzz-doesnt-do.html says:

"HarfBuzz won't help you with line breaking, hyphenation, or justification. As mentioned above, HarfBuzz lays out the string along a single line of, notionally, infinite length. If you want to find out where the potential word, sentence and line break points are in your text, you could use the ICU library's break iterator functions."
Comment 5 João Paulo 2020-02-04 01:33:13 UTC
Created attachment 157625 [details]
Rendering of the sample ODT document on PDF X

PDF X is a freeware file viewer that opens several formats besides PDF.  When used to open the sample .ODT, it renders the justification correctly, even though it substitutes a draft font for the font used in the document.

PDF X can be installed on Windows 10 from https://www.microsoft.com/store/productId/9P3CP9G025RM
Comment 6 João Paulo 2020-02-04 01:50:52 UTC
Created attachment 157626 [details]
Rendering of the sample ODT document on ONLYOFFICE Desktop Editors

ONLYOFFICE Desktop Editors is a FOSS suite that can edit OpenDocument Format files and Microsoft Office files.  When used to open the sample .ODT, it renders the justification correctly.

ONLYOFFICE Desktop Editors can be downloaded at https://www.onlyoffice.com/pt/download-desktop.aspx for Linux, Windows and macOS.
Comment 7 Buovjaga 2020-05-09 11:56:33 UTC
Confirmed with attachment 157585 [details]. Note that you need the Carlito font to be able to see the problem.

This is also seen with older versions (4.4, 3.3), so it has nothing to do with HarfBuzz.

Version: 7.0.0.0.alpha0+ (x64)
Build ID: 00db5933ded1884b2ac453552badae20fa943478
CPU threads: 4; OS: Windows 10.0 Build 18362; UI render: default; VCL: win; 
Locale: fi-FI (fi_FI); UI-Language: en-US
Calc: threaded
Comment 8 João Paulo 2020-09-27 08:42:27 UTC
(In reply to João Paulo from comment #0)
> There is a list of different whitespace characters at
> https://en.wikipedia.org/wiki/Template:Whitespace_(Unicode).
> 
> I think those different whitespace characters deserve a special case of
> justification:
> 
> -- All spaces that aren't wider than the normal SPACE (U+0020) or should
> have a fixed size (such as FIGURE SPACE and PUNCTUATION SPACE) shouldn't be
> widened on a justified paragraph, which means that:
> -- **Only** the EM SPACE and EN SPACE (and its equivalents EM QUAD and EN
> QUAD) should be widened **proportionally wider than normal spaces are
> widened** on justified paragraphs.
> 

Sorry, I entered the wrong Wikipedia page address with a list of whitespace characters.  The correct one is "https://en.wikipedia.org/wiki/Whitespace_character".

Also, I no longer think that EM SPACE, EN SPACE, EM QUAD and EN QUAD should be widened proportionally more than normal spaces on justified paragraphs.  Nor do I think that they shouldn't.  I'll leave that opinion to people with more typography expertise than me.  (Unless you want to add this choice of behavior to style formatting, although using two or three normal SPACE characters together on a justified paragraph should do the trick of making certain gaps wider than normal spaces.)
Comment 9 BogdanB 2023-09-21 05:17:19 UTC
Also reproduced in
Version: 24.2.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: a34dcd03254480927c403d904c0e754802d97b90
CPU threads: 4; OS: Linux 5.15; UI render: default; VCL: gtk3
Locale: ro-RO (ro_RO.UTF-8); UI: en-US
Calc: threaded