Bug 150285 (Kashida-Justification) - [META] Problems with Justified Arabic/Persian text
Summary: [META] Problems with Justified Arabic/Persian text
Status: NEW
Alias: Kashida-Justification
Product: LibreOffice
Classification: Unclassified
Component: graphics stack
Version (earliest affected): unspecified
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on: 140767 150286 151748 152048 35320 62751 64559 65344 65414 70775 87731 88976 103871 104921 105079 106309 106653 108604 112849 116344 117907 124109 127176 132121 137528 137530 139627 140011 144734 145647 146199 150710 151262 152734
Blocks: RTL-CTL Font-Rendering
Reported: 2022-08-06 12:36 UTC by Hossein
Modified: 2022-12-30 14:32 UTC (History)
CC List: 3 users

See Also:
Crash report or crash signature:
Regression By:


Description Hossein 2022-08-06 12:36:26 UTC
This is a meta issue to track all bugs related to justified text, as well as kashida-related issues.

Feel free to add related bugs.

Description:

When writing Arabic/Persian text and setting the paragraph to justified, kashidas are inserted incorrectly. Sometimes there are gaps in the text (usually when diacritics are used), and sometimes black lines appear in the text where they are not expected (badly positioned kashidas).

Kashida
https://en.wikipedia.org/wiki/Kashida

Before version 5.3, LibreOffice used "three different layout systems; Uniscribe on Windows, Core Text on Mac OS X and HarfBuzz everywhere else". Justified Arabic text had problems, but most of them were fixed.

With LibreOffice 5.3, the text layout engine was changed to use HarfBuzz for laying out text on all platforms. See tdf#89870 and LibreOffice 5.3 release notes:
https://wiki.documentfoundation.org/ReleaseNotes/5.3#Text_Layout

After this change, new problems appeared in text rendering for RTL/CTL languages, and many of them are revealed when RTL paragraphs are set to justified.

As described in tdf#104921, the current approach for Arabic text justification is brittle, and leads to several problems.
Comment 1 خالد حسني 2022-08-14 15:56:34 UTC
(In reply to Hossein from comment #0)
> With LibreOffice 5.3, the text layout engine was changed to use HarfBuzz for
> laying out text on all platforms. See tdf#89870 and LibreOffice 5.3 release
> notes:
> https://wiki.documentfoundation.org/ReleaseNotes/5.3#Text_Layout
> 
> After this change, new problem appeared in text rendering for RTL/CTL
> languages, and many of them are revealed when setting the RTL paragraphs as
> justified.

HarfBuzz takes a lot of blame in LibreOffice. The actual problem was that on Windows and macOS, LibreOffice was basically offloading the text layout to system libraries that essentially did everything.

On Linux, on the other hand, it had basically a homegrown, very simple text layout engine with ICU as its lowest text shaping component. It was buggy, it was slow, and the ICU part LibreOffice was using eventually got deprecated and removed from ICU. The first HarfBuzz integration on Linux replaced the ICU part with HarfBuzz. HarfBuzz is rock solid and industry standard, but it is a very low-level component and LibreOffice is doing a lot of work on top of it, and that code was used on the most neglected platform in OpenOffice.org days, so it was buggy.

The 5.3 change generalized the Linux layout engine and used it for all platforms (because it is the only cross-platform one of the 3). This made prominent a lot of bugs that had long existed but were overlooked because they existed only on Linux when using one of the so-called complex scripts (and of course it introduced new ones, due to the complexity of the code base and my limited understanding of it).

Now everyone is suffering equally and there is some poetic justice in this :) For example, text layout for so-called complex scripts has always been unbearably slow, but it only became a pressing matter when everyone suffered the same.