Bug 163105 - In writer, not every kashida opportunity is used in justified text
Summary: In writer, not every kashida opportunity is used in justified text
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: medium normal
Assignee: Jonathan Clark
URL:
Whiteboard: target:25.2.0
Keywords:
Depends on:
Blocks: Kashida-Justification, Tatweel
Reported: 2024-09-23 13:16 UTC by Hossein
Modified: 2024-09-27 01:05 UTC

See Also:
Crash report or crash signature:


Attachments
Text with kashida opportunities (14.52 KB, application/vnd.oasis.opendocument.text)
2024-09-23 13:16 UTC, Hossein
MS Word display (PNG) (70.27 KB, image/png)
2024-09-23 14:48 UTC, Hossein
LibreOffice display (JPG) (108.07 KB, image/jpeg)
2024-09-23 14:51 UTC, Hossein
OpenOffice.org display (PNG) (46.11 KB, image/png)
2024-09-24 12:24 UTC, Hossein
LibreOffice display with patch (34.23 KB, image/png)
2024-09-25 05:23 UTC, Jonathan Clark
Example text before changes (51.40 KB, image/png)
2024-09-27 00:51 UTC, Jonathan Clark
Example text after changes (58.29 KB, image/png)
2024-09-27 00:51 UTC, Jonathan Clark
Generated text before changes (117.20 KB, image/png)
2024-09-27 00:52 UTC, Jonathan Clark
Generated text after changes (131.31 KB, image/png)
2024-09-27 00:52 UTC, Jonathan Clark

Description Hossein 2024-09-23 13:16:06 UTC
Created attachment 196627
Text with kashida opportunities

Description:
In Arabic/Persian text, there are many opportunities to use kashida. When justifying a paragraph, it is often possible to use kashida to fill the line, especially when a manual line break is used.
For example, when بب is followed by a line break, kashida could be used to give بـــب, but it is not.
Kashida can also be used when the following letter is not joined at the end. For example, بء can be extended to بــــــــــــء, which is acceptable and is used in some cases, such as demonstrating diacritics on top of kashida.
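
As a minimal illustration of the two examples above, the sketch below builds elongated forms by inserting U+0640 ARABIC TATWEEL, the Unicode kashida character, into a string. The helper name `elongate` is hypothetical and the tatweel counts are arbitrary. It only shows the expected visual result in plain text and is not meant to describe how Writer implements justification.

```python
TATWEEL = "\u0640"  # U+0640 ARABIC TATWEEL (kashida)


def elongate(word: str, position: int, count: int) -> str:
    """Insert `count` tatweel characters after the character at `position`."""
    if not 0 <= position < len(word) - 1:
        raise ValueError("kashida must be placed between two characters")
    return word[: position + 1] + TATWEEL * count + word[position + 1 :]


beh_beh = "\u0628\u0628"    # beh + beh
beh_hamza = "\u0628\u0621"  # beh + hamza (hamza does not join to beh)

print(elongate(beh_beh, 0, 3))     # beh, three tatweels, beh
print(elongate(beh_hamza, 0, 12))  # beh, twelve tatweels, hamza
```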

The situation is font-specific, which is also an issue. I have used "Noto Sans Arabic" and "Vazirmatn" in the text. Both have issues, but "Noto Sans Arabic" is slightly better.

Steps to reproduce:
1. Open the attachment.

Actual Results:
Some of the lines are not filled, because LibreOffice Writer does not use the available kashida opportunities.

Expected Results:
All the lines should be filled using kashida, which is a horizontal line.

Reproducible: Always


User Profile Reset: No


Additional Info:
Reproducible with the latest LO 25.2 dev master
Version: 25.2.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 8733ffc6a4e79e007ec9725a183ef04a7fbf0fb6
CPU threads: 12; OS: Linux 6.2; UI render: default; VCL: x11
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: CL threaded
Comment 1 Hossein 2024-09-23 14:48:01 UTC
Created attachment 196629
MS Word display (PNG)

For comparison, you can open the attachment in MS Word. However, because tdf#155707 is not implemented, you have to use "justify low" for the paragraph alignment in Word. The current justified text in LibreOffice is closest to "justify low".
Word does not use kashida when you set the paragraph alignment to simple "justify".
Comment 2 Hossein 2024-09-23 14:51:25 UTC
Created attachment 196630
LibreOffice display (JPG)

This is the display in LibreOffice 25.2 dev master. As you can see, many lines that should use kashida do not.
Comment 3 Jonathan Clark 2024-09-23 17:38:18 UTC
I can reproduce this locally; confirmed.

General notes:

LibreOffice currently has some residual code to choose kashida insertion positions. Those positions are then validated against HarfBuzz. Taking the product of both is probably doing more harm than good. Unless there's a clear reason we should exclude a position, we should trust that HarfBuzz is choosing them correctly.

For lines that do not have kashida insertion opportunities, but do contain whitespace, we are supposed to fall back to whitespace expansion justification (Latin justification). This does not happen in the example document.
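
The intended fallback described above can be sketched roughly as follows. This is an illustrative Python model only, not LibreOffice code: the function name `justify_line` and the reduction of a line to a count of kashida positions and a count of spaces are assumptions made for the example.

```python
def justify_line(extra_width: float, kashida_positions: int, spaces: int) -> dict:
    """Decide how to distribute `extra_width` on one justified line.

    Returns the method used and the width added per kashida position
    or per space.
    """
    if extra_width <= 0:
        return {"method": "none", "per_unit": 0.0}
    if kashida_positions > 0:
        # Preferred for Arabic-script text: stretch at kashida positions.
        return {"method": "kashida", "per_unit": extra_width / kashida_positions}
    if spaces > 0:
        # No kashida opportunity: fall back to whitespace expansion.
        return {"method": "whitespace", "per_unit": extra_width / spaces}
    # Nothing can be stretched, so the line stays short.
    return {"method": "none", "per_unit": 0.0}


print(justify_line(12.0, kashida_positions=0, spaces=3))
```

In this model, the problem reported for the example document corresponds to the whitespace branch never being taken.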

It's likely that these issues will affect both the Writer and Edit Engine implementations of kashida justification. Both need to be investigated, and both should be updated to produce similar results.
Comment 4 Hossein 2024-09-24 12:24:48 UTC
Created attachment 196647
OpenOffice.org display (PNG)

Old OpenOffice.org 3.2.1 produces an acceptable display, although it differs slightly from what Word renders.

OpenOffice.org 3.2.1
OOO320m18 (Build:9502)
Comment 5 Commit Notification 2024-09-24 23:25:45 UTC
Jonathan Clark committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/fe4687ed174c54f2eb25f8088bf3fb6cb4858175

tdf#163105 Consolidated duplicated kashida justification code

It will be available in 25.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 6 Commit Notification 2024-09-25 01:23:02 UTC
Jonathan Clark committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/41671d732ad933175779b61159b56824ff77b2fe

tdf#163105 sw: Add some whitespace expansion to kashida justification

It will be available in 25.2.0.

Comment 7 Commit Notification 2024-09-25 01:55:08 UTC
Jonathan Clark committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/a3b0ef4088183c4a3b2ec3fef08ef91314eaef54

tdf#163105 Restore some missing kashida opportunities

It will be available in 25.2.0.

Comment 8 Commit Notification 2024-09-25 05:15:35 UTC
Jonathan Clark committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/5b494880e86dfdb702f9cda807253fa575814d6f

tdf#163105 Require kashida after Seen, even before a final Yeh

It will be available in 25.2.0.

Comment 9 Jonathan Clark 2024-09-25 05:23:35 UTC
Created attachment 196669
LibreOffice display with patch

Screenshot of the test document with the above patches applied, for comparison.
Comment 10 Hossein 2024-09-25 11:13:09 UTC
(In reply to Jonathan Clark from comment #9)
> Created attachment 196669
> LibreOffice display with patch
> 
> Screenshot of the test document with the above patches applied, for
> comparison.
Thanks Jonathan for the fixes.
The screenshot that you attached seems to match what Word renders, and even shows some improvements, including avoiding multiple kashidas in the same word.
But:
1. I need to test more, to make sure about the different use cases of kashida.
2. For the second paragraph with Vazirmatn font, I do not get the same result. In my tests, with that font the problem is still there.
3. Looking into tdf#155707 makes sense. The behavior of the kashida insertion in LibreOffice has now shifted from something similar to "Justify medium/high" in Word to something like "Justify low" there.
4. Previously, sometimes during the insertion of manual line breaks, text was slightly passing the left margin. Might be unrelated, and I'll file another issue for it.
Comment 11 Jonathan Clark 2024-09-25 21:28:39 UTC
(In reply to Hossein from comment #10)
> 1. I need to test more, to make sure about the different use cases of
> kashida.

This would be very helpful. Also, any effort to determine whether we need to change these rules (maybe even on a language-specific basis) would be helpful.

My intention for this bug is to make the existing insertion rules behave as they were intended. Most of the work on this bug is refactoring the code to reduce duplication and improve testing. It's going to be much easier to experiment with kashida justification rules now.

> 2. For the second paragraph with Vazirmatn font, I do not get the same
> result. In my tests, with that font the problem is still there.

This is expected, and is the major outstanding task.

Currently, invalid kashida positions are filtered out *after* selection. Writer and Edit Engine need to be updated to instead pass some HarfBuzz data to the position selection algorithm.
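
The difference between the two approaches can be sketched as follows. This is a simplified model, not the Writer or Edit Engine code: each word has candidate positions ranked by preference, and `valid` stands in for whatever data HarfBuzz provides about where a kashida can actually be inserted with the current font. All names here are hypothetical.

```python
from typing import Optional, Sequence, Set


def select_then_filter(ranked: Sequence[int], valid: Set[int]) -> Optional[int]:
    """Pick the top-ranked position first, then drop it if it is invalid."""
    if not ranked:
        return None
    choice = ranked[0]
    return choice if choice in valid else None


def select_among_valid(ranked: Sequence[int], valid: Set[int]) -> Optional[int]:
    """Pick the top-ranked position among those known to be valid."""
    for position in ranked:
        if position in valid:
            return position
    return None


ranked = [3, 1, 5]  # candidate positions in one word, best first
valid = {1, 5}      # positions the shaper reports as usable for this font

print(select_then_filter(ranked, valid))  # None
print(select_among_valid(ranked, valid))  # 1
```

In the first variant a word loses its kashida whenever the preferred position is rejected, which would be consistent with results that vary by font. In the second, the next-best valid position is used instead.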

> 4. Previously, sometimes during the insertion of manual line breaks, text
> was slightly passing the left margin. Might be unrelated, and I'll file
> another issue for it.

This sounds familiar. I may have already fixed this as a side-effect.
Comment 12 Commit Notification 2024-09-26 20:45:07 UTC
Jonathan Clark committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/d8f430e4bef414616fd80bbf4ea16d767991b5b9

tdf#163105 Use HB data while selecting kashida insertion positions

It will be available in 25.2.0.

Comment 13 Jonathan Clark 2024-09-27 00:51:03 UTC
Created attachment 196734
Example text before changes

Shows an example passage from the test document in two different fonts, Noto Sans Arabic and Amiri, rendered without these changes.
Comment 14 Jonathan Clark 2024-09-27 00:51:40 UTC
Created attachment 196735
Example text after changes

Shows an example passage from the test document in two different fonts, Noto Sans Arabic and Amiri, rendered with these changes.
Comment 15 Jonathan Clark 2024-09-27 00:52:38 UTC
Created attachment 196736
Generated text before changes

Generated random text using the Amiri font, without these changes.
Comment 16 Jonathan Clark 2024-09-27 00:52:57 UTC
Created attachment 196737
Generated text after changes

Generated random text using the Amiri font, with these changes.
Comment 17 Jonathan Clark 2024-09-27 01:05:25 UTC
Combined, these changes resolve the core issue of this report: Previously, we were inappropriately dropping kashida from words, even if there were reasonable places to insert them. This shouldn't happen anymore.

It's possible that we will need to make further adjustments to kashida justification. It would be best to file separate bugs for those changes, as they are needed.