Bug 161397 - Issues with copy & pasting Arabic/Persian text with colored characters
Summary: Issues with copy & pasting Arabic/Persian text with colored characters
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 24.8.0.0 alpha1+ (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Jonathan Clark
URL:
Whiteboard: target:24.8.0
Keywords: bibisectNotNeeded, regression
Depends on:
Blocks: RTL-Arabic-and-Farsi Paste-Special-Unformatted
Reported: 2024-06-03 15:54 UTC by Hossein
Modified: 2024-06-06 12:54 UTC

See Also:
Crash report or crash signature:


Attachments
Arabic text with colored characters (18.31 KB, application/vnd.oasis.opendocument.text)
2024-06-03 15:54 UTC, Hossein
Example of the document with text copied from first attachment (9.71 KB, application/vnd.oasis.opendocument.text)
2024-06-03 15:58 UTC, Hossein
PDF output from LibreOffice 24.2 (10.81 KB, application/pdf)
2024-06-03 16:07 UTC, Hossein
PDF output from LibreOffice 24.8 dev master (11.07 KB, application/pdf)
2024-06-03 16:38 UTC, Hossein

Description Hossein 2024-06-03 15:54:48 UTC
Created attachment 194518
Arabic text with colored characters

Description:
After the recent fixes for tdf#61444, copying text with colored characters and pasting it elsewhere as unformatted text produces garbage rectangles, as if the characters were unknown.

Bug 61444 - Text layout broken across formatting changes (color, underline, etc.)

Steps to Reproduce:
You need the latest build from the LO 24.8 dev master sources.

1. Open attachment
2. Copy line 1
3. Open a new Writer document
4. Press Ctrl+Shift+V and select "Unformatted text"

Alternatively, you can open the second attachment, which I will upload shortly.

Actual Results:
Text appears with some garbage rectangles.

Note: choosing specific fonts such as "Amiri" or "Noto Sans Arabic" makes the rectangles disappear.

Expected Results:
The text should appear as normal Arabic text, without garbage rectangles. The fallback font is sometimes a very poor choice, for example Lohit Devanagari with the language set to Hindi in "Character > Font > Complex". Even in that situation, a fallback font should still be used to supply the needed characters.

Those garbage rectangles are also visible in the PDF output. Therefore, this is not a display-only problem.


Reproducible: Always


User Profile Reset: Yes


Additional Info:
Reproducible with the latest LO 24.8 dev master:
Version: 24.8.0.0.alpha1+ (X86_64) / LibreOffice Community
Build ID: fc37066963a866eeb342b3a41b916f2574f5de28
CPU threads: 12; OS: Linux 6.2; UI render: default; VCL: gtk3
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: CL threaded

Not reproducible with LO 24.2
Version: 24.2.2.2 (X86_64) / LibreOffice Community
Build ID: d56cc158d8a96260b836f100ef4b4ef25d6f1a01
CPU threads: 12; OS: Linux 6.2; UI render: default; VCL: gtk3
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: threaded
Comment 1 Hossein 2024-06-03 15:58:46 UTC
Created attachment 194519
Example of the document with text copied from first attachment

Please open the attachment with:

1. Latest LO 24.8 dev master
2. LO 24.2 binaries from libreoffice.org

Should be fine in 2, but the garbage rectangles are visible in 1.
Comment 2 Hossein 2024-06-03 16:07:50 UTC
Created attachment 194520
PDF output from LibreOffice 24.2

This is the PDF output from LibreOffice 24.2, which looks fine.

Font: Lohit Devanagari
Fallback Arabic Font: FreeSerif
Comment 3 Hossein 2024-06-03 16:38:04 UTC
Created attachment 194522
PDF output from LibreOffice 24.8 dev master

This is the PDF output from LibreOffice 24.8 dev master, which contains garbage rectangles. Please note that the same fallback font is used.

Font: Lohit Devanagari
Fallback Arabic Font: FreeSerif
Comment 4 raal 2024-06-03 17:02:53 UTC
Confirmed with Version: 24.8.0.0.alpha1+ (X86_64) / LibreOffice Community
Build ID: c497cd602d543b48888212f79ba1ecf378e415fc
CPU threads: 4; OS: Linux 6.5; UI render: default; VCL: gtk3
Locale: cs-CZ (cs_CZ.UTF-8); UI: en-US
Calc: threaded Jumbo
Comment 5 Jonathan Clark 2024-06-04 06:55:13 UTC
This is a regression due to the following commit:

commit 0b6a07f07dd05d0db4ddeedb9b112e26b5fd5eb5 (HEAD)
Author: Jonathan Clark <jonathan@libreoffice.org>
Date:   Tue May 28 17:27:19 2024 -0600

    tdf#81272 Improved CJK fallback rendering performance
Comment 6 Commit Notification 2024-06-06 10:09:48 UTC
Jonathan Clark committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/0eedac9d666659a0e4b4892cff36a735db10c81f

tdf#161397 Fix incorrect glyphs for RTL font fallback

It will be available in 24.8.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 7 Jonathan Clark 2024-06-06 10:23:46 UTC
With this change, following the above steps now produces the correct output. The attached documents also display the correct output.

Currently, it seems difficult to create reliable integration tests that cover font fallback. It's not a perfect substitute, but I added unit tests covering fallback run construction. Hopefully this will serve as a warning if this part of the code is changed again in the future.
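
For illustration only, here is a minimal Python sketch of the kind of property a unit test over fallback run construction might check. It is not the LibreOffice implementation (which is C++) and does not reproduce the actual fix; the function name `build_fallback_runs` and the coverage set are hypothetical. It assumes text is processed in logical order and that a "fallback run" is a maximal stretch of characters the primary font cannot render.

```python
from itertools import groupby


def build_fallback_runs(text, primary_coverage):
    """Split text into contiguous (start, end, needs_fallback) runs.

    Indices are in logical order; `primary_coverage` is a hypothetical
    set of characters the primary font is assumed to cover.
    """
    runs = []
    pos = 0
    for needs_fallback, group in groupby(
        text, key=lambda ch: ch not in primary_coverage
    ):
        length = sum(1 for _ in group)
        runs.append((pos, pos + length, needs_fallback))
        pos += length
    return runs


# One Devanagari letter and a space (covered), then four Arabic letters
# (not covered by the assumed primary font).
text = "\u0915 \u0633\u0644\u0627\u0645"
runs = build_fallback_runs(text, {"\u0915", " "})

# The runs must tile the text exactly: no gaps, no overlaps.
assert runs[0][0] == 0 and runs[-1][1] == len(text)
assert all(a[1] == b[0] for a, b in zip(runs, runs[1:]))
```

A check like the last two assertions does not depend on which fonts are installed, which is why it can be run reliably where a rendering-based integration test cannot.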
Comment 8 Hossein 2024-06-06 12:54:27 UTC
(In reply to Jonathan Clark from comment #7)
> With this change, following the above steps now produces the correct output.
> The attached documents also display the correct output.
Thanks Jonathan for the fix.
I can verify the fix with the latest LO 24.8 dev master:

Version: 24.8.0.0.alpha1+ (X86_64) / LibreOffice Community
Build ID: 0eedac9d666659a0e4b4892cff36a735db10c81f
CPU threads: 12; OS: Linux 6.2; UI render: default; VCL: x11
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: CL threaded

> Currently, it seems difficult to create reliable integration tests that
> cover font fallback. It's not a perfect substitute, but I added unit tests
> covering fallback run construction. Hopefully this will serve as a warning
> if this part of the code is changed again in the future.
First of all, the selection of the fallback font and language for the characters is poor. That is (partly) because the fonts assigned by default are not good. "Lohit Devanagari" is not suitable at all for Arabic/Persian text.
Even when good fonts containing all the required glyphs are available, one of the worst options, which lacks those glyphs, is selected. To fix the problems with the fallback font, the selection of fallback fonts should be improved to take the locale/language/keyboard into account. I will file a ticket for it.
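
As a rough illustration of the idea, the following Python sketch picks a fallback font by first requiring full glyph coverage and then preferring fonts tagged for the desired script. It is only a sketch under stated assumptions: the function `pick_fallback_font`, the script tags, and the coverage sets are all hypothetical and do not describe how LibreOffice or fontconfig actually choose fonts.

```python
def pick_fallback_font(missing_chars, candidates, preferred_script):
    """Return the name of the best candidate font, or None.

    `candidates` is a list of (name, script_tag, coverage_set) tuples.
    All of this data is hypothetical and supplied by the caller.
    """
    needed = set(missing_chars)
    # Keep only fonts that cover every character that needs a fallback.
    full = [c for c in candidates if needed <= c[2]]
    if not full:
        return None
    # Stable sort: fonts tagged for the preferred script come first,
    # otherwise the original candidate order is kept.
    full.sort(key=lambda c: c[1] != preferred_script)
    return full[0][0]


# Illustrative data only; the tags and coverage sets are made up.
arabic = set("\u0633\u0644\u0627\u0645")
candidates = [
    ("Lohit Devanagari", "Deva", {"\u0915"}),
    ("FreeSerif", "Latn", arabic | {"a"}),
    ("Amiri", "Arab", arabic),
]
assert pick_fallback_font("\u0633\u0644", candidates, "Arab") == "Amiri"
```

With a locale or language hint of "Arab", the font tagged for Arabic wins over a general-purpose font that merely happens to cover the characters, and a font that lacks the glyphs is never chosen.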