Bug 163660 - CJK text following RTL Override mark formatted as RTL-CTL
Summary: CJK text following RTL Override mark formatted as RTL-CTL
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version (earliest affected): 7.5.4.2 release
Hardware: All All
Importance: medium normal
Assignee: Jonathan Clark
URL:
Whiteboard: target:25.8.0 target:25.2.0.2
Keywords: text:cjk, text:ctl, text:rtl
Depends on:
Blocks: CJK RTL
Reported: 2024-10-28 15:11 UTC by kelane.dez.neeman
Modified: 2024-12-24 11:01 UTC (History)
CC List: 3 users

See Also:
Crash report or crash signature:


Attachments
The .odt file containing sample text with CJK/Asian characters set to RTL (13.54 KB, application/vnd.oasis.opendocument.text)
2024-10-28 15:14 UTC, kelane.dez.neeman
A simple test document with explanatory text in English (16.75 KB, application/vnd.oasis.opendocument.text)
2024-10-28 23:41 UTC, Eyal Rozenberg

Description kelane.dez.neeman 2024-10-28 15:11:27 UTC
Description:
Whenever CJK characters appear between `U+202E` (`RIGHT-TO-LEFT OVERRIDE`) and `U+202C` (`POP DIRECTIONAL FORMATTING`), the style-defined CJK/Asian font is not applied, because the text is treated as Complex (CTL) text instead of Asian text.

Even though the characters themselves have script type `Hant`, LibreOffice Writer refuses to apply the CJK/Asian font defined in the active style and instead forces an arbitrary fallback font onto the CJK characters.

The bug got worse after updating from version `7.5.4.2` to `24.8.2`.

Also, if I try to apply direct formatting to right-to-left CJK characters in LibreOffice Impress, the whole text gets garbled, which does not happen in LibreOffice Writer.
  
Thank you!

Steps to Reproduce:
1. Enable Complex/CTL layout.
2. Ensure that Asian/CJK and Complex/CTL fonts for the specific style are different.
3. Using that same style, enter at least one Asian/CJK character anywhere in the text.
4. Before the first Asian or CJK character, insert the Unicode character `U+202E` (`RIGHT-TO-LEFT OVERRIDE`).
5. Insert `U+202C` (`POP DIRECTIONAL FORMATTING`) anywhere after the first CJK/Asian character.
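
For anyone who prefers to generate the test text rather than type the invisible characters by hand, the following is a minimal Python sketch of steps 4 and 5. The helper name `wrap_rtl_override` is hypothetical and only illustrates the character sequence; it is not part of LibreOffice. The resulting string can be pasted into a paragraph that uses the style from step 2.

```python
import unicodedata

RLO = "\u202E"  # RIGHT-TO-LEFT OVERRIDE (step 4)
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING (step 5)


def wrap_rtl_override(text: str) -> str:
    """Hypothetical helper: surround `text` with RLO ... PDF."""
    return f"{RLO}{text}{PDF}"


# Three CJK characters taken from the sample text in this report.
sample = wrap_rtl_override("廣府話")

# Print each character's code point, name, and Unicode bidi class.
for ch in sample:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.bidirectional(ch))
```

The CJK characters keep their own strong left-to-right bidi class (`L`); only the surrounding override marks change the display direction, which is why they would still be expected to count as Asian text.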

Actual Results:
Every CJK/Asian character between `U+202E` (`RIGHT-TO-LEFT OVERRIDE`) and `U+202C` (`POP DIRECTIONAL FORMATTING`) is rendered in an unknown fallback font that cannot be changed or defined. All affected CJK/Asian characters are also treated as Complex/CTL.

In version 24.8.2, the right-to-left CJK/Asian characters are even distorted/stretched horizontally.

Expected Results:
Every CJK/Asian character between `U+202E` (`RIGHT-TO-LEFT OVERRIDE`) and `U+202C` (`POP DIRECTIONAL FORMATTING`) should still automatically adopt the style-defined Asian font and be treated as Asian text.


Reproducible: Always


User Profile Reset: No

Additional Info:
Version: 24.8.2.1 (X86_64) / LibreOffice Community
Build ID: 0f794b6e29741098670a3b95d60478a65d05ef13
CPU threads: 8; OS: Windows 10 X86_64 (10.0 build 19045); UI render: Skia/Raster; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: CL threaded

## Sample Text (Entire Line Below)  
کناڤ 廣⁠府⁠話⁠ تق بوليه ݢونا ڤاکاي 東⁠亞⁠字⁠體⁠ يڠ سديا اد؟
  
## Additional Information  
- **Fonts used:**  
   - Western: *Arial*  
   - Asian: *DF-KaiSB*  
   - Complex: *Scheherazade New*  
- **Formatting characters used:**  
   - U+202E: `RIGHT-TO-LEFT OVERRIDE`  
   - U+202C: `POP DIRECTIONAL FORMATTING`  
   - U+2060: `WORD JOINER`  
   - U+200C: `ZERO WIDTH NON-JOINER`  
- **Languages:**  
   - All styles and all texts are set to [None]  
- **Operating System:**  
   - Windows 10 Home `22H2` `19045.5011`
   - 64-bit operating system  
   - `x64`-based processor  
- **LibreOffice Editions:**  
   - `24.8.2`  
   - `7.5.4.2`  
- **File Formats:**  
   - `.odt` (**main use**)  
   - `.odp` (not main use, but gets worse if I apply direct formatting)
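
Because the formatting characters listed above are invisible, it can help to confirm where they sit in a piece of text. The following is a small Python sketch that lists every Unicode format character (general category `Cf`) in a string. The function name `format_characters` and the `demo` string are illustrative assumptions; `demo` is not a copy of the attached document.

```python
import unicodedata


def format_characters(text: str):
    """Return (offset, code point, name) for each format (Cf) character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]


# Illustrative string: RLO, a CJK character, WORD JOINER, a CJK character, PDF.
demo = "\u202E東\u2060亞\u202C"

for offset, code_point, name in format_characters(demo):
    print(offset, code_point, name)
```

Running the same function over text copied from a document shows whether the override and pop marks actually surround the CJK characters in question.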
Comment 1 kelane.dez.neeman 2024-10-28 15:14:18 UTC
Created attachment 197276 [details]
The .odt file containing sample text with CJK/Asian characters set to RTL
Comment 2 Eyal Rozenberg 2024-10-28 23:41:02 UTC
Created attachment 197285 [details]
A simple test document with explanatory text in English

In this document, there are only two characters in Chinese, one without the RLO mark and the other with it, surrounded by explanatory text in English so you don't have to scratch your head determining what exactly to look at, if you're not familiar with RTL or CJK languages.

Also, it distinguishes the font groups using font _size_, a large variation in which is much easier to perceive than a change of typeface (for people who, again, are not fluent in the language; and considering that some systems don't have various RTL/CJK fonts installed).

Anyway, confirming with:

Version: 25.2.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: c8371b5f1a84191d38185820915f0d93741df1fe
CPU threads: 4; OS: Linux 6.6; UI render: default; VCL: gtk3
Locale: en-US (en_IL); UI: en-US
Comment 3 Jonathan Clark 2024-12-09 08:53:07 UTC
This bug is related to bug 66791.

As noted in Bug 66791 Comment 32, our script assignment algorithm assumes all RTL text is complex, regardless of content. As seen here, this assumption is not always correct.
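
To illustrate the difference, here is a deliberately simplified Python sketch. It is not LibreOffice's actual code, which is C++ and considerably more involved. The names `Script`, `is_strong_cjk`, and `classify` are hypothetical, the CJK test is a rough name-based stand-in, and nested directional marks are ignored. With `cjk_aware=False` it mimics the "all RTL text is complex" assumption; with `cjk_aware=True` strong CJK characters keep the Asian script type even inside an RTL run.

```python
import unicodedata
from enum import Enum


class Script(Enum):
    WESTERN = "western"
    ASIAN = "asian"
    COMPLEX = "complex"


RLO, PDF = "\u202E", "\u202C"


def is_strong_cjk(ch: str) -> bool:
    # Assumption: approximate "strong CJK" by Unicode character name prefix.
    name = unicodedata.name(ch, "")
    return name.startswith(("CJK UNIFIED IDEOGRAPH", "HIRAGANA", "KATAKANA", "HANGUL"))


def classify(text: str, cjk_aware: bool):
    """Assign a script type to each visible character (simplified sketch)."""
    result = []
    in_override = False
    for ch in text:
        if ch == RLO:
            in_override = True
            continue
        if ch == PDF:
            in_override = False
            continue
        # A character is in an RTL run if overridden or inherently RTL.
        rtl = in_override or unicodedata.bidirectional(ch) in ("R", "AL")
        if is_strong_cjk(ch) and (cjk_aware or not rtl):
            result.append((ch, Script.ASIAN))
        elif rtl:
            result.append((ch, Script.COMPLEX))
        else:
            result.append((ch, Script.WESTERN))
    return result


text = "\u202E廣\u202C"
print(classify(text, cjk_aware=False))  # CJK character classified as COMPLEX
print(classify(text, cjk_aware=True))   # CJK character classified as ASIAN
```

In the first call the CJK character would pick up the Complex font of the style, matching the reported behavior; in the second it would pick up the Asian font, matching the expected results.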
Comment 4 Eyal Rozenberg 2024-12-09 09:40:58 UTC
Have taken the liberty of shortening the title.
Comment 5 Commit Notification 2024-12-19 17:18:39 UTC
Jonathan Clark committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/de29bec27e90a7d24a90c6f071e7899abefe683e

tdf#163660 sw: Treat strong CJK inside RTL runs as Asian script

It will be available in 25.8.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 6 Commit Notification 2024-12-24 11:01:15 UTC
Jonathan Clark committed a patch related to this issue.
It has been pushed to "libreoffice-25-2":

https://git.libreoffice.org/core/commit/7e8c979c01f7107d7cad5e85510880c61fd779f9

tdf#163660 sw: Treat strong CJK inside RTL runs as Asian script

It will be available in 25.2.0.2.
