Bug 157562 - Arabic text with No-Width optional break (U+200B) does not apply OpenType font feature properly
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Formatting-Mark
Reported: 2023-10-02 13:51 UTC by Lateef Shaikh
Modified: 2023-10-17 21:19 UTC (History)

See Also:
Crash report or crash signature:


Attachments
Image showing bug (156.35 KB, image/png)
2023-10-02 14:03 UTC, Lateef Shaikh
Character dialog box (28.90 KB, image/png)
2023-10-02 14:08 UTC, Lateef Shaikh
sample ODT (9.68 KB, application/vnd.oasis.opendocument.text)
2023-10-16 15:06 UTC, Stéphane Guillou (stragu)
U2060 (42.35 KB, image/png)
2023-10-17 20:33 UTC, Lateef Shaikh

Description Lateef Shaikh 2023-10-02 13:51:58 UTC
Description:
I am working on Arabic text that uses U+200B (No-Width optional break) to align marks horizontally. I am using the Calibri font. The text renders correctly in Google Chrome, but LibreOffice Writer does not correctly position the mark typed after U+200B.

Steps to Reproduce:
1. Open LibreOffice Writer.
2. Type: Meem (U+0645) + Fatha (U+064E).
3. Choose Insert -> Formatting Mark -> No-width Optional Break.
4. Type: Superscript Alef (U+0670) + Noon (U+0646).
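
For reference, the sequence typed in the steps above can also be built and inspected programmatically. This is only a minimal Python sketch; the names `REPRO` and `describe` are illustrative and not part of the report. It lists the code points in typing order so the sequence can be pasted into a test document.

```python
import unicodedata

# Code points from the steps above, in typing order.
# U+200B is the character the report associates with Writer's
# "No-width Optional Break" (Unicode name: ZERO WIDTH SPACE).
REPRO = "\u0645\u064E\u200B\u0670\u0646"

def describe(text):
    """Return (code point, Unicode name) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

for cp, name in describe(REPRO):
    print(cp, name)
```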

Actual Results:
Superscript Alef is placed at the baseline.

Expected Results:
Superscript Alef should be placed next to the Fatha.


Reproducible: Always


User Profile Reset: No

Additional Info:
The Calibri font has a feature that places marks side by side when U+200B is between them. This behavior can be seen in Google Chrome and Notepad (the Windows application), but LibreOffice Writer does not render it properly.
Comment 1 Lateef Shaikh 2023-10-02 14:03:39 UTC
Created attachment 189957 [details]
Image showing bug
Comment 2 Lateef Shaikh 2023-10-02 14:08:24 UTC
Created attachment 189958 [details]
Character dialog box

The Character dialog box shows the marks positioned properly.
Comment 3 Stéphane Guillou (stragu) 2023-10-16 15:06:22 UTC
Created attachment 190241 [details]
sample ODT

Thank you for the report, Lateef.
I tried creating a sample document.
If it doesn't match your description, could you please upload a new one and make mine obsolete?
Comment 4 Stéphane Guillou (stragu) 2023-10-16 15:08:53 UTC
Khaled, I thought you might find this interesting.

Note that I have used the following font features to match Lateef's: Calibri:calt&dlig&liga (not that it changed anything in the display).

Tested in:

Version: 24.2.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: e9374f74385d7dfe77d1902d3d82af20143bc775
CPU threads: 8; OS: Linux 5.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Calc: threaded
Comment 5 ⁨خالد حسني⁩ 2023-10-17 13:37:39 UTC
This is because U+200B is one of the formatting characters that Writer makes visible when View -> Field Shadings is checked, and the way this is implemented breaks text layout around such characters. It works in the font preview because the preview is rendered using a different code path than Writer’s.

But using U+200B here is a font-specific hack; the proper, Unicode-sanctioned way is to add a tatweel before the small alef, which should work with most fonts:

ٱلرَّحۡمَـٰن
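
As a rough illustration of the tatweel approach, the reporter's sequence could be converted by swapping U+200B for a tatweel wherever it directly precedes the small alef. This is a sketch only: the helper name `zwsp_to_tatweel` is hypothetical, and the swap is only meaningful where the surrounding letters actually join.

```python
TATWEEL = "\u0640"           # ARABIC TATWEEL
SUPERSCRIPT_ALEF = "\u0670"  # ARABIC LETTER SUPERSCRIPT ALEF
ZWSP = "\u200B"              # ZERO WIDTH SPACE

def zwsp_to_tatweel(text):
    """Replace U+200B with tatweel only where it directly precedes U+0670."""
    return text.replace(ZWSP + SUPERSCRIPT_ALEF, TATWEEL + SUPERSCRIPT_ALEF)

# Meem + Fatha + U+200B + Superscript Alef + Noon, as in the report.
original = "\u0645\u064E\u200B\u0670\u0646"
fixed = zwsp_to_tatweel(original)
print([f"U+{ord(c):04X}" for c in fixed])
```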
Comment 6 Lateef Shaikh 2023-10-17 20:33:05 UTC
Thank you for your time and help with this. Actually, I am trying different characters as alternatives to the tatweel approach, because tatweel is more of a justification character and can only be used when this scenario occurs between two connecting letters. For non-connecting letters like Ra and Alef, we can't use tatweel. Therefore I was looking for a space-like character that has zero width and allows joining.

U+200B breaks the word, so after posting this bug I had a discussion with a friend who made me realize that it is not a good choice anyway.

I first found U+FEFF (ZWNBSP), but that use is deprecated; its replacement is Word Joiner (U+2060). It works fine in Chrome and in LibreOffice (using the Noto Naskh font).
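
For comparison, the three candidates mentioned here can be inspected with Python's standard `unicodedata` module. This is a small sketch; `CANDIDATES` and `summarize` are illustrative names. Note that the name and general category do not capture line-breaking behavior: U+200B provides a break opportunity, while U+2060 prohibits one.

```python
import unicodedata

# Candidates discussed in this comment.
CANDIDATES = ["\u200B", "\uFEFF", "\u2060"]

def summarize(ch):
    """Return (code point, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

for ch in CANDIDATES:
    print(*summarize(ch))
```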

So I think you can close this ticket.
Comment 7 Lateef Shaikh 2023-10-17 20:33:29 UTC
Created attachment 190270 [details]
U2060
Comment 8 ⁨خالد حسني⁩ 2023-10-17 21:19:15 UTC
(In reply to Lateef Shaikh from comment #6)
> Thank you for your time and help with this. Actually I am trying different
> characters as alternate to the tatweel approach, because tatweel is more of
> a justification character and can only be used when we have this scenario
> between two connecting alphabets. When it comes to non connecting alphabets
> like Ra and alef then we can't use tatweel. Therefore I was looking for a
> space like character with zero width and allows joining.

FWIW, Unicode suggests using tatweel between connected letters and NBSP between unconnected ones. I can’t find where this is documented right now, so you will have to take my word for it. This has the advantage of being a simple solution that works with virtually any font (and if it does not work properly, the fallback will still be acceptable).
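
A minimal sketch of that rule, assuming the choice depends only on whether the preceding letter joins forward. The helper `carrier_after` and the letter set are hypothetical and deliberately incomplete; complete joining data lives in Unicode's ArabicShaping.txt.

```python
TATWEEL = "\u0640"  # ARABIC TATWEEL
NBSP = "\u00A0"     # NO-BREAK SPACE

# Hand-written, deliberately incomplete set of letters that do not join
# to the following letter (an assumption for this sketch):
# Alef, Dal, Thal, Reh, Zain, Waw.
NON_FORWARD_JOINING = {"\u0627", "\u062F", "\u0630", "\u0631", "\u0632", "\u0648"}

def carrier_after(base_letter):
    """Pick the carrier to insert before a mark that follows base_letter."""
    return NBSP if base_letter in NON_FORWARD_JOINING else TATWEEL

print(f"U+{ord(carrier_after(chr(0x0645))):04X}")  # after Meem
print(f"U+{ord(carrier_after(chr(0x0631))):04X}")  # after Reh
```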

The behavior reported here is still a bug regardless.