Bug 157562

Summary: Arabic text with No-width Optional Break (U+200B) does not apply OpenType font features properly
Product: LibreOffice
Reporter: Lateef Shaikh <lateef_sagar>
Component: Writer
Assignee: Not Assigned <libreoffice-bugs>
Status: NEW
Severity: normal
CC: khaled, lateef_sagar, stephane.guillou
Priority: medium
Version: Inherited From OOo
Hardware: All
OS: All
Whiteboard:
Crash report or crash signature:
Regression By:
Bug Depends on:
Bug Blocks: 102345
Attachments: Image showing bug
Character dialog box
sample ODT
U2060

Description Lateef Shaikh 2023-10-02 13:51:58 UTC
Description:
I am working on Arabic text that uses U+200B (No-width Optional Break) to align marks horizontally. I am using the Calibri font. The text renders correctly in Google Chrome, but LibreOffice Writer does not place the mark typed after U+200B in the proper position.

Steps to Reproduce:
1. Open LibreOffice Writer.
2. Type Meem (U+0645) + Fatha (U+064E).
3. Choose Insert > Formatting Mark > No-width Optional Break.
4. Type Superscript Alef (U+0670) + Noon (U+0646).
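To reproduce outside the Writer UI (for example, to paste into another application), the sequence from steps 2 to 4 can be built from code-point escapes. A minimal Python sketch, assuming the menu entry in step 3 inserts U+200B as the summary states; the `repro` name is only illustrative:

```python
import unicodedata

# Steps 2-4: Meem + Fatha, then U+200B, then Superscript Alef + Noon
repro = "\u0645\u064E" + "\u200B" + "\u0670\u0646"

# Print each code point with its Unicode name to confirm the sequence
for ch in repro:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```

Note that Unicode names U+200B ZERO WIDTH SPACE; "No-width Optional Break" is Writer's menu label for it.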

Actual Results:
Superscript Alef is placed at the baseline.

Expected Results:
Superscript Alef should come next to Fatha.


Reproducible: Always


User Profile Reset: No

Additional Info:
The Calibri font has a feature that places the marks side by side when U+200B is between them. This works in Google Chrome and in Notepad (Windows), but not in LibreOffice Writer.
Comment 1 Lateef Shaikh 2023-10-02 14:03:39 UTC
Created attachment 189957
Image showing bug
Comment 2 Lateef Shaikh 2023-10-02 14:08:24 UTC
Created attachment 189958
Character dialog box

The Character dialog box shows the marks positioned properly.
Comment 3 Stéphane Guillou (stragu) 2023-10-16 15:06:22 UTC
Created attachment 190241
sample ODT

Thank you for the report, Lateef.
I tried creating a sample document.
If it doesn't match your description, could you please upload a new one and make mine obsolete?
Comment 4 Stéphane Guillou (stragu) 2023-10-16 15:08:53 UTC
Khaled, I thought you might find this interesting.

Note that I used the following font features to match Lateef's: Calibri:calt&dlig&liga (not that it changed anything in the display).
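The `Calibri:calt&dlig&liga` string follows LibreOffice's convention of appending OpenType feature tags to the font name after a colon, separated by `&`. A simplified Python sketch of how such a string splits into a font name and feature settings; `parse_font_spec` is hypothetical, handles only the `tag` and `tag=value` forms, and is not LibreOffice's actual parser:

```python
def parse_font_spec(spec: str) -> tuple[str, dict[str, int]]:
    """Split 'Name:feat1&feat2=value' into (name, {tag: value})."""
    name, _, feature_part = spec.partition(":")
    features: dict[str, int] = {}
    # filter(None, ...) drops empty items, e.g. when there are no features
    for item in filter(None, feature_part.split("&")):
        tag, sep, value = item.partition("=")
        # A bare tag means "enabled", i.e. value 1
        features[tag] = int(value) if sep else 1
    return name, features


print(parse_font_spec("Calibri:calt&dlig&liga"))
```

Here `calt`, `dlig` and `liga` are the contextual alternates, discretionary ligatures and standard ligatures features.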

Tested in:

Version: 24.2.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: e9374f74385d7dfe77d1902d3d82af20143bc775
CPU threads: 8; OS: Linux 5.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Calc: threaded
Comment 5 ⁨خالد حسني⁩ 2023-10-17 13:37:39 UTC
This happens because U+200B is one of the formatting characters that Writer makes visible when View > Field Shadings is checked, and the way this is implemented breaks text layout around such characters. It works in the font preview because the preview is rendered through a different code path than Writer's.
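A toy illustration of the effect, assuming (as a simplification) that the text is cut into separate layout runs wherever a visible formatting character is drawn. `VISIBLE_MARKS` and `split_at_marks` are hypothetical names, not Writer code; the point is that the Superscript Alef ends up at the start of its own run, so a font rule that matches across U+200B never sees the Meem and Fatha before it:

```python
# Hypothetical set of characters drawn as separate, shaded portions
VISIBLE_MARKS = {"\u200B"}


def split_at_marks(text: str) -> list[str]:
    """Split text into runs, giving each visible mark a run of its own."""
    runs: list[str] = []
    current = ""
    for ch in text:
        if ch in VISIBLE_MARKS:
            if current:
                runs.append(current)
            runs.append(ch)  # the mark is laid out on its own
            current = ""
        else:
            current += ch
    if current:
        runs.append(current)
    return runs


text = "\u0645\u064E\u200B\u0670\u0646"
for run in split_at_marks(text):
    print([f"U+{ord(c):04X}" for c in run])
```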

But using U+200B here is a font-specific hack; the proper, Unicode-sanctioned way is to add a tatweel before the small alef, which should work with most fonts:

ٱلرَّحۡمَـٰن
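One way to see the difference: U+0640 ARABIC TATWEEL is an ordinary letter (general category Lm) that joins with its neighbours, so the small alef can sit on it as a base, whereas U+200B is a format character (Cf). A short Python check, building the Meem, Fatha, Superscript Alef and Noon fragment from the steps to reproduce in both variants; the variable names are only illustrative:

```python
import unicodedata

zwsp_version = "\u0645\u064E\u200B\u0670\u0646"     # as in the report
tatweel_version = "\u0645\u064E\u0640\u0670\u0646"  # tatweel before the small alef

for label, text in [("U+200B", zwsp_version), ("tatweel", tatweel_version)]:
    sep = text[2]  # the character between Fatha and Superscript Alef
    print(label, unicodedata.name(sep), unicodedata.category(sep))
```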
Comment 6 Lateef Shaikh 2023-10-17 20:33:05 UTC
Thank you for your time and help with this. I am actually trying different characters as an alternative to the tatweel approach, because tatweel is more of a justification character and can only be used when this situation occurs between two connecting letters. With non-connecting letters such as Ra and Alef, we can't use tatweel. So I was looking for a space-like character that has zero width and allows joining.

U+200B breaks the word, so after posting this bug I discussed it with a friend, who made me realize that it is not a good choice anyway.

I first found U+FEFF (ZWNBSP), but that use is deprecated; its replacement is Word Joiner (U+2060). It works fine in Chrome and in LibreOffice (using the Noto Naskh font).
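The switch described here amounts to substituting one invisible character for another. A minimal Python sketch, assuming the text may contain U+200B or a mid-text U+FEFF; `use_word_joiner` is a hypothetical helper, and it leaves a leading U+FEFF alone because at the start of text that character normally acts as a byte order mark:

```python
import unicodedata

ZWSP = "\u200B"    # ZERO WIDTH SPACE: allows a line break here
ZWNBSP = "\uFEFF"  # ZERO WIDTH NO-BREAK SPACE: deprecated for this use
WJ = "\u2060"      # WORD JOINER: zero width, prohibits a line break


def use_word_joiner(text: str) -> str:
    """Replace ZWSP and mid-text ZWNBSP with WORD JOINER."""
    # Keep a leading U+FEFF, which usually serves as a byte order mark
    head = text[:1] if text.startswith(ZWNBSP) else ""
    body = text[len(head):]
    return head + body.replace(ZWSP, WJ).replace(ZWNBSP, WJ)


fixed = use_word_joiner("\u0645\u064E\u200B\u0670\u0646")
print([unicodedata.name(c) for c in fixed])
```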

So I think you can close this ticket.
Comment 7 Lateef Shaikh 2023-10-17 20:33:29 UTC
Created attachment 190270
U2060
Comment 8 ⁨خالد حسني⁩ 2023-10-17 21:19:15 UTC
(In reply to Lateef Shaikh from comment #6)
> Thank you for your time and help with this. Actually I am trying different
> characters as alternate to the tatweel approach, because tatweel is more of
> a justification character and can only be used when we have this scenario
> between two connecting alphabets. When it comes to non connecting alphabets
> like Ra and alef then we can't use tatweel. Therefore I was looking for a
> space like character with zero width and allows joining.

FWIW, Unicode suggests using tatweel between connected letters and NBSP between unconnected ones. I can't find where this is documented right now, so you'll have to take my word for it. It has the advantage of being a simple solution that works with virtually any font (and if it does not work properly, the fallback is still acceptable).
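The rule above (tatweel after a letter that connects to the next one, NBSP after one that does not) can be sketched in Python. `RIGHT_JOINING` is a deliberately incomplete, hand-picked subset of letters that never connect to the following letter (the full data is in the Unicode Character Database file ArabicShaping.txt), and `join_with_spacer` is a hypothetical helper:

```python
import unicodedata

TATWEEL = "\u0640"
NBSP = "\u00A0"

# Partial list of letters that never connect to the following letter
# (Joining_Type R): Alef and its hamza/madda/wasla forms, Waw with Hamza,
# Teh Marbuta, Dal, Thal, Reh, Zain, Waw
RIGHT_JOINING = set("\u0622\u0623\u0624\u0625\u0627\u0629\u062F\u0630\u0631\u0632\u0648\u0671")


def join_with_spacer(before: str, after: str) -> str:
    """Insert tatweel or NBSP between `before` and a mark-initial `after`."""
    # Find the last base letter, skipping combining marks such as Fatha (Mn)
    base = next((c for c in reversed(before) if unicodedata.category(c) != "Mn"), "")
    spacer = NBSP if base in RIGHT_JOINING else TATWEEL
    return before + spacer + after


# Meem connects forward, so it gets a tatweel; Reh does not, so it gets NBSP
print(unicodedata.name(join_with_spacer("\u0645\u064E", "\u0670\u0646")[2]))
print(unicodedata.name(join_with_spacer("\u0631\u064E", "\u0670\u0646")[2]))
```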

The behavior reported here is still a bug regardless.