Bug 115050 - "Zero Width Non Joiner" does not select alternate glyph in a certain font
Summary: "Zero Width Non Joiner" does not select alternate glyph in a certain font
Status: RESOLVED NOTABUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 6.0.0.2 rc (earliest affected)
Hardware: All All
Importance: medium minor
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Font-Rendering
Reported: 2018-01-16 18:56 UTC by Patrick Schönbach
Modified: 2018-03-02 16:49 UTC
CC List: 2 users

See Also:
Crash report or crash signature:


Attachments
Test case demonstrating the problem (9.47 KB, application/vnd.oasis.opendocument.text)
2018-02-15 17:24 UTC, Patrick Schönbach
Sample Word document (11.59 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2018-02-21 11:10 UTC, Patrick Schönbach

Description Patrick Schönbach 2018-01-16 18:56:46 UTC
Description:
I use a proprietary font (so unfortunately I cannot upload it) whose OpenType features react to the "Zero Width Non Joiner" (U+200C) by preventing the ligature AND (in some cases) choosing a different character alternative.

If I input the "Zero Width Non Joiner" via Insert / Special character, everything works fine. However, if I input it from the keyboard using a custom layout, ligature prevention works, but the wrong character alternative is chosen.

Actual Results:  
"Zero Width Non Joiner" ignored for choice of character alternative if input via the keyboard.

Expected Results:
"Zero Width Non Joiner" should also affect the character alternatives if the font demands it, including when it is input via the keyboard.


Reproducible: Always


User Profile Reset: No



Additional Info:


User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:55.0) Gecko/20100101 Firefox/55.2.2 Waterfox/55.2.2
Comment 1 Buovjaga 2018-02-11 14:30:25 UTC
(In reply to Patrick Schönbach from comment #0)
> Description:
> I use a proprietary font (thus, I cannot upload it, unfortunately), whose
> Opentype features react to the "Zero Width Non Joiner" (U+200C) by
> preventing the ligature AND (in some cases) choosing a different character
> alternative.

Please try to find a free font that exhibits the same behaviour.

Set to NEEDINFO.
Change back to UNCONFIRMED after you have found the font.
Comment 2 Patrick Schönbach 2018-02-15 17:22:40 UTC
This font can be used for testing: http://www.fraktur.biz/MarsFrakturOT-Normal.zip
Comment 3 Patrick Schönbach 2018-02-15 17:24:34 UTC
Created attachment 139930 [details]
Test case demonstrating the problem
Comment 4 Buovjaga 2018-02-15 19:16:37 UTC
How do you set up the custom keyboard layout for it in Windows?

(In reply to Patrick Schönbach from comment #3)
> Created attachment 139930 [details]
> Test case demonstrating the problem

Thanks, I could reproduce by copying the character to clipboard from somewhere else - this can be used: https://unicode.flopp.net/c/200C
I used this before remembering those Unicode exploration websites: https://userbase.kde.org/KCharSelect

I went into the first Wachs‌tube, before the "t", hit backspace twice, pasted the U+200C character and typed "s".
Testing with Insert - Special character, the "long s" was prevented correctly.

Arch Linux 64-bit
Version: 6.1.0.0.alpha0+
Build ID: 09cb65bb92318bf8edb467fcd7720f072306f379
CPU threads: 8; OS: Linux 4.15; UI render: default; VCL: kde4; 
Locale: fi-FI (fi_FI.UTF-8); Calc: group
Built on February 15th 2018
Comment 5 Khaled Hosny 2018-02-19 02:03:56 UTC
The font does not have any OpenType features that interact with U+200C at all. 

What is happening here is that inserting any character via the Special Character dialog puts the character in a text span of its own with a different text style, which causes LibreOffice to break the text when shaping, i.e. the text is processed as if it were the two words “Wachs” and “tube”; see bug 113134. This has the side effect of producing the behavior you are expecting, but it isn’t intentional; rather, it is a bad interaction between bug 113134 and LibreOffice’s own limitation of shaping text in different inline styles separately.

As such the font is actually working as intended in the 1st line, while the second line is a bug.
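
A minimal Python sketch of the effect described in this comment. It is a toy model, not LibreOffice or HarfBuzz code: the rule "s becomes long s unless it is the last character of the shaping run" is an assumption standing in for the font's contextual alternates, and `shape_run` and `shape` are hypothetical names.

```python
ZWNJ = "\u200c"    # ZERO WIDTH NON-JOINER
LONG_S = "\u017f"  # LATIN SMALL LETTER LONG S

def shape_run(run):
    """Toy contextual rule: 's' becomes long s unless it ends the run."""
    out = []
    for i, ch in enumerate(run):
        is_last = i == len(run) - 1
        out.append(LONG_S if ch == "s" and not is_last else ch)
    return "".join(out)

def shape(runs):
    """Shape each run independently, as happens when styles split the text."""
    return "".join(shape_run(run) for run in runs)

# Typed or pasted ZWNJ: the whole word is one run, so the rule sees no run end.
single_run = shape(["Wachs" + ZWNJ + "tube"])

# ZWNJ inserted via the dialog: it sits in its own span, so "Wachs" is shaped alone.
split_runs = shape(["Wachs", ZWNJ, "tube"])

print(single_run == "Wach" + LONG_S + ZWNJ + "tube")  # True: long s
print(split_runs == "Wachs" + ZWNJ + "tube")          # True: round s
```

In this model the round s in the second case comes only from the run boundary, not from anything the rule knows about U+200C.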
Comment 6 Patrick Schönbach 2018-02-19 11:03:14 UTC
But the scripting of the font I linked here evaluates U+200C to choose the correct "s" shape.
Comment 7 Patrick Schönbach 2018-02-19 12:15:24 UTC
Why is it not a bug? The U+200C is evaluated correctly in other OpenType aware word processors, but not in LO.
Comment 8 Buovjaga 2018-02-19 12:47:04 UTC
(In reply to Patrick Schönbach from comment #7)
> Why is it not a bug? The U+200C is evaluated correctly in other OpenType
> aware word processors, but not in LO.

Khaled said "The font does not have any OpenType features that interact with U+200C at all" referring to MarsFrakturOT. If you have evidence to the contrary, please provide it. I opened the font in FontForge and tried to find out what it has, but unfortunately I am not a font guru.

In FontForge, I did Element - Font info. Then I went into Lookups.
There I saw Single substitution lookup 3 subtable transformed s into long s.
'calt' contextual alternates in latin lookup 2 subtable then had a lot of rules related to this transformation.
Comment 9 Patrick Schönbach 2018-02-19 18:39:26 UTC
I talked to the font designer: His font relies on the assertion that the glyph before U+200C is treated as a word end. This works in all other OpenType aware text processors, but not in LO.
Comment 10 Xisco Faulí 2018-02-20 10:23:15 UTC
(In reply to Patrick Schönbach from comment #9)
> I talked to the font designer: His font relies on the assertion that the
> glyph before U+200C is treated as a word end. This works in all other
> OpenType aware text processors, but not in LO.

I guess if we changed the LibreOffice behaviour to treat the glyph before U+200C as a word end, it would break other fonts which interact with U+200C.
Putting back to RESOLVED NOTABUG as Khaled (the font expert) did.
Comment 11 Khaled Hosny 2018-02-20 20:14:26 UTC
(In reply to Patrick Schönbach from comment #9)
> I talked to the font designer: His font relies on the assertion that the
> glyph before U+200C is treated as a word end. This works in all other
> OpenType aware text processors, but not in LO.

That is not a valid assumption, and you can check other applications like Chrome or Firefox. U+200C is just like any other character and though it has some complex and script-specific properties, none of them specify that it constitutes a word break AFAIK. The font can easily add explicit rules for handling U+200C.
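
To illustrate what an "explicit rule for handling U+200C" could look like, here is a toy model in Python. It is not OpenType feature code and not the font's actual lookup; real contextual lookups operate on glyphs. `substitute_s` and its `zwnj_rule` flag are hypothetical names, and the baseline rule (long s everywhere except at the end of the shaped run) is an assumption.

```python
ZWNJ = "\u200c"    # ZERO WIDTH NON-JOINER
LONG_S = "\u017f"  # LATIN SMALL LETTER LONG S

def substitute_s(run, zwnj_rule=False):
    """Toy s -> long s substitution over one shaping run.

    With zwnj_rule=True the rule additionally keeps a round s
    when the next character is U+200C.
    """
    out = []
    for i, ch in enumerate(run):
        at_run_end = i == len(run) - 1
        before_zwnj = zwnj_rule and not at_run_end and run[i + 1] == ZWNJ
        keep_round = at_run_end or before_zwnj
        out.append(LONG_S if ch == "s" and not keep_round else ch)
    return "".join(out)

word = "Wachs" + ZWNJ + "tube"
without_rule = substitute_s(word)               # long s before ZWNJ
with_rule = substitute_s(word, zwnj_rule=True)  # round s before ZWNJ

print(without_rule == "Wach" + LONG_S + ZWNJ + "tube")  # True
print(with_rule == word)                                # True
```

With the extra condition the result no longer depends on whether the application happens to split the text at the ZWNJ.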
Comment 12 Patrick Schönbach 2018-02-20 21:10:01 UTC
Why does it work in other word processors with OpenType support, then?
Comment 13 Buovjaga 2018-02-21 08:36:29 UTC
(In reply to Patrick Schönbach from comment #12)
> Why does it work in other word processors with OpenType support, then?

Please give an example of a word processor where it works. MSO 2016? I tried with MSO 2013, but unfortunately its OT support seems to be lacking: the long s did not appear at all even though I enabled "All" of the ligatures in the advanced font settings for the text.
Comment 14 Patrick Schönbach 2018-02-21 11:10:56 UTC
Created attachment 140036 [details]
Sample Word document

With MSO 2016 it works perfectly fine if you enable contextual alternates.
Comment 15 Patrick Schönbach 2018-02-21 11:15:56 UTC
Why is there a "Zero Width Non Joiner" and a "Zero Width Joiner" if the "Zero Width Non Joiner" would have no inherent non joining functionality and the non joining would have to be implemented by every OT font on its own? This doesn't make sense.
Comment 16 Khaled Hosny 2018-02-21 15:59:49 UTC
(In reply to Patrick Schönbach from comment #15)
> Why is there a "Zero Width Non Joiner" and a "Zero Width Joiner" if the
> "Zero Width Non Joiner" would have no inherent non joining functionality and
> the non joining would have to be implemented by every OT font on its own?
> This doesn't make sense.

Please consult the relevant parts of The Unicode Standard, namely §23.2, Cursive Connection and Ligatures (http://www.unicode.org/versions/Unicode10.0.0/ch23.pdf#G23126), and UAX #29, Unicode Text Segmentation (http://unicode.org/reports/tr29/#Word_Boundary_Rules).
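
For readers who want to check the character properties themselves, here is a small Python sketch. The standard `unicodedata` module exposes the general category but not the Word_Break property, so `toy_words` is only a hypothetical, heavily simplified stand-in for rule WB4 of UAX #29 (format and extending characters do not start a new word); it is not a conforming segmenter.

```python
import unicodedata

ZWNJ = "\u200c"
ZWJ = "\u200d"

# Both joiners are format characters (general category Cf).
for ch in (ZWNJ, ZWJ):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

def toy_words(text):
    """Split text into letter sequences, attaching Cf/Mn/Me characters
    to the word in progress instead of breaking on them."""
    words, current = [], ""
    for ch in text:
        attaches = current and unicodedata.category(ch) in ("Cf", "Mn", "Me")
        if ch.isalpha() or attaches:
            current += ch
        else:
            if current:
                words.append(current)
            current = ""
    if current:
        words.append(current)
    return words

words = toy_words("Wachs" + ZWNJ + "tube und Wachstube")
print(len(words))  # 3: the ZWNJ does not split the first word
```

Under this simplified reading, "Wachs", ZWNJ and "tube" stay one word, which matches the point that U+200C is not a word break.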
Comment 17 Khaled Hosny 2018-02-21 16:15:07 UTC
In fact, doing s → ſ in font is discouraged, since long ſ rules vary between languages or even time periods and that is why a dedicated character is encoded in Unicode; http://unicode.org/faq/vs.html#12.
Comment 18 Patrick Schönbach 2018-02-21 16:30:46 UTC
(In reply to Khaled Hosny from comment #17)
> In fact, doing s → ſ in font is discouraged, since long ſ rules vary between
> languages or even time periods and that is why a dedicated character is
> encoded in Unicode; http://unicode.org/faq/vs.html#12.

Actually, the font has a dedicated long s. It only has rules that automatically choose the correct s glyph in the majority of cases. U+200C is used to influence the choice in cases where the automatic choice failed.
Comment 19 Patrick Schönbach 2018-02-21 16:34:45 UTC
"The ZWJ and ZWNJ are designed for marking the unusual cases where ligatures or cursive connections are required or prohibited. These characters are not to be used in all cases where ligatures or cursive connections are desired; instead, they are meant only for overriding the normal behavior of the text."

This is exactly how the font uses ZWNJ.
Comment 20 Khaled Hosny 2018-02-21 16:56:50 UTC
(In reply to Patrick Schönbach from comment #19)
> "The ZWJ and ZWNJ are designed for marking the unusual cases where ligatures
> or cursive connections are required or prohibited. These characters are not
> to be used in all cases where ligatures or cursive connections are desired;
> instead, they are meant only for overriding the normal behavior of the text."
> 
> This is exactly how the font uses ZWNJ.

Long s is neither a ligature nor a cursive connection.
Comment 21 Patrick Schönbach 2018-02-21 17:08:07 UTC
(In reply to Khaled Hosny from comment #20)
> (In reply to Patrick Schönbach from comment #19)
> > "The ZWJ and ZWNJ are designed for marking the unusual cases where ligatures
> > or cursive connections are required or prohibited. These characters are not
> > to be used in all cases where ligatures or cursive connections are desired;
> > instead, they are meant only for overriding the normal behavior of the text."
> > 
> > This is exactly how the font uses ZWNJ.
> 
> Long s is neither a ligature nor a cursive connection.

It uses ZWNJ both for ligature splitting (e.g. "st") and for enforcing the normal s.
Comment 22 Buovjaga 2018-02-21 17:39:43 UTC
(In reply to Patrick Schönbach from comment #21)
> (In reply to Khaled Hosny from comment #20)
> > Long s is neither a ligature nor a cursive connection.
> 
> It uses ZWNJ both for ligature splitting (e.g. "st") and for enforcing the
> normal s.

Yep, so from the perspective of the Unicode standard, it is a hack to use it for s.
Comment 23 Patrick Schönbach 2018-02-21 17:57:27 UTC
(In reply to Buovjaga from comment #22)
> Yep, so from the perspective of the Unicode standard, it is a hack to use it
> for s.

I am not so sure. MSO, AbiWord, InDesign and others do support it.
Comment 24 Patrick Schönbach 2018-02-21 18:05:02 UTC
And, at least in German, it makes perfect sense, since you split ligatures if there is a compound word boundary.
Comment 25 Patrick Schönbach 2018-02-21 20:50:49 UTC
Actually, ZWNJ is supposed to end a so-called "grapheme cluster". See http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries

This is what MSO and others do, but LO doesn't.
Comment 26 Khaled Hosny 2018-02-21 23:31:47 UTC
You can argue for this all day, but I’m not going to reply to every baseless claim you make. ZWNJ does not do what you think it does, this font is broken and can be fixed in like 10 minutes. Instead of arguing here for days, tell the font vendor to fix the font. This is the last comment I’m making in this issue.
Comment 27 Patrick Schönbach 2018-02-21 23:54:25 UTC
https://en.wikipedia.org/wiki/Zero-width_non-joiner

"When placed between two characters that would otherwise be connected into a ligature, a ZWNJ causes them to be printed in their final and initial forms, respectively."

The long s is NEVER a final form! You might say that this is a special case since it does not always involve a ligature, but if e.g. you had to split an "st" ligature, you'd have to put an "s" anyway, not a "long s"! This is what the spec says!
Comment 28 Patrick Schönbach 2018-02-22 00:14:20 UTC
And this is exactly how all other word processors I know implement it, which is in perfect accordance with the Unicode specs...

<final form> <zwnj> <initial form>
Comment 29 Patrick Schönbach 2018-02-22 00:43:57 UTC
If you open the attached documents in Word and then in Writer, they render differently. This means one party interprets the specs wrongly, and I cannot see where MSO violates the specs. They do exactly what the specs demand.
Comment 30 Patrick Schönbach 2018-02-22 10:51:26 UTC
https://codepoints.net/U+200C

"In text U+200C behaves as Combining Mark regarding line breaks. It has type Extend for sentence and Extend for word breaks. The Grapheme Cluster Break is Extend."
Comment 31 Patrick Schönbach 2018-02-22 12:09:01 UTC
The bug title is much better now, but still not correct. It affects ANY font, and it happens if the glyph before ZWNJ has a different final form. One would also have to check whether the glyph after ZWNJ is rendered correctly as an initial form. I cannot check this with German fonts, since this case doesn't happen there.
Comment 32 Patrick Schönbach 2018-02-23 22:20:40 UTC
I still wonder why you keep ignoring clear evidence. The spec clearly states that the glyph before ZWNJ has to be rendered as *final form*, and the glyph after ZWNJ has to be rendered as *initial form*.

We have clear evidence (at least concerning the final form) that this does not happen in LO. We also have clear evidence that other word processors do work this way, and, first and foremost, the specs clearly call for this behavior (I quoted different sources).

The "Wachs|tube" example is a perfect demonstration, by the way. It is *not enough* to split the "st" ligature here. Also, the "s" has to be a *final form*, because otherwise, the result is *not typographically correct*! It is not the responsibility of individual fonts to implement this behavior, but the word processor has to get the ZWNJ behavior right, as it is stated in the specs.

Therefore, this is a bug.
Comment 33 Patrick Schönbach 2018-02-24 14:33:21 UTC
More evidence:

http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=silhebrunic2

Try these ones:

"When U+0592 HEBREW ACCENT SEGOL and U+05A9 HEBREW ACCENT TELISHA QETANA occur word-medial, they are centred unless followed by U+200C ZERO WIDTH NON-JOINER, in which case they move to the left.

When U+05A0 HEBREW ACCENT TELISHA GEDOLA is word-medial, it is centred unless the consonant is preceded by U+200C ZERO WIDTH NON-JOINER, in which case it moves to the right."
Comment 34 Patrick Schönbach 2018-02-25 17:49:16 UTC
http://www.unicode.org/versions/Unicode10.0.0/UnicodeStandard-10.0.pdf p. 847

Non-joiner. U+200C zero width non-joiner is intended to break *both cursive connections and ligatures* in rendering.

This means:

- Final glyph form before ZWNJ.
- Initial glyph form after ZWNJ.

This is the responsibility of the rendering engine, not the font!

"long s" definitely counts as a cursive connection, as it can only appear in initial and intermediary position, but *never* in final position.

So, the rendering engine has to display the correct character alternatives when dealing with a ZWNJ.
Comment 35 Patrick Schönbach 2018-03-02 16:49:09 UTC
So, even the Unicode standard as quoted in comment #34 is simply ignored?