Description:
I use a proprietary font (thus, I cannot upload it, unfortunately), whose OpenType features react to the "Zero Width Non Joiner" (U+200C) by preventing the ligature AND (in some cases) choosing a different character alternative. If I input the "Zero Width Non Joiner" via Insert / Special character, all works fine. However, if I input it from the keyboard using a custom layout, ligature prevention works, but the wrong character alternative is chosen.

Actual Results:
"Zero Width Non Joiner" is ignored for the choice of character alternative if it is input via the keyboard.

Expected Results:
"Zero Width Non Joiner" should also affect the character alternatives if the font demands it, including when it is input via the keyboard.

Reproducible: Always

User Profile Reset: No

Additional Info:

User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:55.0) Gecko/20100101 Firefox/55.2.2 Waterfox/55.2.2
(In reply to Patrick Schönbach from comment #0)
> Description:
> I use a proprietary font (thus, I cannot upload it, unfortunately), whose
> OpenType features react to the "Zero Width Non Joiner" (U+200C) by
> preventing the ligature AND (in some cases) choosing a different character
> alternative.

Please try to find a free font that exhibits the same behaviour. Set to NEEDINFO. Change back to UNCONFIRMED after you have found the font.
This font can be used for testing: http://www.fraktur.biz/MarsFrakturOT-Normal.zip
Created attachment 139930 [details] Test case demonstrating the problem
How do you set up the custom keyboard layout for it in Windows?

(In reply to Patrick Schönbach from comment #3)
> Created attachment 139930 [details]
> Test case demonstrating the problem

Thanks, I could reproduce by copying the character to the clipboard from somewhere else. This can be used: https://unicode.flopp.net/c/200C

I used this before remembering those Unicode exploration websites: https://userbase.kde.org/KCharSelect

I went into the first "Wachstube", before the "t", hit Backspace twice, pasted the U+200C character and typed "s". Testing with Insert - Special Character, the "long s" was prevented correctly.

Arch Linux 64-bit
Version: 6.1.0.0.alpha0+
Build ID: 09cb65bb92318bf8edb467fcd7720f072306f379
CPU threads: 8; OS: Linux 4.15; UI render: default; VCL: kde4;
Locale: fi-FI (fi_FI.UTF-8); Calc: group
Built on February 15th 2018
The font does not have any OpenType features that interact with U+200C at all. What is happening here is that inserting any character via the Special Character dialog puts the character in a text span of its own with a different text style, which causes LibreOffice to break the text when shaping, i.e. the text is processed as if it were the two words “Wachs” and “tube”; see bug 113134. This has the side effect of causing the behavior you are expecting, but it isn’t intentional; rather, it is a bad interaction of bug 113134 and LibreOffice’s own limitation of shaping text separately across inline style changes. As such, the font is actually working as intended in the first line, while the second line is a bug.
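A toy model may make the run-splitting side effect easier to see. The sketch below is NOT the MarsFraktur `calt` logic; it assumes a hypothetical contextual rule that keeps a round "s" only at the end of a shaped run or before whitespace, and uses long s (U+017F) everywhere else. The names `shape_run` and `shape_spans` are illustrative only. Because the rule can only see the run it is given, shaping the word as one run and shaping three separate style spans give different results.

```python
LONG_S = "\u017f"  # LATIN SMALL LETTER LONG S
ZWNJ = "\u200c"    # ZERO WIDTH NON-JOINER


def shape_run(run: str) -> str:
    """Hypothetical contextual rule: keep a round 's' only at the end of the
    run or before whitespace. Nothing outside `run` is visible to the rule."""
    out = []
    for i, ch in enumerate(run):
        if ch == "s":
            nxt = run[i + 1] if i + 1 < len(run) else None
            out.append("s" if nxt is None or nxt.isspace() else LONG_S)
        else:
            out.append(ch)
    return "".join(out)


def shape_spans(spans: list[str]) -> str:
    """Shape each style span on its own, as when a style change splits the run."""
    return "".join(shape_run(span) for span in spans)


# Typed via the keyboard: one span, so the rule sees ZWNJ (not a run end) after "s".
typed = shape_spans(["Wachs" + ZWNJ + "tube"])
# Inserted via Special Character: ZWNJ sits in a span of its own,
# so "s" is the last character of its run.
inserted = shape_spans(["Wachs", ZWNJ, "tube"])
```

Under this assumption, `typed` contains the long s and `inserted` the round s, mirroring the two lines of the test case without the font ever inspecting U+200C.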
But the scripting of the font I linked here evaluates U+200C to choose the correct "s" shape.
Why is it not a bug? The U+200C is evaluated correctly in other OpenType aware word processors, but not in LO.
(In reply to Patrick Schönbach from comment #7)
> Why is it not a bug? The U+200C is evaluated correctly in other OpenType
> aware word processors, but not in LO.

Khaled said "The font does not have any OpenType features that interact with U+200C at all" referring to MarsFrakturOT. If you have evidence to the contrary, please provide it.

I opened the font in FontForge and tried to find out what it has, but unfortunately I am not a font guru. In FontForge, I did Element - Font Info, then went into Lookups. There I saw that the "Single Substitution lookup 3 subtable" transforms s into long s, and the "'calt' Contextual Alternates in Latin lookup 2 subtable" has a lot of rules related to this transformation.
I talked to the font designer: His font relies on the assertion that the glyph before U+200C is treated as a word end. This works in all other OpenType aware text processors, but not in LO.
(In reply to Patrick Schönbach from comment #9)
> I talked to the font designer: His font relies on the assertion that the
> glyph before U+200C is treated as a word end. This works in all other
> OpenType aware text processors, but not in LO.

I guess if we change the LibreOffice behaviour to treat the glyph before U+200C as a word end, it would break other fonts which interact with U+200C.

Putting back to RESOLVED NOTABUG as Khaled (the font expert) did.
(In reply to Patrick Schönbach from comment #9)
> I talked to the font designer: His font relies on the assertion that the
> glyph before U+200C is treated as a word end. This works in all other
> OpenType aware text processors, but not in LO.

That is not a valid assumption, and you can check other applications like Chrome or Firefox. U+200C is just like any other character, and though it has some complex and script-specific properties, none of them specify that it constitutes a word break AFAIK. The font can easily add explicit rules for handling U+200C.
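What "explicit rules for handling U+200C" could mean can be sketched with a hypothetical toy rule: ZWNJ is added to the contexts (run end, whitespace) that keep the preceding "s" round. This is an assumption for illustration only; a real font would express it as a `calt` contextual substitution, and `keeps_round_s` and `shape_run` are made-up names, not the font's actual lookups.

```python
LONG_S = "\u017f"  # LATIN SMALL LETTER LONG S
ZWNJ = "\u200c"    # ZERO WIDTH NON-JOINER


def keeps_round_s(nxt):
    """Hypothetical context test: end of run, whitespace, or an explicit ZWNJ."""
    return nxt is None or nxt.isspace() or nxt == ZWNJ


def shape_run(run: str) -> str:
    """Apply the toy rule to one shaping run."""
    out = []
    for i, ch in enumerate(run):
        if ch == "s":
            nxt = run[i + 1] if i + 1 < len(run) else None
            out.append("s" if keeps_round_s(nxt) else LONG_S)
        else:
            out.append(ch)
    return "".join(out)


# With the explicit ZWNJ context, a single run already yields the round "s",
# independent of how the application splits text into style spans.
with_zwnj = shape_run("Wachs" + ZWNJ + "tube")
without_zwnj = shape_run("Wachstube")  # long s before "t"
```

The design point is that the decision lives entirely in the font's own context matching, so it no longer depends on where the application happens to break shaping runs.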
Why does it work in other word processors with OpenType support, then?
(In reply to Patrick Schönbach from comment #12)
> Why does it work in other word processors with OpenType support, then?

Please give an example of a word processor where it works. MSO 2016? I tried with MSO 2013, but unfortunately its OT support seems to be lacking: the long s did not appear at all even though I enabled "All" of the ligatures in the advanced font settings for the text.
Created attachment 140036 [details]
Sample Word document

With MSO 2016, it works perfectly fine if you enable contextual alternates.
Why is there a "Zero Width Non Joiner" and a "Zero Width Joiner" if the "Zero Width Non Joiner" would have no inherent non joining functionality and the non joining would have to be implemented by every OT font on its own? This doesn't make sense.
(In reply to Patrick Schönbach from comment #15)
> Why is there a "Zero Width Non Joiner" and a "Zero Width Joiner" if the
> "Zero Width Non Joiner" would have no inherent non joining functionality and
> the non joining would have to be implemented by every OT font on its own?
> This doesn't make sense.

Please consult the relevant parts of The Unicode Standard, namely §23.2, Cursive Connection and Ligatures (http://www.unicode.org/versions/Unicode10.0.0/ch23.pdf#G23126), and UAX #29 Unicode Text Segmentation (http://unicode.org/reports/tr29/#Word_Boundary_Rules).
In fact, doing s → ſ in the font is discouraged, since long ſ rules vary between languages or even time periods, and that is why a dedicated character is encoded in Unicode; http://unicode.org/faq/vs.html#12.
(In reply to Khaled Hosny from comment #17)
> In fact, doing s → ſ in the font is discouraged, since long ſ rules vary between
> languages or even time periods, and that is why a dedicated character is
> encoded in Unicode; http://unicode.org/faq/vs.html#12.

Actually, the font has a dedicated long s. It only has rules that automatically choose the correct s glyph in the majority of cases. U+200C is used to influence the choice in cases where the automatic choice fails.
"The ZWJ and ZWNJ are designed for marking the unusual cases where ligatures or cursive connections are required or prohibited. These characters are not to be used in all cases where ligatures or cursive connections are desired; instead, they are meant only for overriding the normal behavior of the text." This is exactly how the font uses ZWNJ.
(In reply to Patrick Schönbach from comment #19)
> "The ZWJ and ZWNJ are designed for marking the unusual cases where ligatures
> or cursive connections are required or prohibited. These characters are not
> to be used in all cases where ligatures or cursive connections are desired;
> instead, they are meant only for overriding the normal behavior of the text."
>
> This is exactly how the font uses ZWNJ.

Long s is neither a ligature nor a cursive connection.
(In reply to Khaled Hosny from comment #20)
> (In reply to Patrick Schönbach from comment #19)
> > "The ZWJ and ZWNJ are designed for marking the unusual cases where ligatures
> > or cursive connections are required or prohibited. These characters are not
> > to be used in all cases where ligatures or cursive connections are desired;
> > instead, they are meant only for overriding the normal behavior of the text."
> >
> > This is exactly how the font uses ZWNJ.
>
> Long s is neither a ligature nor a cursive connection.

It uses ZWNJ both for ligature splitting (e.g. "st") and for enforcing the normal s.
(In reply to Patrick Schönbach from comment #21)
> (In reply to Khaled Hosny from comment #20)
> > Long s is neither a ligature nor a cursive connection.
>
> It uses ZWNJ both for ligature splitting (e.g. "st") and for enforcing
> the normal s.

Yep, so from the perspective of the Unicode standard, it is a hack to use it for s.
(In reply to Buovjaga from comment #22)
> Yep, so from the perspective of the Unicode standard, it is a hack to use it
> for s.

I am not so sure. MSO, AbiWord, InDesign and others do support it.
And, at least in German, it makes perfect sense, since you split ligatures if there is a compound word boundary.
Actually, ZWNJ is supposed to end a so-called "grapheme cluster". See http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries

This is what MSO and others do, but LO doesn't.
You can argue for this all day, but I’m not going to reply to every baseless claim you make. ZWNJ does not do what you think it does; this font is broken and can be fixed in about 10 minutes. Instead of arguing here for days, tell the font vendor to fix the font. This is the last comment I’m making in this issue.
https://en.wikipedia.org/wiki/Zero-width_non-joiner

"When placed between two characters that would otherwise be connected into a ligature, a ZWNJ causes them to be printed in their final and initial forms, respectively."

The long s is NEVER a final form! You might say that this is a special case since it does not always involve a ligature, but if, e.g., you had to split an "st" ligature, you'd have to put an "s" anyway, not a "long s"! This is what the spec says!
And this is exactly how all other word processors I know implement it, in perfect accordance with the Unicode specs: <final form> <ZWNJ> <initial form>
If you open the attached documents in Word and then in Writer, they render differently. This means one party interprets the specs wrongly, and I cannot see where MSO violates the specs. They do exactly what the specs demand.
https://codepoints.net/U+200C "In text U+200C behaves as Combining Mark regarding line breaks. It has type Extend for sentence and Extend for word breaks. The Grapheme Cluster Break is Extend."
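To see what these property values imply under UAX #29, the sketch below hard-codes them for U+200C (Python's `unicodedata` module does not expose the Grapheme_Cluster_Break or Word_Break properties) and applies only the rules that matter here: GB9 (no grapheme cluster break before Extend or ZWJ) and WB4/WB5 (Extend is transparent for word breaking; no break between letters). It is a deliberate simplification, not a conformant segmenter: `str.isalpha()` stands in for the ALetter class, every other UAX #29 rule is omitted, and `split_graphemes`/`split_words` are illustrative names.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# Property values for U+200C as quoted above; hard-coded because Python's
# unicodedata module does not provide the UAX #29 break properties.
GRAPHEME_CLUSTER_BREAK = {ZWNJ: "Extend"}
WORD_BREAK = {ZWNJ: "Extend"}


def split_graphemes(text: str) -> list[str]:
    """Only rule GB9: never break before Extend or ZWJ; break everywhere else."""
    clusters: list[str] = []
    for ch in text:
        if clusters and GRAPHEME_CLUSTER_BREAK.get(ch) in ("Extend", "ZWJ"):
            clusters[-1] += ch  # GB9: attach to the preceding cluster
        else:
            clusters.append(ch)
    return clusters


def split_words(text: str) -> list[str]:
    """Only rules WB4 and WB5, with isalpha() standing in for ALetter."""
    words: list[str] = []
    prev_base = None  # last character not made transparent by WB4
    for ch in text:
        if words and WORD_BREAK.get(ch) in ("Extend", "Format", "ZWJ"):
            words[-1] += ch  # WB4: attach, keep comparing against prev_base
            continue
        if prev_base is not None and prev_base.isalpha() and ch.isalpha():
            words[-1] += ch  # WB5: no break between letters
        else:
            words.append(ch)  # WB999: break everywhere else
        prev_base = ch
    return words
```

Under these simplified rules, `split_graphemes("s" + ZWNJ + "t")` keeps the ZWNJ attached to the cluster of the preceding "s", and `split_words("Wachs" + ZWNJ + "tube")` returns a single word, whereas `"Wachs tube"` splits into three segments.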
The bug title is much better now, but still not correct. It affects ANY font, and it happens if the glyph before ZWNJ has a different final form. One would also have to check whether the glyph after ZWNJ is rendered correctly as an initial form. I cannot check this with German fonts, since this case doesn't happen there.
I still wonder why you keep ignoring clear evidence. The spec clearly states that the glyph before ZWNJ has to be rendered as a *final form*, and the glyph after ZWNJ has to be rendered as an *initial form*. We have clear evidence (at least concerning the final form) that this does not happen in LO. We also have clear evidence that other word processors do work this way, and, first and foremost, the specs clearly call for this behavior (I quoted different sources). The "Wachs|tube" example is a perfect demonstration, by the way. It is *not enough* to split the "st" ligature here. The "s" also has to be a *final form*, because otherwise the result is *not typographically correct*! It is not the responsibility of individual fonts to implement this behavior; the word processor has to get the ZWNJ behavior right, as stated in the specs. Therefore, this is a bug.
More evidence: http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=silhebrunic2 Try these ones: "When U+0592 HEBREW ACCENT SEGOL and U+05A9 HEBREW ACCENT TELISHA QETANA occur word-medial, they are centred unless followed by U+200C ZERO WIDTH NON-JOINER, in which case they move to the left. When U+05A0 HEBREW ACCENT TELISHA GEDOLA is word-medial, it is centred unless the consonant is preceded by U+200C ZERO WIDTH NON-JOINER, in which case it moves to the right."
http://www.unicode.org/versions/Unicode10.0.0/UnicodeStandard-10.0.pdf, p. 847:

"Non-joiner. U+200C zero width non-joiner is intended to break *both cursive connections and ligatures* in rendering."

This means:
- Final glyph form before ZWNJ.
- Initial glyph form after ZWNJ.

This is the responsibility of the rendering engine, not the font! The "long s" definitely counts as a cursive connection, as it can only appear in initial and medial position, but *never* in final position. So, the rendering engine has to display the correct character alternatives when dealing with a ZWNJ.
So, even the Unicode standard as quoted in comment #34 is simply ignored?