Created attachment 132129 [details] test text data Tifinagh rendering problems. The enclosed font and text render happily in firefox, and shape correctly with hb-shape (producing appropriate looking ligatures of the diacritic over the base) for either OpenType or Graphite. But the rendering is wrong (diacritics to the right of the base, with no ligatures) in libo 5.3. Not sure why
Created attachment 132130 [details] Tifinagh test font
Seems like a bug in the layer above VCL; we are asked to layout the bases and the combining marks separately. If I added something like this early in CommonSalLayout::LayoutText(): SAL_DEBUG(rArgs.mrStr << " => " << rArgs.mrStr.copy(rArgs.mnMinCharPos, rArgs.mnEndCharPos - rArgs.mnMinCharPos)); I get this for the Tifinagh text: debug:27777:1: ⴰ̂ => ⴰ debug:27777:1: ⴰ̂ => ̂ debug:27777:1: ⴰ̆ => ⴰ debug:27777:1: ⴰ̆ => ̆ debug:27777:1: ⵓ̂ => ⵓ debug:27777:1: ⵓ̂ => ̂ debug:27777:1: ⵢ̂ => ⵢ debug:27777:1: ⵢ̂ => ̂ debug:27777:1: ⵢ̣ => ⵢ debug:27777:1: ⵢ̣ => ̣ debug:27777:1: ⵧ̂ => ⵧ debug:27777:1: ⵧ̂ => ̂ I’m guessing that the code that itemizes text into simple/complex/asian scripts does not handle combining marks correctly and considers them a separate script from Tifinagh.
After a bit of debugging I think I found the root of this. Basically characters from Combining Diacritical Marks block are classified as ScriptType::LATIN when they should have been ScriptType::WEAK. This comes from i18npool/source/breakiterator/breakiteratorImpl.cxx: BreakIteratorImpl::getScriptClass() which calls getCompatibilityScriptClassByBlock() which in turn checks the scriptList array that assigns all blocks from UBLOCK_BASIC_LATIN to UBLOCK_ARMENIAN as ScriptType::LATIN and this includes UBLOCK_COMBINING_DIACRITICAL_MARKS. This comes from https://gerrit.libreoffice.org/gitweb?p=core.git;a=commitdiff;h=bf355b97ee4b53e38975f0e1847eda5b3e05f920 to fix bug 38095. I don’t know what old classification was and whether it indeed classified combining marks as Latin, but either way it does not make terrible since. If I modify scriptList to skip UBLOCK_COMBINING_DIACRITICAL_MARKS, then I get correct rendering here.
Actually this have been broken for as long as this code have been around, https://gerrit.libreoffice.org/gitweb?p=core.git;a=commitdiff;h=c5cd37730e145b86129c6503ae5661820ae042e7 (look for UnicodeScript_kCombiningDiacritical). I say we should just fix it.
https://gerrit.libreoffice.org/#/c/35811/
Yes, definitely. Perhaps as we do that, we could use uscript_getScript() and get a more accurate categorisation. It'll mean a new list of script values, but that's not hard. I would be tempted to do a simple array of script code to category, which would be quicker and only 174 bytes. It would start out something like: #define W ScriptType::WEAK #define C ScriptType::COMPLEX #define A ScriptType::ASIAN #define L ScriptType::LATIN {W, W, C, L, W, A, L, L, L, C, C, C, L, L, L, C, C, A, A, C, A, C, A, C, C, L, C, C, C, L, L, C, C, C, C, C, C, C, C, C, L, A, C, C, C, C, L, C, C, C, C, C, C, C, A, C, C, C, C, C, C, C, C, C, C, C, C, C, L, C, C, C, C, A, A, C, C, C, C, C, L, L, C, C, C, C, C, C, C, C, C, C, C, C, C, C, C, C, C, C, C, C, W, W, C, A, C, C, C, C, C, C, C, C, C, C, C, C, C, A, C, C, C, C, C, C, C, C, C, L, C, C, C, C, C, C, C, C, C, C, C, C, C, C, C, C, C, C, C, C, C, C, C, C, A, C, C, C, C, C, C, C, C, C, C, C, C, C, C, C, C, C, A, A, L} But this needs some review, especially later on in the list.
(In reply to martin_hosken from comment #6) > Yes, definitely. Perhaps as we do that, we could use uscript_getScript() and > get a more accurate categorisation. This was done before and lead to bug bug 38095, and was then fixed by first doing the old, by block, categorization that OpenOffice/LibreOffice did before and falling back uscript_getScript() for blocks not covered by the old method. But may be we should go again the full uscript_getScript() way, and special case a few ranges of characters that are causing issues.
Khaled Hosny committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=55ddbfc610d2a00e565ca7bcb0277da33bb90947 tdf#106755: Fix script type for combining marks It will be available in 5.4.0. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Khaled Hosny committed a patch related to this issue. It has been pushed to "libreoffice-5-3": http://cgit.freedesktop.org/libreoffice/core/commit/?id=1c5d8ea51f9da89d76b2b2d3bb896225a6ac9dca&h=libreoffice-5-3 tdf#106755: Fix script type for combining marks It will be available in 5.3.3. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.