Bug 106755 - tifinagh rendering problems
Summary: tifinagh rendering problems
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Linguistic (show other bugs)
Version:
(earliest affected)
Inherited From OOo
Hardware: All All
: medium normal
Assignee: ⁨خالد حسني⁩
URL:
Whiteboard: target:5.4.0 target:5.3.3
Keywords:
Depends on:
Blocks: Font-Rendering
  Show dependency treegraph
 
Reported: 2017-03-24 16:58 UTC by martin_hosken
Modified: 2017-03-29 08:11 UTC (History)
4 users (show)

See Also:
Crash report or crash signature:


Attachments
test text data (37 bytes, text/plain)
2017-03-24 16:58 UTC, martin_hosken
Details
Tifinagh test font (118.96 KB, application/x-font-ttf)
2017-03-24 17:00 UTC, martin_hosken
Details

Note You need to log in before you can comment on or make changes to this bug.
Description martin_hosken 2017-03-24 16:58:47 UTC
Created attachment 132129 [details]
test text data

Tifinagh rendering problems. The enclosed font and text render happily in firefox, and shape correctly with hb-shape (producing appropriate looking ligatures of the diacritic over the base) for either OpenType or Graphite. But the rendering is wrong (diacritics to the right of the base, with no ligatures) in libo 5.3. Not sure why
Comment 1 martin_hosken 2017-03-24 17:00:28 UTC
Created attachment 132130 [details]
Tifinagh test font
Comment 2 ⁨خالد حسني⁩ 2017-03-28 15:55:45 UTC
Seems like a bug in the layer above VCL; we are asked to layout the bases and the combining marks separately. If I added something like this early in CommonSalLayout::LayoutText():

SAL_DEBUG(rArgs.mrStr << " => " << rArgs.mrStr.copy(rArgs.mnMinCharPos, rArgs.mnEndCharPos - rArgs.mnMinCharPos));

I get this for the Tifinagh text:

debug:27777:1: ⴰ̂ => ⴰ
debug:27777:1: ⴰ̂ => ̂
debug:27777:1: ⴰ̆ => ⴰ
debug:27777:1: ⴰ̆ => ̆
debug:27777:1: ⵓ̂ => ⵓ
debug:27777:1: ⵓ̂ => ̂
debug:27777:1: ⵢ̂ => ⵢ
debug:27777:1: ⵢ̂ => ̂
debug:27777:1: ⵢ̣ => ⵢ
debug:27777:1: ⵢ̣ => ̣
debug:27777:1: ⵧ̂ => ⵧ
debug:27777:1: ⵧ̂ => ̂

I’m guessing that the code that itemizes text into simple/complex/asian scripts does not handle combining marks correctly and considers them a separate script from Tifinagh.
Comment 3 ⁨خالد حسني⁩ 2017-03-28 17:43:07 UTC
After a bit of debugging I think I found the root of this. Basically characters from Combining Diacritical Marks block are classified as ScriptType::LATIN when they should have been ScriptType::WEAK. This comes from i18npool/source/breakiterator/breakiteratorImpl.cxx:

BreakIteratorImpl::getScriptClass() which calls getCompatibilityScriptClassByBlock() which in turn checks the scriptList array that assigns all blocks from UBLOCK_BASIC_LATIN to UBLOCK_ARMENIAN as ScriptType::LATIN and this includes UBLOCK_COMBINING_DIACRITICAL_MARKS.

This comes from https://gerrit.libreoffice.org/gitweb?p=core.git;a=commitdiff;h=bf355b97ee4b53e38975f0e1847eda5b3e05f920 to fix bug 38095. I don’t know what old classification was and whether it indeed classified combining marks as Latin, but either way it does not make terrible since.

If I modify scriptList to skip UBLOCK_COMBINING_DIACRITICAL_MARKS, then I get correct rendering here.
Comment 4 ⁨خالد حسني⁩ 2017-03-28 17:55:43 UTC
Actually this have been broken for as long as this code have been around, https://gerrit.libreoffice.org/gitweb?p=core.git;a=commitdiff;h=c5cd37730e145b86129c6503ae5661820ae042e7 (look for UnicodeScript_kCombiningDiacritical). I say we should just fix it.
Comment 5 ⁨خالد حسني⁩ 2017-03-28 18:41:50 UTC
https://gerrit.libreoffice.org/#/c/35811/
Comment 6 martin_hosken 2017-03-28 19:04:26 UTC
Yes, definitely. Perhaps as we do that, we could use uscript_getScript() and get a more accurate categorisation. It'll mean a new list of script values, but that's not hard. I would be tempted to do a simple array of script code to category, which would be quicker and only 174 bytes. It would start out something like:
#define W ScriptType::WEAK
#define C ScriptType::COMPLEX
#define A ScriptType::ASIAN
#define L ScriptType::LATIN
{W, W, C, L, W, A, L, L,
 L, C, C, C, L, L, L, C,
 C, A, A, C, A, C, A, C,
 C, L, C, C, C, L, L, C,
 C, C, C, C, C, C, C, C,
 L, A, C, C, C, C, L, C,
 C, C, C, C, C, C, A, C,
 C, C, C, C, C, C, C, C,
 C, C, C, C, L, C, C, C,
 C, A, A, C, C, C, C, C,
 L, L, C, C, C, C, C, C,
 C, C, C, C, C, C, C, C,
 C, C, C, C, C, C, W, W,
 C, A, C, C, C, C, C, C,
 C, C, C, C, C, C, C, A,
 C, C, C, C, C, C, C, C,
 C, L, C, C, C, C, C, C,
 C, C, C, C, C, C, C, C,
 C, C, C, C, C, C, C, C,
 C, C, A, C, C, C, C, C,
 C, C, C, C, C, C, C, C,
 C, C, C, C, A, A, L}

But this needs some review, especially later on in the list.
Comment 7 ⁨خالد حسني⁩ 2017-03-28 19:37:34 UTC
(In reply to martin_hosken from comment #6)
> Yes, definitely. Perhaps as we do that, we could use uscript_getScript() and
> get a more accurate categorisation.

This was done before and lead to bug bug 38095, and was then fixed by first doing the old, by block, categorization that OpenOffice/LibreOffice did before and falling back uscript_getScript() for blocks not covered by the old method.

But may be we should go again the full uscript_getScript() way, and special case a few ranges of characters that are causing issues.
Comment 8 Commit Notification 2017-03-28 20:04:10 UTC
Khaled Hosny committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=55ddbfc610d2a00e565ca7bcb0277da33bb90947

tdf#106755: Fix script type for combining marks

It will be available in 5.4.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 9 Commit Notification 2017-03-29 08:11:30 UTC
Khaled Hosny committed a patch related to this issue.
It has been pushed to "libreoffice-5-3":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=1c5d8ea51f9da89d76b2b2d3bb896225a6ac9dca&h=libreoffice-5-3

tdf#106755: Fix script type for combining marks

It will be available in 5.3.3.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.