Bug 52582 - Problem with combining unencoded characters in Brahmi Graphite font
Status: RESOLVED WORKSFORME
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: graphics stack
Version (earliest affected): 4.0.0.0.alpha0+ Master
Hardware: Other All
Importance: medium normal
Assignee: Not Assigned
Depends on: HarfBuzz
Blocks: Font-Rendering
Reported: 2012-07-27 12:39 UTC by Shriramana Sharma
Modified: 2022-10-03 18:57 UTC (History)

Attachments
Test material to reproduce and test the bug (259.28 KB, application/x-zip-compressed)
2012-07-27 12:39 UTC, Shriramana Sharma
UTF-8 encoded text file containing relevant sample and output of current HB trunk (1.41 KB, application/gzip)
2013-10-13 08:52 UTC, Shriramana Sharma
Results of testing on LibO 4.2 release (143.62 KB, application/zip)
2014-02-03 04:45 UTC, Shriramana Sharma
PDF export using 4.4.0.0 alpha1 from Oct 26 (31.43 KB, application/pdf)
2014-10-26 13:34 UTC, Buovjaga

Description Shriramana Sharma 2012-07-27 12:39:14 UTC
Created attachment 64775
Test material to reproduce and test the bug

While working on a proposal to encode two additional Brahmi characters (https://sites.google.com/site/jamadagni/files/utcsubmissions/12226-brahmi-two-tamil-characters-proposal.pdf), I found that some versions of LibO on some platforms have a bug in which an unencoded codepoint does not combine properly.

I had proposed to disunify the Tamil Brahmi virama from 11046 and encode it at 11070, and likewise to disunify the Tamil Brahmi LLA from 11034 and encode it at 11071.

However, when I map the glyphs in my fonts to 11070 and 11071, rendering problems appear. See the attachment: it contains a Graphite Brahmi font (under the OFL) in glyph-only and Graphite-enabled forms, and the GDL source is also included.

Test ODTs and the renderings (as PDF) on LibO 3.5.4 on Win XP, LibO 3.5.3 on Linux (Kubuntu Precise) and LibO 3.7.0alpha (LibO~master~2012-06-14_22.09.53_3.7.0alpha0) on Win XP are provided. 

Only LibO 3.5.4 on Win XP does not have any problems with the un-encoded mappings. LibO 3.5.3 on Linux and LibO 3.7alpha on Win XP do not correctly join LLA to the vowel signs. (LLA line highlighted in blue.) 

Perhaps some versions of LibO make assumptions about whether a character is encoded or not, and that is why 11071 Tamil Brahmi LLA does not combine properly? But this does not explain why I had no problems with the equally unencoded 11070 Tamil Brahmi Virama. Nor does it explain the pattern of version numbers (see above): the bug is seen in 3.5.3 and 3.7 but not in 3.5.4.

Graphite by definition does not make any assumptions about the encoding or non-encoding of any characters. If LibO is to provide true Graphite integration, it should also not make any assumptions about input characters when they are being rendered using Graphite.
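To illustrate the question raised above, here is a minimal Python sketch of how a text layout layer could end up treating an unencoded codepoint differently from an encoded one. It is only an assumption about the kind of check involved, not a description of LibO's actual code. The helper names `describe` and `looks_like_mark` are hypothetical, and the result for 11070 and 11071 depends on the Unicode version bundled with the Python interpreter.

```python
import unicodedata


def describe(cp: int) -> dict:
    """Return the Unicode properties a layout layer might consult for cp."""
    ch = chr(cp)
    return {
        "codepoint": f"U+{cp:04X}",
        # Unassigned codepoints have no name, so fall back to a placeholder.
        "name": unicodedata.name(ch, "<unassigned>"),
        "category": unicodedata.category(ch),
        "combining_class": unicodedata.combining(ch),
    }


def looks_like_mark(cp: int) -> bool:
    """Hypothetical check: treat only General_Category M* as combining marks."""
    return unicodedata.category(chr(cp)).startswith("M")


if __name__ == "__main__":
    # 11046 and 11034 are the encoded Brahmi virama and LLA;
    # 11070 and 11071 are the codepoints used in the test font.
    for cp in (0x11046, 0x11034, 0x11070, 0x11071):
        print(describe(cp), "mark-like:", looks_like_mark(cp))
```

On an interpreter whose Unicode data predates the encoding of these codepoints, 11070 and 11071 report category `Cn` (unassigned), so any logic keyed on character properties has nothing to go on. A Graphite font, by contrast, carries its own rules for these glyphs.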

Please fix this so that we can use LibO for rare Indic scripts via Graphite (which we can't expect OpenType support for).

Thank you for your great work on LibO and Graphite!
Comment 1 Joel Madero 2013-01-11 20:27:41 UTC
I really suspect that this isn't our bug and is something to do with how Linux is dealing with the font. 

Does the font work correctly in other software on Linux, outside of LibO? I am marking this as NEEDINFO just for that information. It is clear that there is in fact a difference, so once you can confirm that it is not a Linux-wide problem, please mark the bug as NEW.

This is very similar to a bug that I reported quite some time ago about a Telugu font. I closed it myself, thinking it was a problem with Linux and not with LibO:

https://bugs.freedesktop.org/show_bug.cgi?id=48303

Thanks for your patience and help with getting this bug triaged.
Comment 2 QA Administrators 2013-09-24 01:54:46 UTC Comment hidden (obsolete)
Comment 3 Shriramana Sharma 2013-10-13 08:52:51 UTC
Created attachment 87544
UTF-8 encoded text file containing relevant sample and output of current HB trunk

Sorry for the delay in replying. 

I have attached a UTF-8 encoded input file I fed to hb-view of HarfBuzz (NG, latest trunk) built with Graphite2 support on Kubuntu Precise 64 bit. I have also attached the output PNG. You will see that there are no problems in rendering. 

OTOH, the latest LibO 4.1.1.2 release under the same Kubuntu still shows the problem.
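For anyone wanting to repeat the hb-view check described in this comment, the following Python sketch assembles a comparable command line. It assumes an hb-view binary built with Graphite2 support. The file names are placeholders rather than the names used in the attachment, and `hb_view_command` is a hypothetical helper.

```python
import shlex


def hb_view_command(font_path, text_path, png_path, shaper="graphite2"):
    """Build an hb-view argument list that renders text_path with font_path.

    The shaper is requested explicitly so that the Graphite tables are used;
    this only works if hb-view was built with Graphite2 enabled.
    """
    return [
        "hb-view",
        f"--font-file={font_path}",
        f"--text-file={text_path}",
        f"--output-file={png_path}",
        f"--shapers={shaper}",
    ]


if __name__ == "__main__":
    # Placeholder file names; substitute the font and sample text under test.
    cmd = hb_view_command("brahmi-graphite.ttf", "sample.txt", "sample.png")
    print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run` to render the sample directly.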
Comment 4 Shriramana Sharma 2013-10-13 08:53:40 UTC
See previous comment. Sorry for the extra post but there was no way to reset the status and do the attachment at once.
Comment 5 Joel Madero 2013-10-13 17:17:08 UTC
The appropriate status is UNCONFIRMED, as we need confirmation from an independent QA person :) Thanks for the attachment.
Comment 6 Shriramana Sharma 2014-02-03 04:45:03 UTC
Created attachment 93256
Results of testing on LibO 4.2 release

I tested the material against the recent release of LibO 4.2 and confirm that the bug still exists. Could a QA person please also confirm this so that it can be fixed soon? It has persisted for almost two years. Thank you!
Comment 7 Buovjaga 2014-10-26 13:34:21 UTC
Created attachment 108450
PDF export using 4.4.0.0 alpha1 from Oct 26

I confirm that the problem persists. The problematic highlighted "lla" row does not match the rendering in the HarfBuzz PNG.

Win 7 64-bit dev build Version: 4.4.0.0.alpha1+
Build ID: fa58d91094895a530648630fa64b8724ea1e4305
TinderBox: Win-x86@39, Branch:master, Time: 2014-10-26_09:30:18
Comment 8 QA Administrators 2015-12-20 16:07:17 UTC Comment hidden (obsolete)
Comment 9 QA Administrators 2019-05-14 02:59:12 UTC Comment hidden (obsolete)
Comment 10 QA Administrators 2021-05-14 04:10:55 UTC Comment hidden (obsolete)
Comment 11 ⁨خالد حسني⁩ 2022-08-22 23:57:21 UTC
It has been 10 years and these combining marks are included in Unicode 14, so the original issue is no longer reproducible. I think I know what was breaking the unencoded combining marks, though, so if you can make a different test with characters that are still unencoded, I might be able to finally fix it.
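As a starting point for such a test, this Python sketch lists the codepoints in the Brahmi block (U+11000 to U+1107F) that are unassigned according to the interpreter's bundled Unicode data. The helper name `unassigned_in` is hypothetical. The result depends on the Unicode version Python ships with, so check it against the current Unicode charts before remapping font glyphs.

```python
import unicodedata

# The Brahmi block as defined by Unicode: U+11000..U+1107F.
BRAHMI_BLOCK = range(0x11000, 0x11080)


def unassigned_in(block):
    """Return the codepoints in block whose General_Category is Cn (unassigned)."""
    return [cp for cp in block if unicodedata.category(chr(cp)) == "Cn"]


if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    print([f"U+{cp:04X}" for cp in unassigned_in(BRAHMI_BLOCK)])
```

Remapping the test font's glyphs to any of the listed codepoints would recreate the original situation of a Graphite font driving characters that the application knows nothing about.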