Bug 59922 - Incorrect character range for Unicode block "Katakana Phonetic" with GNU unifont
Summary: Incorrect character range for Unicode block "Katakana Phonetic" with GNU unifont
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: UI
Version: 4.0.0.2 rc (earliest affected)
Hardware: Other All
Importance: medium normal
Assignee: OKANO Takayoshi
URL:
Whiteboard: BSA target:4.1.0
Keywords:
Depends on:
Blocks:
 
Reported: 2013-01-27 07:30 UTC by OKANO Takayoshi
Modified: 2013-02-02 17:41 UTC
CC List: 3 users

See Also:
Crash report or crash signature:


Attachments
patch for this problem (1.93 KB, application/octet-stream)
2013-01-27 07:30 UTC, OKANO Takayoshi
This is what I see (28.81 KB, image/png)
2013-01-27 18:32 UTC, Kohei Yoshida
screenshot of XANO mincho U32 font (66.56 KB, image/png)
2013-01-28 20:18 UTC, Kohei Yoshida

Description OKANO Takayoshi 2013-01-27 07:30:16 UTC
Created attachment 73716 [details]
patch for this problem

Problem description: 

Steps to reproduce:
1. open "Special Characters" dialog box (Insert -> Special Character...)
2. select Subset "Katakana Phonetics"
3. ....

Current behavior:
the cursor points at U+31C0 "CJK STROKE T".
and there is no Subset "CJK Strokes".

Expected behavior:
the cursor points at U+31F0 "KATAKANA LETTER SMALL KU"
and there should be a Subset "CJK Strokes".

In the ISO/IEC 10646 standard,
the Katakana Phonetic Extensions block contains characters U+31F0 - U+31FF,
and the CJK Strokes block contains U+31C0 - U+31EF.
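
The two ranges quoted above can be written as a small lookup table. This is only an illustrative sketch in Python, not LibreOffice's actual subset code; the names `BLOCKS`, `block_of` and `first_char_name` are made up for this example, and the ranges are the ones from the Blocks.txt file cited in this report.

```python
import unicodedata

# Block ranges as stated in this report (taken from Unicode's Blocks.txt).
# The table and helper names are hypothetical, for illustration only.
BLOCKS = [
    (0x31C0, 0x31EF, "CJK Strokes"),
    (0x31F0, 0x31FF, "Katakana Phonetic Extensions"),
]


def block_of(cp):
    """Return the name of the block containing code point cp, or None."""
    for first, last, name in BLOCKS:
        if first <= cp <= last:
            return name
    return None


def first_char_name(block_name):
    """Return the Unicode character name of the first code point in a block."""
    for first, _last, name in BLOCKS:
        if name == block_name:
            return unicodedata.name(chr(first))
    raise KeyError(block_name)
```

With this table, `block_of(0x31C0)` returns "CJK Strokes" and `first_char_name("Katakana Phonetic Extensions")` returns "KATAKANA LETTER SMALL KU", which matches the expected behavior described above.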

cf.
http://www.unicode.org/Public/UNIDATA/Blocks.txt
http://www.unicode.org/charts/PDF/U31C0.pdf
http://www.unicode.org/charts/PDF/U31F0.pdf

Operating System: All
Version: 4.0.0.2 rc
Comment 1 Kohei Yoshida 2013-01-27 18:31:46 UTC
(In reply to comment #0)
> Created attachment 73716 [details]
> patch for this problem
> 
> Problem description: 
> 
> Steps to reproduce:
> 1. open "Special Characters" dialog box (Insert -> Special Character...)
> 2. select Subset "Katakana Phonetics"
> 3. ....
> 
> Current behavior:
> the cursor points at U+31C0 "CJK STROKE T".

There are some things that I see differently.

1) On Linux, using the Sazanami Gothic font, I see Subset "Katakana", but not "Katakana Phonetics".

2) And the Katakana subset points to U+30A1 (see the screenshot attached).

Could you tell us on which platform you tested, which font you specified in the dialog?  Thanks.

> and there is no Subset "CJK Strokes".

This I concur.
Comment 2 Kohei Yoshida 2013-01-27 18:32:21 UTC
Created attachment 73739 [details]
This is what I see
Comment 3 OKANO Takayoshi 2013-01-28 03:23:23 UTC
(In reply to comment #1)
> 1) On Linux, using the Sazanami Gothic font, I see Subset "Katakana", but
> not "Katakana Phonetics".
> 
> 2) And the Katakana subset points to U+30A1 (see the screenshot attached).
> 
> Could you tell us on which platform you tested, which font you specified in
> the dialog?  Thanks.

platform: Ubuntu 12.04 LTS
font: gnu unifont (http://unifoundry.com/unifont.html)

I think that Sazanami Gothic doesn't have Katakana Phonetic Extensions glyphs,
so this subset doesn't appear in the "Special Characters" dialog box.

Regards,
Comment 4 Kohei Yoshida 2013-01-28 20:15:48 UTC
I can only reproduce this with GNU unifont.  When using e.g. XANO-mincho-U32 font, the cursor jumps to U+31F0 which I assume is correct.

At this point, I'm not sure whether this is our issue, or the issue with the font itself.

I'll mark this "confirmed" for the time being, but we do need to figure out whose bug this is.
Comment 5 Kohei Yoshida 2013-01-28 20:18:19 UTC
Created attachment 73795 [details]
screenshot of XANO mincho U32 font

This is what I see when using the XANO mincho U32 font.
Comment 6 Kohei Yoshida 2013-01-28 20:20:58 UTC
I'll put Caolan on CC since he is our font expert.
Comment 7 Kohei Yoshida 2013-01-28 20:58:04 UTC
And with IPA Gothic, IPA Mincho, and their P variants, the Katakana Phonetics subset correctly sets the cursor to U+31F0.  At this point it's becoming more likely that this is a bug with GNU unifont.
Comment 8 Not Assigned 2013-01-29 14:07:47 UTC
OKANO Takayoshi committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=e789038a9b47d650ea4c31f30420b496109a1b54

Resolves: fdo#59922 Incorrect character range for "Katakana Phonetic"



The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 9 Not Assigned 2013-01-29 14:08:05 UTC
Caolan McNamara committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=eb1ecd8bc2936e28be852722d6cb0c9fb0baeac4

Related: fdo#59922 add new unicode blocks, detect newer in future



Comment 10 Caolán McNamara 2013-01-29 14:14:03 UTC
I agree with kano actually. It's a pity ICU doesn't give us a way to get the range of a block (afaics) but I've now arranged the code so that we should get a build-time warning if a new block is added that we don't know about, and a run time assert in debugging builds if the bounds of what we think is a block don't fall inside the expected unicode block according to ICU. That should hopefully avoid problems in the future if blocks get added/split again in later versions of unicode.
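
The kind of consistency check described here could look roughly like the following. It is a hedged sketch in Python rather than the committed C++ code: `REFERENCE_BLOCKS` is a hand-written stand-in for an ICU per-character block lookup (ICU's C API offers `ublock_getCode` for that), and `STALE_SUBSETS`, `FIXED_SUBSETS`, `check_subsets` and `unknown_blocks` are hypothetical names. The stale range shown is an assumption inferred from the symptom in this report, not a quote from the old source.

```python
# Hand-written reference data standing in for an ICU block lookup (assumption).
REFERENCE_BLOCKS = [
    (0x30A0, 0x30FF, "Katakana"),
    (0x31C0, 0x31EF, "CJK Strokes"),
    (0x31F0, 0x31FF, "Katakana Phonetic Extensions"),
]

# Assumed shape of an out-of-date subset table: one entry spans two blocks.
STALE_SUBSETS = {
    "Katakana": (0x30A0, 0x30FF),
    "Katakana Phonetic Extensions": (0x31C0, 0x31FF),
}

# The same table after splitting the range as this report requests.
FIXED_SUBSETS = {
    "Katakana": (0x30A0, 0x30FF),
    "CJK Strokes": (0x31C0, 0x31EF),
    "Katakana Phonetic Extensions": (0x31F0, 0x31FF),
}


def reference_block(cp):
    """Return the reference block name for code point cp, or None."""
    for first, last, name in REFERENCE_BLOCKS:
        if first <= cp <= last:
            return name
    return None


def check_subsets(subsets):
    """Analogue of the run-time assert: list subsets whose first or last
    code point does not fall inside the block the subset is named after."""
    problems = []
    for name, (first, last) in subsets.items():
        found = (reference_block(first), reference_block(last))
        if found != (name, name):
            problems.append((name, found))
    return problems


def unknown_blocks(subsets):
    """Analogue of the build-time warning: reference blocks with no subset."""
    return sorted(name for _f, _l, name in REFERENCE_BLOCKS if name not in subsets)
```

Run against `STALE_SUBSETS`, `check_subsets` flags the "Katakana Phonetic Extensions" entry because its lower bound lands in "CJK Strokes", and `unknown_blocks` reports "CJK Strokes" as missing; both functions return empty lists for `FIXED_SUBSETS`.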

caolanm->kano:
Can you add yourself to https://wiki.documentfoundation.org/Development/Developers and send a mail like the example on that page to the libreoffice@lists.freedesktop.org stating that your patch is under our preferred MPL/LGPLv3+ dual license
Comment 11 Kohei Yoshida 2013-01-30 14:42:17 UTC
Thanks Caolan for looking into this.  I wouldn't have known.
Comment 12 OKANO Takayoshi 2013-02-02 17:41:54 UTC
> caolanm->kano:
> Can you add yourself to
> https://wiki.documentfoundation.org/Development/Developers and send a mail
> like the example on that page to the libreoffice@lists.freedesktop.org
> stating that your patch is under our preferred MPL/LGPLv3+ dual license

done. Thx.