Created attachment 73716 [details]
patch for this problem

Problem description:

Steps to reproduce:
1. open "Special Characters" dialog box (Insert -> Special Character...)
2. select Subset "Katakana Phonetics"
3. ....

Current behavior:
The cursor points at U+31C0 "CJK STROKE T", and there is no Subset "CJK Strokes".

Expected behavior:
The cursor points at U+31F0 "KATAKANA LETTER SMALL KU", and there should be a Subset "CJK Strokes".

In the ISO/IEC 10646 standard, the Katakana Phonetic Extensions block contains characters U+31F0 - U+31FF, and the CJK Strokes block contains U+31C0 - U+31EF.

cf.
http://www.unicode.org/Public/UNIDATA/Blocks.txt
http://www.unicode.org/charts/PDF/U31C0.pdf
http://www.unicode.org/charts/PDF/U31F0.pdf

Operating System: All
Version: 4.0.0.2 rc
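The block ranges cited in this report can be checked with a small lookup. This is a minimal sketch in Python, not LibreOffice code: the table is a hand-copied excerpt of Blocks.txt covering only the blocks mentioned in this report, and the names `BLOCKS` and `block_of` are hypothetical.

```python
# Illustrative only: a hand-copied excerpt of Unicode Blocks.txt,
# limited to the blocks discussed in this report.
BLOCKS = [
    (0x30A0, 0x30FF, "Katakana"),
    (0x31C0, 0x31EF, "CJK Strokes"),
    (0x31F0, 0x31FF, "Katakana Phonetic Extensions"),
]


def block_of(code_point):
    """Return the name of the block containing code_point, or None."""
    for first, last, name in BLOCKS:
        if first <= code_point <= last:
            return name
    return None


print(block_of(0x31C0))  # CJK Strokes
print(block_of(0x31F0))  # Katakana Phonetic Extensions
```

With this table, U+31C0 falls in "CJK Strokes" and U+31F0 in "Katakana Phonetic Extensions", which matches the expected behavior described above.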
(In reply to comment #0)
> Created attachment 73716 [details]
> patch for this problem
>
> Problem description:
>
> Steps to reproduce:
> 1. open "Special Characters" dialog box (Insert -> Special Character...)
> 2. select Subset "Katakana Phonetics"
> 3. ....
>
> Current behavior:
> the cursor points at U+31C0 "CJK STROKE T".

I see a few things differently.

1) On Linux, using the Sazanami Gothic font, I see the Subset "Katakana", but not "Katakana Phonetics".

2) The Katakana subset points to U+30A1 (see the attached screenshot).

Could you tell us which platform you tested on and which font you selected in the dialog? Thanks.

> and there is no Subset "CJK Strokes".

I concur with this.
Created attachment 73739 [details] This is what I see
(In reply to comment #1)
> 1) On Linux, using the Sazanami Gothic font, I see Subset "Katakana", but
> not "Katakana Phonetics".
>
> 2) And the Katakana subset points to U+30A1 (see the screenshot attached).
>
> Could you tell us on which platform you tested, which font you specified in
> the dialog? Thanks.

platform: Ubuntu 12.04 LTS
font: GNU Unifont (http://unifoundry.com/unifont.html)

I think that Sazanami Gothic doesn't have glyphs for Katakana Phonetic Extensions, so this subset doesn't appear in the "Special Characters" dialog box.

Regards,
I can only reproduce this with GNU Unifont. With other fonts, e.g. XANO-mincho-U32, the cursor jumps to U+31F0, which I assume is correct. At this point, I'm not sure whether this is our issue or an issue with the font itself. I'll mark this "confirmed" for the time being, but we do need to figure out whose bug this is.
Created attachment 73795 [details]
screenshot of XANO mincho U32 font

This is what I see when using the XANO mincho U32 font.
I'll put Caolan on CC since he is our font expert.
And with IPA Gothic, IPA Mincho, and their P variants, the Katakana Phonetics subset correctly sets the cursor to U+31F0. At this point it's becoming more likely that this is a bug with GNU unifont.
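One possible explanation for the font-dependent behavior, consistent with the patch attached to the original report, can be sketched as follows. This is only an illustration under stated assumptions: it assumes the dialog places the cursor on the first code point of the subset range for which the font has a glyph, and that the subset range in use was U+31C0 - U+31FF. The function `first_covered` and both coverage sets are hypothetical simplifications, not measured font data.

```python
def first_covered(subset_first, subset_last, font_coverage):
    """Return the first code point in the subset range that the font covers."""
    for cp in range(subset_first, subset_last + 1):
        if cp in font_coverage:
            return cp
    return None


# Assumed, simplified coverage: one font with glyphs for CJK Strokes and
# Katakana Phonetic Extensions, one with only the latter.
font_with_strokes = set(range(0x31C0, 0x3200))
font_without_strokes = set(range(0x31F0, 0x3200))

# Assumed subset range that is too wide (starts at U+31C0).
WIDE_SUBSET = (0x31C0, 0x31FF)
# Range of Katakana Phonetic Extensions according to Blocks.txt.
BLOCK_SUBSET = (0x31F0, 0x31FF)

print(hex(first_covered(*WIDE_SUBSET, font_with_strokes)))
print(hex(first_covered(*WIDE_SUBSET, font_without_strokes)))
print(hex(first_covered(*BLOCK_SUBSET, font_with_strokes)))
```

Under these assumptions, a too-wide range only shows the wrong character with a font that also covers U+31C0 - U+31EF, which would make the problem look font-specific even if the range table is at fault.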
OKANO Takayoshi committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=e789038a9b47d650ea4c31f30420b496109a1b54

Resolves: fdo#59922 Incorrect character range for "Katakana Phonetic"

The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Caolan McNamara committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=eb1ecd8bc2936e28be852722d6cb0c9fb0baeac4

Related: fdo#59922 add new unicode blocks, detect newer in future

The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
I agree with kano, actually. It's a pity ICU doesn't give us a way to get the range of a block (afaics), but I've now arranged the code so that we should get a build-time warning if a new block is added that we don't know about, and a run-time assert in debugging builds if the bounds of what we think is a block don't fall inside the expected Unicode block according to ICU. That should hopefully avoid problems in the future if blocks get added or split again in later versions of Unicode.

caolanm->kano:
Can you add yourself to https://wiki.documentfoundation.org/Development/Developers and send a mail like the example on that page to libreoffice@lists.freedesktop.org stating that your patch is under our preferred MPL/LGPLv3+ dual license?
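The two safeguards described in this comment can be sketched roughly as below. This is not the committed code, which is C++ and queries ICU; here a small hand-copied table (`REFERENCE_BLOCKS`) stands in for ICU, and `reference_block`, `bounds_match_block`, `unknown_blocks`, and the subset entries are hypothetical names and data chosen for illustration. The too-wide U+31C0 - U+31FF entry is an assumption about the kind of range error being guarded against.

```python
# Stand-in for the block data ICU would provide (assumption: hand-copied
# excerpt of Blocks.txt, limited to the blocks discussed in this report).
REFERENCE_BLOCKS = [
    (0x30A0, 0x30FF, "Katakana"),
    (0x31C0, 0x31EF, "CJK Strokes"),
    (0x31F0, 0x31FF, "Katakana Phonetic Extensions"),
]


def reference_block(code_point):
    """Block name for a code point according to the reference table."""
    for first, last, name in REFERENCE_BLOCKS:
        if first <= code_point <= last:
            return name
    return None


def bounds_match_block(first, last, expected):
    """True if both ends of [first, last] lie in the expected block."""
    return (reference_block(first) == expected
            and reference_block(last) == expected)


def unknown_blocks(subsets):
    """Reference blocks that no subset entry claims (new or split blocks)."""
    claimed = {name for _, _, name in subsets}
    return [name for _, _, name in REFERENCE_BLOCKS if name not in claimed]


# Hypothetical subset table containing a too-wide range.
subsets = [
    (0x30A0, 0x30FF, "Katakana"),
    (0x31C0, 0x31FF, "Katakana Phonetic Extensions"),
]

for first, last, name in subsets:
    if not bounds_match_block(first, last, name):
        print(f"bounds of {name!r} fall outside the reference block")
print("unclaimed blocks:", unknown_blocks(subsets))
```

In this sketch the bounds check flags the wide range because its lower bound belongs to "CJK Strokes", and the second check reports "CJK Strokes" as a block the table does not know about.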
Thanks Caolan for looking into this. I wouldn't have known.
> caolanm->kano:
> Can you add yourself to
> https://wiki.documentfoundation.org/Development/Developers and send a mail
> like the example on that page to the libreoffice@lists.freedesktop.org
> stating that your patch is under our preferred MPL/LGPLv3+ dual license

done. Thx.