Bug 155820 - isCJKIVSCharacter needs to support CJK Unified Ideographs Extension Block C to H for Unicode15
Status: UNCONFIRMED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version (earliest affected): unspecified
Hardware: All All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on: IVS
Blocks: CJK
Reported: 2023-06-13 16:25 UTC by DaeHyun Sung
Modified: 2023-06-20 13:56 UTC (History)
CC List: 4 users

See Also:
Crash report or crash signature:


Description DaeHyun Sung 2023-06-13 16:25:29 UTC
Description:
isCJKIVSCharacter needs to support CJK Unified Ideographs Extension blocks C to H for Unicode 15.

I'm interested in CJK characters. (I'm Korean, but I have studied and can speak a little Japanese and Mandarin Chinese.)

After contributing support for Unicode 15's CJK Unified Ideographs Extension H to GNOME Characters, I checked which CJK Unified Ideographs extensions LibreOffice handles.
(GNOME Characters commit link: https://gitlab.gnome.org/GNOME/gnome-characters/-/commit/daef901e34d731d6d8fe8a1f966ea9f1f04e3a2f )

However, LibreOffice does not support CJK Unified Ideographs Extension blocks C to H. It only supports CJK Unified Ideographs and Extension blocks A and B.

In Unicode 15, the CJK Unified Ideographs blocks have the following ranges:
CJK Unified Ideographs: 4E00–9FFF
CJK Unified Ideographs Extension A: 3400–4DBF
CJK Unified Ideographs Extension B: 20000–2A6DF
CJK Unified Ideographs Extension C: 2A700–2B73F
CJK Unified Ideographs Extension D: 2B740–2B81F
CJK Unified Ideographs Extension E: 2B820–2CEAF
CJK Unified Ideographs Extension F: 2CEB0–2EBEF
CJK Unified Ideographs Extension G: 30000–3134F
CJK Unified Ideographs Extension H: 31350–323AF
Ref: https://www.unicode.org/versions/Unicode15.0.0/ch18.pdf 
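
As an illustration of the ranges listed above, here is a minimal sketch of a table-driven block lookup. It is not the LibreOffice implementation: `CJKBlock`, `kCJKBlocks`, and `findCJKBlock` are hypothetical names, and the table only transcribes the Unicode 15 ranges from this report.

```cpp
#include <cstdint>

// Hypothetical table entry: one Unicode block with inclusive bounds.
struct CJKBlock
{
    std::uint32_t first;
    std::uint32_t last;
    const char* name;
};

// Ranges transcribed from the Unicode 15 list in this report.
constexpr CJKBlock kCJKBlocks[] = {
    { 0x4E00,  0x9FFF,  "CJK Unified Ideographs" },
    { 0x3400,  0x4DBF,  "CJK Unified Ideographs Extension A" },
    { 0x20000, 0x2A6DF, "CJK Unified Ideographs Extension B" },
    { 0x2A700, 0x2B73F, "CJK Unified Ideographs Extension C" },
    { 0x2B740, 0x2B81F, "CJK Unified Ideographs Extension D" },
    { 0x2B820, 0x2CEAF, "CJK Unified Ideographs Extension E" },
    { 0x2CEB0, 0x2EBEF, "CJK Unified Ideographs Extension F" },
    { 0x30000, 0x3134F, "CJK Unified Ideographs Extension G" },
    { 0x31350, 0x323AF, "CJK Unified Ideographs Extension H" },
};

constexpr int kCJKBlockCount = sizeof(kCJKBlocks) / sizeof(kCJKBlocks[0]);

// Returns the index of the block containing cp, or -1 if cp is in none of them.
// Written recursively so that it is also usable in C++11 constant expressions.
constexpr int findCJKBlock(std::uint32_t cp, int i = 0)
{
    return i == kCJKBlockCount ? -1
         : (cp >= kCJKBlocks[i].first && cp <= kCJKBlocks[i].last) ? i
         : findCJKBlock(cp, i + 1);
}

// True if cp lies in any of the listed CJK Unified Ideographs blocks.
constexpr bool isInListedCJKBlock(std::uint32_t cp)
{
    return findCJKBlock(cp) >= 0;
}
```

With this table, U+2B7D8 resolves to index 4 (Extension D), whereas a check limited to the first three entries would not match it.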

I installed the IPAmj Font Character Finder extension in LibreOffice.
Link: https://extensions.libreoffice.org/en/extensions/show/1077

Japanese MJ character information table, Ver.066.01:
https://moji.or.jp/mojikiban/mjlist/

https://moji.or.jp/mojikiban/font/

IPAmj Font Character Finder can look up characters in the ranges beyond CJK Unified Ideographs Extension B,
such as "𫟘󠄂" (U+2B7D8), which is located in CJK Unified Ideographs Extension D.
https://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=%F0%AB%9F%98

So, for LibreOffice's CJK users, we need to support CJK Unified Ideographs Extension blocks C to H for Unicode 15.


Steps to Reproduce:
1. Install the IPAmj Font Character Finder extension in LibreOffice.
Link: https://extensions.libreoffice.org/en/extensions/show/1077
2. Enter 'MJ060164' as the MJ code or '2B7D8' as the UCS code.
3. If the selected CJK font supports CJK Unified Ideographs Extension blocks C to H, the character can be shown.

Actual Results:
If the selected CJK font supports CJK Unified Ideographs Extension Block C to H, It can show. 

Expected Results:
If the selected CJK font supports CJK Unified Ideographs Extension Block C to H, It can show.


Reproducible: Always


User Profile Reset: No

Additional Info:
Comment 1 V Stuart Foote 2023-06-13 17:03:13 UTC
@Khaled, *, 

I'm not sure what is meant by "support" here.

I assume that if LO receives the Unicode code points and a font with coverage is selected, the text renders on the LO document canvas and can be saved and printed.

So is this an IME question or an inability to render? Does our Special Character dialog render a chart of these code points?

Or is this more simply an issue against the "IPAmj Font Character Finder" extension [1]?

=-ref-=
[1] https://extensions.libreoffice.org/en/extensions/show/1077
Comment 2 V Stuart Foote 2023-06-13 17:05:59 UTC
(In reply to V Stuart Foote from comment #1)
> =-ref-=
> [1] https://extensions.libreoffice.org/en/extensions/show/1077

This is from the extension's page (via Google Translate):

Description
The IPAmj font is one of the fonts developed and distributed by IPA, the Information-technology Promotion Agency (an Independent Administrative Institution).
This is a huge font set containing approximately 60,000 characters, including variant characters, used for personal names in municipalities throughout Japan.

This extension can search IPAmj fonts by various items including variant characters (IVS) and paste them into documents via the clipboard.

- The IPAmj Mincho font must be installed on the system.
- The document must be formatted with the IPAmj Mincho font.

The MJ character information list included in this extension is distributed by the following organizations, and CC-BY-SA is applied.
IPA Information-technology Promotion Agency
 https://mojikiban.ipa.go.jp/1311.html
Character Information Technology Promotion Council
 https://moji.or.jp/mojikiban/mjlist/
Comment 3 DaeHyun Sung 2023-06-19 06:47:34 UTC
I read the commit log of the related source code (include/i18nutil/unicode.hxx).
https://git.libreoffice.org/core/+/c1399e497191f295b9c3db95d126ff6a4fa5891d%5E%21

```
(e.g., later versions of Unicode have added CJK Extension C--F code
blocks, which the current implementation of isCJKIVSCharacter does not reflect)
```

Unicode 15 is now the current version, and characters used in Unicode IVS already exist in the CJK Extension C to H blocks.

In the LibreOffice source (include/i18nutil/unicode.hxx), IVS is supported only up to the CJK Extension B block.
As of Unicode 15, however, the extensions go beyond Extension B, up to Extension H.

So, for LibreOffice's CJK users, we need to support CJK Unified Ideographs Extension blocks C to H for Unicode 15.
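
To make the request concrete, here is a rough sketch of how the base-character range affects recognition of an ideographic variation sequence (a base ideograph followed by a selector in U+E0100 to U+E01EF). All function names are hypothetical and this is not the code in unicode.hxx; the "narrow" check only models the state described in this report (URO plus Extensions A and B), and U+E0102 is used purely as an example selector.

```cpp
#include <cstdint>

// Variation Selectors Supplement, used for ideographic variation sequences.
constexpr bool isIdeographicVariationSelector(std::uint32_t cp)
{
    return cp >= 0xE0100 && cp <= 0xE01EF;
}

// Base check limited to the URO and Extensions A and B.
constexpr bool isBaseUpToExtensionB(std::uint32_t cp)
{
    return (cp >= 0x4E00 && cp <= 0x9FFF)
        || (cp >= 0x3400 && cp <= 0x4DBF)
        || (cp >= 0x20000 && cp <= 0x2A6DF);
}

// Same check widened to the Unicode 15 extensions.
constexpr bool isBaseUpToExtensionH(std::uint32_t cp)
{
    return isBaseUpToExtensionB(cp)
        || (cp >= 0x2A700 && cp <= 0x2EBEF)   // Extensions C, D, E, F are contiguous
        || (cp >= 0x30000 && cp <= 0x323AF);  // Extensions G, H are contiguous
}

// A pair is treated as an IVS only if the base passes the range check.
constexpr bool isIVSWithNarrowBase(std::uint32_t base, std::uint32_t selector)
{
    return isBaseUpToExtensionB(base) && isIdeographicVariationSelector(selector);
}

constexpr bool isIVSWithWideBase(std::uint32_t base, std::uint32_t selector)
{
    return isBaseUpToExtensionH(base) && isIdeographicVariationSelector(selector);
}
```

Under these assumptions, the pair <U+2B7D8, U+E0102> is rejected by the narrow check and accepted by the wide one, while <U+4E00, U+E0100> is accepted by both.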


The 'IVS Test' repository (https://github.com/adobe-fonts/ivs-test ) describes its coverage as follows:
```
This font supports all current and future CJK Unified Ideographs by covering entire blocks: U+3400 through U+4DBF (Extension A), U+4E00 through U+9FFF (URO), U+FA0E, U+FA0F, U+FA11, U+FA13, U+FA14, U+FA1F, U+FA21, U+FA23, U+FA24, U+FA27 through U+FA29 (CJK Unified Ideographs in the CJK Compatibility Ideographs block), U+20000 through U+2A6DF (Extension B), U+2A700 through U+2F7FF (Extension C, Extension D, Extension E, Extension F and beyond), U+2FA20 through U+2FFFD (the end of Plane 2), and U+30000 through U+3FFFD (Extension G and the remainder of Plane 3).

```
Comment 4 DaeHyun Sung 2023-06-19 07:12:31 UTC
The links below point to the Unicode IVS/IVD description and data.
Description:
https://www.unicode.org/reports/tr37/

Data:
https://www.unicode.org/ivd/data/2022-09-13/

The Ideographic Variation Database consists of two data files. The first, IVD_Collections.txt, records the registered collections. The second, IVD_Sequences.txt, records the registered sequences.
https://www.unicode.org/ivd/data/2022-09-13/IVD_Collections.txt
https://www.unicode.org/ivd/data/2022-09-13/IVD_Sequences.txt
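
For anyone who wants to check which base characters appear in IVD_Sequences.txt, here is a small parsing sketch. It assumes each data line has three semicolon-separated fields (the base and selector code points in hexadecimal, the collection name, and the sequence identifier) and that lines starting with '#' are comments. `IvdSequence` and `parseIvdSequenceLine` are hypothetical names, and the format should be confirmed against UTS #37 before relying on it.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>

// One parsed entry; the field layout is an assumption described above.
struct IvdSequence
{
    std::uint32_t base = 0;
    std::uint32_t selector = 0;
    std::string collection;
    std::string identifier;
};

// Strips leading and trailing ASCII whitespace.
inline std::string trim(const std::string& s)
{
    const char* ws = " \t\r\n";
    const std::size_t first = s.find_first_not_of(ws);
    if (first == std::string::npos)
        return std::string();
    const std::size_t last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Parses one line; returns false for comments, blank lines, and malformed input.
inline bool parseIvdSequenceLine(const std::string& line, IvdSequence& out)
{
    if (line.empty() || line[0] == '#')
        return false;
    const std::size_t semi1 = line.find(';');
    if (semi1 == std::string::npos)
        return false;
    const std::size_t semi2 = line.find(';', semi1 + 1);
    if (semi2 == std::string::npos)
        return false;

    // First field: two hexadecimal code points separated by whitespace.
    std::istringstream codes(line.substr(0, semi1));
    unsigned long base = 0;
    unsigned long selector = 0;
    if (!(codes >> std::hex >> base >> selector))
        return false;

    out.base = static_cast<std::uint32_t>(base);
    out.selector = static_cast<std::uint32_t>(selector);
    out.collection = trim(line.substr(semi1 + 1, semi2 - semi1 - 1));
    out.identifier = trim(line.substr(semi2 + 1));
    return true;
}
```

Feeding each parsed `base` into a block-range check would show how many registered sequences fall outside the URO and Extensions A and B.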

Korean IVS:
KRName collection
https://www.unicode.org/ivd/data/2022-09-13/IVD_Charts_KRName.pdf 
Japanese IVS:
Moji_Joho collection
https://www.unicode.org/ivd/data/2022-09-13/IVD_Charts_Moji_Joho.pdf


For example, "𫟘󠄂" (U+2B7D8) is located in CJK Unified Ideographs Extension D.
https://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=%F0%AB%9F%98
It is also included in the Moji_Joho collection document.
https://www.unicode.org/ivd/data/2022-09-13/IVD_Charts_Moji_Joho.pdf
Comment 5 ⁨خالد حسني⁩ 2023-06-19 07:16:32 UTC
To be honest, I have a hard time understanding what the issue is here. Please give clear steps to reproduce the issue and state the expected result, preferably attaching ODF files and screenshots if applicable.
Comment 6 TANAKA Hidemune 2023-06-20 11:57:37 UTC
@DaeHyun Sung


1)
Are the following parts correct?

The actual and expected results look the same.


>Actual Results: If the selected CJK font supports CJK Unified Ideographs Extension Block C to H, it can show.

>If the selected CJK font supports CJK Unified Ideographs Extension Block C to H, It can show. 


>Expected Results: If the selected CJK font supports CJK Unified Ideographs Extension Block C to H, it can show.

>If the selected CJK font supports CJK Unified Ideographs Extension Block C to H, It can show.


2)
```
sudo apt install fonts-ipamj-mincho
```

Have you run the command above?