Bug 134742 - [CJK Issue, Enhancement] Distinguishing both Korean and Japanese font from all CJK[Chinese-Japanese-Korean] fonts such as Noto CJK font series and Source Han Sans series, etc.
Status: ASSIGNED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version (earliest affected): unspecified
Hardware: All All
Importance: medium enhancement
Assignee: DaeHyun Sung
URL:
Whiteboard: target:7.1.0
Keywords:
Depends on:
Blocks: CJK CJK-Japanese CJK-Korean Font-List
Reported: 2020-07-12 06:13 UTC by DaeHyun Sung
Modified: 2020-10-22 14:57 UTC
CC List: 3 users

See Also:
Crash report or crash signature:


Description DaeHyun Sung 2020-07-12 06:13:42 UTC
Description:
Distinguishing both Korean and Japanese fonts from all CJK[Chinese-Japanese-Korean] fonts such as Noto CJK font series and Source Han Sans series, etc.

I'm a TDF Korean contributor. I speak Korean (my mother tongue) and a little Japanese.

When I use Noto CJK Korean fonts such as Noto Sans CJK KR and Noto Serif CJK KR in LibreOffice, the font selection box shows “简繁” next to them.
The same happens with Noto CJK Japanese fonts such as Noto Sans CJK JP and Noto Serif CJK JP: they also show “简繁” in the font selection box.

I investigated these font issues with a debugger and found a bug in the attemptToDisambiguateHan function.

Below is a link to the function in LibreOffice core's master repository:
UScriptCode attemptToDisambiguateHan(UScriptCode eScript, OutputDevice const &rDevice)
https://git.libreoffice.org/core/+/refs/heads/master/svtools/source/misc/sampletext.cxx#1231

Here is the problem with that code. Chinese, Japanese, and Korean all use ideographs (in short, Chinese characters).
Noto Sans CJK, Noto Serif CJK, Source Han Sans, and Source Han Serif support all of the ideograph characters.

Because these fonts all cover the CJK ideograph ranges, the code cannot tell them apart.
P.S. Trivia: throughout the CJK cultural sphere, ideographs (in short, Chinese characters, 漢字) are used in the writing systems.
漢字/汉字 Hànzì is for Chinese. 한자 Hanja is for Korean. かんじ Kanji is for Japanese.

Example 1) Korean Hanja (the Korean name for Chinese characters)
List of Hanja for Use in Personal Names (인명용 한자표/人名用漢字表) by the Supreme Court of the ROK (Republic of Korea).
(It includes 한문 교육용 기초 한자 (漢文敎育用基礎漢字/Hanmun Gyoyug-yong Gicho Hanja, Basic Hanja for Classical Chinese Education) by the Ministry of Education of the ROK.)
http://unicode.org/L2/L2020/20082-update-korean.pdf

Example 2) Japanese Kanji (the Japanese name for Chinese characters)
List of Jōyō Kanji (常用漢字表)
https://www.bunka.go.jp/kokugo_nihongo/sisaku/joho/joho/kijun/naikaku/pdf/joyokanjihyo_20101130.pdf

So, as a temporary measure, I submitted a patch that hardcodes the script for Noto CJK fonts only (such as the Noto Sans CJK {HK, JP, KR, SC, TC} fonts).
However, this does not resolve the Source Han fonts, because the Korean Source Han font is listed under its Korean name, ‘본고딕’ (read ‘bon go dik’ in Korean).
As a result, the change supports only Noto CJK fonts, such as Noto Sans CJK and Noto Serif CJK.
Below is a link to my submitted and merged commit, “Hardcode script for "Noto" CJK fonts & add USCRIPT_JAMO”:
https://gerrit.libreoffice.org/c/core/+/97344

I changed only the Noto CJK fonts (Source Han Sans and Source Han Serif fonts are not supported). Both Noto CJK Fonts and Source Han Fonts
Before -> After
Noto Sans CJK HK 简繁 -> Noto Sans CJK HK 繁
Noto Sans CJK JP 简繁 -> Noto Sans CJK JP 日本語
Noto Sans CJK KR 简繁 -> Noto Sans CJK KR 한글
Noto Sans CJK SC 简繁 -> Noto Sans CJK SC 简
Noto Sans CJK TC 简繁 -> Noto Sans CJK TC 繁
Noto Sans Mono CJK HK 简繁 -> Noto Sans Mono CJK HK 繁
Noto Sans Mono CJK JP 简繁 -> Noto Sans Mono CJK JP 日本語
Noto Sans Mono CJK KR 简繁 -> Noto Sans Mono CJK KR 한글
Noto Sans Mono CJK SC 简繁 -> Noto Sans Mono CJK SC 简
Noto Sans Mono CJK TC 简繁 -> Noto Sans Mono CJK TC 繁
Noto Serif CJK JP 简繁 -> Noto Serif CJK JP 日本語
Noto Serif CJK KR 简繁 -> Noto Serif CJK KR 한글
Noto Serif CJK SC 简繁 -> Noto Serif CJK SC 简
Noto Serif CJK TC 简繁 -> Noto Serif CJK TC 繁
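The merged change keys on the region suffix of the family name (Comment 1 summarizes it as parsing the name for "KR", "JP", "SC", "TC", or "HK"). The C++ sketch below illustrates that idea in isolation. The function name `sampleTextFromNotoName` and its exact matching rules are assumptions for illustration, not the code from the commit; the sample strings are taken from the Before -> After list.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper (not LibreOffice code): maps the region suffix of a
// Noto CJK family name to the sample text shown in the font list.
std::optional<std::string> sampleTextFromNotoName(std::string_view family) {
    struct Entry {
        std::string_view suffix;
        std::string_view sample;
    };
    // Suffix -> sample text, as in the Before -> After list.
    static constexpr Entry kEntries[] = {
        {" HK", "繁"}, {" JP", "日本語"}, {" KR", "한글"},
        {" SC", "简"}, {" TC", "繁"},
    };
    // Only Noto CJK families are handled, mirroring the scope of the change.
    if (family.rfind("Noto ", 0) != 0 ||
        family.find(" CJK ") == std::string_view::npos) {
        return std::nullopt;
    }
    for (const Entry& e : kEntries) {
        if (family.size() >= e.suffix.size() &&
            family.substr(family.size() - e.suffix.size()) == e.suffix) {
            return std::string(e.sample);
        }
    }
    return std::nullopt;  // Noto CJK family without a known region suffix
}
```

With this sketch, "Noto Sans CJK KR" yields 한글 and "Noto Serif CJK JP" yields 日本語, while "본고딕" and "Source Han Sans SC" yield no result. That matches the limitation described above for the Source Han fonts.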



My goal is for LibreOffice to display CJK fonts correctly.

So, in my opinion, it seems necessary to discuss the display implementation of the font selection box in LibreOffice with CJK users and developers from Korea, China, Taiwan, Hong Kong, Macau, and Japan.

-------------
Related resources:

Below are articles and repositories for the Noto CJK fonts and Adobe's Source Han font series.
Noto Sans CJK announcement on the Google blog:
(English) "Noto: A CJK Font That is Complete, Beautiful and Right for Your Language and Region": https://developers.googleblog.com/2014/07/noto-cjk-font-that-is-complete.html
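(Korean, 한국어) "구글의 새로운 Pan-CJK (汎韓中日) 글꼴을 소개합니다" [Introducing Google's new Pan-CJK (汎韓中日) font]: https://developers-kr.googleblog.com/2014/07/cjkfont.html
(Japanese, 日本語) "オープンソースの美しい Noto フォントファミリーに日本語、中国語、韓国語が加わりました。" [Japanese, Chinese, and Korean join the beautiful open-source Noto font family]: https://developers-jp.googleblog.com/2014/07/noto.html
(Chinese, 中文) "Google中日韩字体Noto Sans CJK让你的阅读体验更佳" [Google's Chinese-Japanese-Korean font Noto Sans CJK gives you a better reading experience]: https://china.googleblog.com/2014/07/googlenoto-sans-cjk.html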

Noto Serif CJK announcement on the Google blog:
(English) "Noto Serif CJK is here!" : https://developers.googleblog.com/2017/04/noto-serif-cjk-is-here.html
(Korean, 한국어) "새로운 폰트, Noto Serif CJK를 지금 바로 확인해 보세요!" [Check out the new font, Noto Serif CJK, right now!]: https://developers-kr.googleblog.com/2017/04/noto-serif-cjk-is-here_4.html
(Japanese, 日本語) "Noto Serif CJK が登場!" [Noto Serif CJK has arrived!]: https://developers-jp.googleblog.com/2017/04/noto-serif-cjk-is-here.html
(Chinese, 中文) "Noto Serif CJK 来了!" [Noto Serif CJK is here!]: https://china.googleblog.com/2017/04/noto-serif-cjk.html

Noto CJK font repository: https://github.com/googlefonts/noto-cjk
Adobe's introduction of the Source Han Sans font:
https://blog.typekit.com/2014/07/15/introducing-source-han-sans/
Adobe's Source Han Serif font series:
https://source.typekit.com/source-han-serif/

Source Han Serif supports four different East Asian languages — Simplified Chinese, Traditional Chinese, Japanese, and Korean — and the 65,535 glyphs in each of its seven weights are designed to work together with a consistent design that emphasizes shared elements between the languages while honoring the diversity of each. Also included is a rich set of Western glyphs supporting the Latin, Greek, and Cyrillic scripts, which were derived from Source Serif.

Steps to Reproduce:
1. In the LibreOffice suite, open the font selection box and find the CJK fonts that cover all CJK ideographs, such as the Noto CJK and Source Han fonts.
2. Look at the sample language text shown next to each font name.


Actual Results:
Noto Sans CJK HK 简繁 
Noto Sans CJK JP 简繁 
Noto Sans CJK KR 简繁 
Noto Sans CJK SC 简繁 
Noto Sans CJK TC 简繁
Noto Sans Mono CJK HK 简繁
Noto Sans Mono CJK JP 简繁
Noto Sans Mono CJK KR 简繁
Noto Sans Mono CJK SC 简繁
Noto Sans Mono CJK TC 简繁
Noto Serif CJK JP 简繁
Noto Serif CJK KR 简繁
Noto Serif CJK SC 简繁 
Noto Serif CJK TC 简繁 

Kaiti SC 简
Kaiti TC 简
Songti SC 简
Songti TC 简
Source Han Sans HC 简繁 
Source Han Sans HW HC 简繁 
Source Han Sans HW SC 简繁 
Source Han Sans HW TC 简繁 
Source Han Sans SC 简繁 
Source Han Sans TC 简繁 
본고딕
본고딕 HW 


Expected Results:
Noto Sans CJK HK 繁
Noto Sans CJK JP 日本語
Noto Sans CJK KR 한글
Noto Sans CJK SC 简
Noto Sans CJK TC 繁
Noto Sans Mono CJK HK 繁
Noto Sans Mono CJK JP 日本語
Noto Sans Mono CJK KR 한글
Noto Sans Mono CJK SC 简
Noto Sans Mono CJK TC 繁
Noto Serif CJK JP 日本語
Noto Serif CJK KR 한글
Noto Serif CJK SC 简
Noto Serif CJK TC 繁

Kaiti SC 简
Kaiti TC 繁
Songti SC 简
Songti TC 繁
Source Han Sans HC 繁
Source Han Sans HW HC 繁
Source Han Sans HW SC 简
Source Han Sans HW TC 繁
Source Han Sans SC 简
Source Han Sans TC 繁
본고딕 한글
본고딕 HW 한글


Reproducible: Always


User Profile Reset: No



Additional Info:
UNICODE HAN DATABASE (UNIHAN) 
https://www.unicode.org/reports/tr38/
Comment 1 V Stuart Foote 2020-07-12 16:52:53 UTC
(In reply to DaeHyun Sung from comment #0)
>... 
> So, In my opinion, It seems necessary to discuss the display implementation
> of the font selection box on LibreOffice with developers from CJK Users and
> Developers such as, Korea, China, Taiwan, Hong Kong, Macau and Japan.
> 
>...

Well sure, but realistically the current logic for composing the "sample text" strings for the font name list box, makeShortRepresentativeTextForScript(), is rather limited: it responds only to the font's reported Unicode coverage (or to hardcoding against its font name).

With your prototype handling of Noto, you've taken its reported CJK script support and then parsed the font's name for a string ("KR", "JP", "SC", "TC", "HK") indicating its target script, to assign the sample text shown in the font listings.

But as you've found, this approach has weaknesses. First, you depend on what the font reports for its script coverage. Second, you depend on what the font designer actually named the font/font family!

The mechanism is fragile: font names change, and reported script coverage is too broad or misreported. Import filters pass through questionable font names. For CJK alone, a lot of font-name strings would have to be maintained.

So parsing the font name for the language script does not go far enough. In addition to responding to reported script support, the font list and combobox selection should respond to the CJK locale, either of the user's UI or of the target paragraph.
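A locale-driven label, as suggested here, could look roughly like the C++ sketch below. It assumes the UI or paragraph language arrives as a plain BCP 47 tag string, and it does not use LibreOffice's own language-tag classes. The function `sampleTextForLocale`, the "Taiwan, Hong Kong, Macau imply Traditional" rule, and the fallback to the combined label are all simplifying assumptions for illustration.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>
#include <string_view>

// Hypothetical fallback for fonts that cover every CJK ideograph: choose the
// sample text from a BCP 47 language tag (UI locale or paragraph language).
std::string sampleTextForLocale(std::string_view tag) {
    // Primary language subtag, e.g. "zh" in "zh-Hant-TW".
    const std::string_view primary = tag.substr(0, tag.find('-'));
    if (primary == "ko") return "한글";
    if (primary == "ja") return "日本語";
    if (primary == "zh") {
        // An explicit script subtag wins over the region.
        if (tag.find("-Hans") != std::string_view::npos) return "简";
        if (tag.find("-Hant") != std::string_view::npos) return "繁";
        // Simplification: Taiwan, Hong Kong, and Macau imply Traditional.
        for (std::string_view region : {"-TW", "-HK", "-MO"}) {
            if (tag.find(region) != std::string_view::npos) return "繁";
        }
        return "简";  // zh, zh-CN, zh-SG, ...
    }
    return "简繁";  // not a CJK locale: keep the current combined label
}
```

Comparing the whole primary subtag, rather than a string prefix, keeps a tag such as "kok" (Konkani) from being mistaken for Korean. A real implementation would also need the paragraph-language and GSUB `locl` considerations raised below.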

On the other hand the combobox holding the font list performs multiple roles, general font selection and display of formatting/style but also handling of Graphite/OTF smart font features.

IIUC, in addition to simply displaying the localized font labeling of the "ShortRepresentativeText" for the unified CJK font families, the listbox strings will need refactoring to: 1) respond to the user's locale selection, 2) respond to the language set for the paragraph with text cursor focus, and 3) indicate usage of GSUB locl or the CJK-focused tags [1][2].

I see localized labeling of the font name as a parallel to bug 35538 for rendering styles. It would all need a major refactoring. It requires dev work; maybe a good GSoC project?

=-ref-=
[1] https://en.wikipedia.org/wiki/List_of_typographic_features#Features_primarily_intended_for_or_exclusively_required_by_East-Asian_tetragrams_(Chinese,_Japanese,_Korean)

[2] https://wiki.documentfoundation.org/Smart_font_optional_features_for_Graphite_and_OpenType_fonts#List_of_optional_smart_font_features_and_their_ID
Comment 2 DaeHyun Sung 2020-08-01 06:26:25 UTC
I think we should change the code to improve its ability to distinguish between Korean, Chinese, and Japanese fonts:

1. Remove the hardcoded script for "Noto" CJK fonts.
2. Add hardcoded handling to attemptToDisambiguateHan(UScriptCode eScript, OutputDevice const &rDevice) and change how it distinguishes among Korean, Japanese, and Chinese fonts.

Former code:
-            static const sal_Unicode aKorean[] = { 0x3131 };
-            static const sal_Unicode aJapanese[] = { 0x3007, 0x9F9D };
-            static const sal_Unicode aTraditionalChinese[] = { 0x570B };
-            static const sal_Unicode aSimplifiedChinese[] = { 0x56FD };
Korean: U+3131 ㄱ   Hangul Letter Kiyeok
Japanese: U+3007 〇 Ideographic Number Zero & U+9F9D 龝
Traditional Chinese: U+570B 國
Simplified Chinese: U+56FD  国
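To show why these probes cannot separate pan-CJK fonts, here is a minimal, self-contained C++ model. It does not use LibreOffice's API: the real function receives an OutputDevice, whereas here a font's coverage is a hypothetical `std::set<char32_t>`, and the helper `coveringCandidates` is invented for illustration. Only the probe code points come from the former code above; the real function's checking order and return values are not reproduced.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of a font's glyph coverage; the real function asks an
// OutputDevice instead.
using Coverage = std::set<char32_t>;

struct Probe {
    std::string label;            // language the probe is meant to identify
    std::vector<char32_t> chars;  // code points from the former arrays
};

// Probe characters copied from the former aKorean/aJapanese/... arrays.
const std::vector<Probe> kProbes = {
    {"Korean", {0x3131}},               // ㄱ Hangul Letter Kiyeok
    {"Japanese", {0x3007, 0x9F9D}},     // 〇 and 龝
    {"Traditional Chinese", {0x570B}},  // 國
    {"Simplified Chinese", {0x56FD}},   // 国
};

// Returns every label whose probe characters are all present in the font.
// More than one label means coverage alone cannot pick the language.
std::vector<std::string> coveringCandidates(const Coverage& font) {
    std::vector<std::string> result;
    for (const Probe& probe : kProbes) {
        bool coversAll = true;
        for (char32_t c : probe.chars) {
            if (font.count(c) == 0) {
                coversAll = false;
                break;
            }
        }
        if (coversAll) {
            result.push_back(probe.label);
        }
    }
    return result;
}
```

A font such as Noto Sans CJK KR covers all five probe code points, so `coveringCandidates` returns all four labels for it, and the same holds for the JP, SC, TC, and HK variants. That is the ambiguity behind the “简繁” label.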

The problem with that code:
The supposedly Japanese kanji U+3007 〇 and U+9F9D 龝 are also used in Korean and Chinese.

U+3007 〇
Definition: zero
It is used in all of CJK (Chinese, Japanese, and Korean).
It is commonly used in number expressions in MS Excel and LibreOffice.

U+9F9D 龝
Definition: autumn, fall; year
Mandarin reading: qiū
Korean Hanja reading: 추 chu
Japanese kun reading: ‘AKI’ or ‘TOKI’
Japanese on reading: ‘SHUU’
Its meaning is the same as ‘秋’.

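Korean
[한자 너 어디 있었니?] 54. 분탕 焚蕩 [Hanja, where have you been? 54. Buntang 焚蕩] http://www.incheonilbo.com/news/articleView.html?idxno=1019040
참고로 가을날 벼에 달라붙은 메뚜기 모양을 한 글자인 龝(추)는 秋의 고자(古字)로 서예가들이 멋을 부리기 위해 사용하기도 한다.
(For reference, 龝 (chu), a character shaped like a grasshopper clinging to rice on an autumn day, is an archaic form (古字) of 秋 that calligraphers sometimes use for stylistic effect.)
Japanese
「龝」の漢字‐読み方・意味・部首・画数 - 漢字辞典 [The kanji 「龝」: readings, meaning, radical, stroke count - Kanji Dictionary] https://kanjitisiki.com/jis2/2-3/020.html
漢字の「龝」についてです。「秋」の異体字です。
(This is about the kanji 「龝」. It is a variant form of 「秋」.)
Chinese
龝 - 中國哲學書電子化計劃 [Chinese Text Project] https://ctext.org/dictionary.pl?if=gb&char=%E9%BE%9D
《康熙字典·四》:    秋:〔古文〕𥤛𪚼龝𪔁《唐韻》七由切《集韻》《韻會》雌由切《正韻》此由切,𠀤音鰌。
(Kangxi Dictionary, vol. 4: under 秋, the ancient forms listed are 𥤛, 𪚼, 龝, and 𪔁, followed by its pronunciations from historical rhyme dictionaries.)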

Likewise, neither U+570B 國 nor U+56FD 国 distinguishes between CJK languages,
because U+570B 國 is used in Traditional Chinese, Korean, and Japanese texts.
U+570B 國
Korean: 國
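21國 정상급 26명 온다…평창서 `외교 올림픽` [26 leaders from 21 countries (國) are coming: a 'diplomatic Olympics' in Pyeongchang] https://www.mk.co.kr/news/politics/view/2018/01/66693/
핵융합발전 프로젝트 韓國이 주도..."ITER 부품의 70~80% 도맡아" [Korea (韓國) leads the fusion power project: "takes on 70 to 80% of ITER components"] http://www.dt.co.kr/contents.html?article_no=2020072802109931731004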
Japanese: 國
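ORANGE RANGE、母校の吹奏楽部・琉球國祭り太鼓とのライブを公開 [ORANGE RANGE releases a live performance with its alma mater's brass band and Ryukyukoku Matsuri Daiko] https://news.yahoo.co.jp/articles/c6a7e9bb83e46662a8638cd5373a5c71d144cb8b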
Traditional Chinese: 國
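國家森林遊樂區免費入園一次 上路一週最熱門是這地方 [One free admission to national forest recreation areas: the most popular spot in the first week is this place] https://news.ltn.com.tw/news/life/breakingnews/3237355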

Similarly, U+56FD 国 is used in both Simplified Chinese and Japanese.
U+56FD 国
Japanese: 国
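日本人の子ども連れ去りは国ぐるみの誘拐? 批准した国際条約、国内で適用せずは許されるのか [Is child abduction by Japanese nationals a state-sanctioned kidnapping? Is it acceptable not to apply a ratified international treaty domestically?] https://www.47news.jp/news/5057377.html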
Simplified Chinese: 国
中国国际云书馆上线运行 [China International Cloud Library goes online] http://world.people.com.cn/n1/2020/0726/c1002-31797808.html
Comment 3 Commit Notification 2020-10-12 07:43:47 UTC
DaeHyun Sung committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/c8e8860f8b1453f0a51c6202ce8ff90b7c4ba515

tdf#134742 Distinguishing all CJK fonts such as Noto CJK Fonts.

It will be available in 7.1.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.