130067 – LibreOffice's collation character order is wrong for the Korean language.

Status:	RESOLVED FIXED

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	LibreOffice
Version: (earliest affected)	unspecified
Hardware:	All All

Importance:	medium normal
Assignee:	DaeHyun Sung

URL:
Whiteboard:	target:7.0.0
Keywords:

Depends on:
Blocks:	CJK-Korean

Reported:	2020-01-18 09:28 UTC by DaeHyun Sung
Modified:	2020-05-24 09:25 UTC
CC List:	1 user

See Also:
Crash report or crash signature:

Description DaeHyun Sung 2020-01-18 09:28:15 UTC

Description:

The Korean Hangul syllable ordering in the file

i18npool/source/collator/data/ko_charset.txt

is wrong, and some Hangul syllables are missing from the file.

The ordering of Hangul syllables is already specified by the Unicode code chart for the Hangul Syllables block (range AC00–D7AF):
https://unicode.org/charts/PDF/UAC00.pdf

In addition, the file does not include the Korean Hangul jamo (alphabet letters), and some Korean Hanja are missing from it.

Steps to Reproduce:
Inspect the Hangul syllable ordering in i18npool/source/collator/data/ko_charset.txt. The order is wrong, and some Hangul syllables are missing from the file.

Actual Results:
<가<각
<간<갇<갈<갉<갊<감<갑<값<갓<갔
<강<갖<갗<같<갚<갛<개<객<갠<갤
<갬<갭<갯<갰<갱<갸<갹<갼<걀<걋

Expected Results:
<가<각<갂<갃<간<갅<갆<갇<갈<갉
<갊<갋<갌<갍<갎<갏<감<갑<값<갓
<갔<강<갖<갗<갘<같<갚<갛<개<객
<갞<갟<갠<갡<갢<갣<갤<갥<갦<갧
<갨<갩<갪<갫<갬<갭<갮<갯<갰<갱
<갲<갳<갴<갵<갶<갷<갸<갹<갺<갻
<갼<갽<갾<갿<걀<걁<걂<걃<걄<걅
<걆<걇<걈<걉<걊<걋
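
The expected ordering above is plain Unicode code point order within the Hangul Syllables block. As a minimal sketch, the rule lines could be generated like this. The helper name `hangul_rule_lines` is hypothetical and not part of LibreOffice, and the layout of ten `<`-prefixed syllables per line is an assumption taken from the excerpt above; the real ko_charset.txt may be laid out differently. Note that the block spans AC00–D7AF, but the last assigned syllable is U+D7A3.

```python
FIRST = 0xAC00  # U+AC00, first Hangul syllable
LAST = 0xD7A3   # U+D7A3, last assigned Hangul syllable


def hangul_rule_lines(first=FIRST, last=LAST, per_line=10):
    """Return collation rule lines listing syllables in code point order.

    Hypothetical helper: each line holds up to `per_line` syllables,
    each prefixed with '<', mirroring the excerpt in this report.
    """
    syllables = [chr(cp) for cp in range(first, last + 1)]
    lines = []
    for i in range(0, len(syllables), per_line):
        chunk = syllables[i:i + per_line]
        lines.append("".join("<" + s for s in chunk))
    return lines


if __name__ == "__main__":
    # Print only the first few lines; the full list has 11172 syllables.
    for line in hangul_rule_lines()[:3]:
        print(line)
```

The first generated lines should match the first lines of the expected results listed above.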




Reproducible: Always


User Profile Reset: No




Comment 1 Mike Kaganski 2020-01-18 10:48:20 UTC

See also https://git.libreoffice.org/core/+/2d843bb104a3091a2ff2c7b4d5655f5fb1393a47

Looks like the file is only used for ICU < 53

Comment 2 DaeHyun Sung 2020-01-18 13:32:09 UTC

I submitted a patch that fixes the Korean Hangul syllable ordering in the text file.

https://gerrit.libreoffice.org/c/core/+/87018

However, it only fixes the Hangul Syllables range (AC00–D7AF):
https://unicode.org/charts/PDF/UAC00.pdf

It still does not include the Korean Hangul jamo (alphabet letters), and some Korean Hanja are still missing.
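
To illustrate which kinds of characters the patch does and does not cover, here is a rough sketch that classifies a character by Unicode block. The function name `block_of` and the block table are assumptions made for illustration. The table is not exhaustive: Hanja also occur in other CJK blocks not listed here.

```python
# Block ranges taken from the Unicode code charts; not an exhaustive list.
BLOCKS = [
    ("Hangul Jamo", 0x1100, 0x11FF),
    ("Hangul Compatibility Jamo", 0x3130, 0x318F),
    ("CJK Unified Ideographs", 0x4E00, 0x9FFF),
    ("Hangul Syllables", 0xAC00, 0xD7AF),
]


def block_of(ch):
    """Return the name of the listed block containing `ch`, or None."""
    cp = ord(ch)
    for name, lo, hi in BLOCKS:
        if lo <= cp <= hi:
            return name
    return None


if __name__ == "__main__":
    for ch in ["가", "ㄱ", "韓"]:
        print(ch, block_of(ch))
```

Only characters reported as "Hangul Syllables" fall within the range fixed by the patch.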

Comment 3 DaeHyun Sung 2020-01-18 17:57:24 UTC

(In reply to Mike Kaganski from comment #1)
> See also
> https://git.libreoffice.org/core/+/2d843bb104a3091a2ff2c7b4d5655f5fb1393a47
> 
> Looks like the file is only used for ICU < 53

I think the Korean collator text file used by the ICU < 53 code path originates from the KS X 1001 specification.

KS X 1001 (formerly named KS C 5601) supports only 2350 Korean syllable characters.

Newer ICU versions support all 11172 Hangul syllables, whereas KS X 1001 supports only 2350.
(Since Unicode 2.0, Unicode and ICU have supported all 11172 Korean syllables.)

Users of ICU < 53 also use Unicode, which supports 11172 characters, but this code supports only 2350 of them.

So, in my opinion, the text file has to be changed for Korean users.
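
The two counts can be checked with a short sketch. The 11172 figure follows from 19 initial consonants × 21 vowels × 28 final positions (27 final consonants plus "no final"). For the KS X 1001 count, this sketch assumes that Python's `euc_kr` codec reflects the KS X 1001 repertoire. As far as I know, CPython may encode syllables outside KS X 1001 as longer multi-byte sequences, so only two-byte results are counted. The function names are hypothetical.

```python
LEADS, VOWELS, TAILS = 19, 21, 28  # 28 = 27 final consonants + "no final"


def count_unicode_syllables():
    """Number of precomposed Hangul syllables (U+AC00..U+D7A3)."""
    return LEADS * VOWELS * TAILS


def count_ksx1001_syllables():
    """Count syllables that encode as a plain two-byte EUC-KR character.

    Assumption: Python's 'euc_kr' codec mirrors KS X 1001; anything that
    fails to encode or needs more than two bytes is treated as absent.
    """
    count = 0
    for cp in range(0xAC00, 0xD7A3 + 1):
        try:
            encoded = chr(cp).encode("euc_kr")
        except UnicodeEncodeError:
            continue
        if len(encoded) == 2:
            count += 1
    return count


if __name__ == "__main__":
    print(count_unicode_syllables())
    print(count_ksx1001_syllables())
```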

Comment 4 Eike Rathke 2020-03-02 21:06:01 UTC

But as Mike said, in builds against ICU 53 or later the file is not used anymore, and we hope that ICU handles things correctly by now. The change may make sense when building against ICU 52 or earlier. If it is also to be used with later and current ICU versions, it would need additional work.

See i18npool/Library_collator_data.mk and commit message of https://gerrit.libreoffice.org/plugins/gitiles/core/+/2d843bb104a3091a2ff2c7b4d5655f5fb1393a47%5E%21/

Comment 5 Commit Notification 2020-04-07 22:55:41 UTC

DaeHyun Sung committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/b3363960f97dcb7eaa10dfa708d71198a345924c

fix Korean Hangul Syllable Character order tdf#130067

It will be available in 7.0.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.

Comment 6 DaeHyun Sung 2020-05-24 09:25:27 UTC

The issue has been fixed, so it is resolved.

https://git.libreoffice.org/core/commit/b3363960f97dcb7eaa10dfa708d71198a345924c

fix Korean Hangul Syllable Character order tdf#130067

It will be available in 7.0.0.