Bug 130067 - LibreOffice's collation character order is wrong for the Korean language.
Summary: LibreOffice's collation character order is wrong for the Korean language.
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version (earliest affected): unspecified
Hardware: All All
Importance: medium normal
Assignee: DaeHyun Sung
URL:
Whiteboard: target:7.0.0
Keywords:
Depends on:
Blocks: CJK-Korean
Reported: 2020-01-18 09:28 UTC by DaeHyun Sung
Modified: 2020-05-24 09:25 UTC (History)

Description DaeHyun Sung 2020-01-18 09:28:15 UTC
Description:

The Korean Hangul syllable ordering in the file

i18npool/source/collator/data/ko_charset.txt

is wrong: some Hangul syllables are missing from the text file.

The Hangul syllable ordering is already specified in the Unicode code chart for the Hangul Syllables block (range AC00–D7AF):
https://unicode.org/charts/PDF/UAC00.pdf

The file also does not include the Korean Hangul jamo (alphabet letters), and some Korean Hanja are missing from it.

Steps to Reproduce:
Open i18npool/source/collator/data/ko_charset.txt and inspect the Hangul syllable list. The ordering is wrong because some Hangul syllables are missing from the file.

Actual Results:
<가<각
<간<갇<갈<갉<갊<감<갑<값<갓<갔
<강<갖<갗<같<갚<갛<개<객<갠<갤
<갬<갭<갯<갰<갱<갸<갹<갼<걀<걋
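
To make the gap concrete, here is a minimal Python sketch that lists the Hangul syllables the fragment above skips between its first and last entries. The fragment is copied from "Actual Results"; the helper name missing_syllables is hypothetical and is not part of LibreOffice or ICU.

```python
# Sketch only: compare a collation rule fragment against plain code point order.
# ACTUAL is copied from the "Actual Results" section of this report.
ACTUAL = """
<가<각
<간<갇<갈<갉<갊<감<갑<값<갓<갔
<강<갖<갗<같<갚<갛<개<객<갠<갤
<갬<갭<갯<갰<갱<갸<갹<갼<걀<걋
"""

# Assigned precomposed Hangul syllables run from U+AC00 to U+D7A3.
SYLLABLE_FIRST, SYLLABLE_LAST = 0xAC00, 0xD7A3


def missing_syllables(fragment):
    """Return the syllables between the fragment's first and last entry that it omits."""
    present = [ch for ch in fragment if SYLLABLE_FIRST <= ord(ch) <= SYLLABLE_LAST]
    lowest, highest = ord(min(present)), ord(max(present))
    seen = set(present)
    return [chr(cp) for cp in range(lowest, highest + 1) if chr(cp) not in seen]


missing = missing_syllables(ACTUAL)
print(len(missing), "syllables missing, starting with", "".join(missing[:4]))
```

The check assumes that code point order is the intended order within the Hangul Syllables block, as the report states.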

Expected Results:
<가<각<갂<갃<간<갅<갆<갇<갈<갉
<갊<갋<갌<갍<갎<갏<감<갑<값<갓
<갔<강<갖<갗<갘<같<갚<갛<개<객
<갞<갟<갠<갡<갢<갣<갤<갥<갦<갧
<갨<갩<갪<갫<갬<갭<갮<갯<갰<갱
<갲<갳<갴<갵<갶<갷<갸<갹<갺<갻
<갼<갽<갾<갿<걀<걁<걂<걃<걄<걅
<걆<걇<걈<걉<걊<걋
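
The expected fragment above is simply code point order, so it can be regenerated mechanically. The following Python sketch prints rule lines in the same "<X" shape, ten entries per line. The function name rule_lines and its parameters are hypothetical, and the sketch makes no claim about how the submitted patch was produced or about any other content of ko_charset.txt.

```python
# Sketch only: emit "<syllable" entries in code point order, ten per line,
# matching the layout of the "Expected Results" fragment.
def rule_lines(first=0xAC00, last=0xD7A3, per_line=10):
    """Return rule lines covering the code points first..last inclusive."""
    chars = [chr(cp) for cp in range(first, last + 1)]
    return [
        "".join("<" + ch for ch in chars[i:i + per_line])
        for i in range(0, len(chars), per_line)
    ]


# Reproduce only the span shown in this report: U+AC00 (가) through U+AC4B (걋).
for line in rule_lines(0xAC00, 0xAC4B):
    print(line)
```

Calling rule_lines() with its defaults would cover every assigned syllable in the block, U+AC00 through U+D7A3.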




Reproducible: Always


User Profile Reset: No



Comment 1 Mike Kaganski 2020-01-18 10:48:20 UTC
See also https://git.libreoffice.org/core/+/2d843bb104a3091a2ff2c7b4d5655f5fb1393a47

Looks like the file is only used for ICU < 53
Comment 2 DaeHyun Sung 2020-01-18 13:32:09 UTC
I submitted a patch with the corrected Korean Hangul syllable ordering in the text file:

https://gerrit.libreoffice.org/c/core/+/87018

However, it only fixes the Hangul Syllables range (AC00–D7AF):
https://unicode.org/charts/PDF/UAC00.pdf

It does not add the Korean Hangul jamo (alphabet letters), and some Korean Hanja are still missing.
Comment 3 DaeHyun Sung 2020-01-18 17:57:24 UTC
(In reply to Mike Kaganski from comment #1)
> See also
> https://git.libreoffice.org/core/+/2d843bb104a3091a2ff2c7b4d5655f5fb1393a47
> 
> Looks like the file is only used for ICU < 53

I think the Korean collator text file used by the ICU < 53 code path originates from the KS X 1001 specification.

KS X 1001 (formerly named KS C 5601) covers only 2350 Korean syllable characters.

Newer ICU supports all 11172 Hangul syllables, whereas KS X 1001 supports only 2350. (Since Unicode 2.0, Unicode and ICU have supported all 11172 Korean syllables.)

Users of builds against ICU < 53 also use Unicode, which supports 11172 characters, but this file covers only 2350 of them.

So, in my opinion, the text file has to be changed for Korean users.
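
For reference, the 11172 figure follows from the arithmetic composition of Hangul syllables in Unicode: 19 initial consonants, 21 vowels, and 28 final positions (including "no final"). A minimal Python sketch of that arithmetic, using only the standard library, is below. The constant and function names are chosen for illustration. The 2350 count for KS X 1001 is taken from the comment above and is not computed here.

```python
# Sketch only: where the 11172 Hangul syllable count comes from.
import unicodedata

S_BASE = 0xAC00                          # first precomposed syllable, 가
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28   # initials, vowels, finals (incl. none)


def compose(l_index, v_index, t_index):
    """Compose a syllable from zero-based initial, vowel, and final indices."""
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


total = L_COUNT * V_COUNT * T_COUNT

# Count code points in the block AC00-D7AF that are actually assigned letters.
assigned = sum(
    1 for cp in range(0xAC00, 0xD7B0)
    if unicodedata.category(chr(cp)) == "Lo"
)

last = compose(L_COUNT - 1, V_COUNT - 1, T_COUNT - 1)
print(total, assigned, hex(ord(last)))
```

Note that the block range AC00–D7AF is slightly larger than the set of assigned syllables, which ends at U+D7A3.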
Comment 4 Eike Rathke 2020-03-02 21:06:01 UTC
But as Mike said, in builds against ICU 53 or later the file is no longer used, and we are hoping that ICU meanwhile treats things correctly. The change may make sense when building against ICU 52 or earlier. If it is also to be used with later and current ICU versions, it would need additional work.

See i18npool/Library_collator_data.mk and commit message of https://gerrit.libreoffice.org/plugins/gitiles/core/+/2d843bb104a3091a2ff2c7b4d5655f5fb1393a47%5E%21/
Comment 5 Commit Notification 2020-04-07 22:55:41 UTC
DaeHyun Sung committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/b3363960f97dcb7eaa10dfa708d71198a345924c

fix Korean Hangul Syllable Character order tdf#130067

It will be available in 7.0.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 6 DaeHyun Sung 2020-05-24 09:25:27 UTC
The issue is fixed, so this bug is resolved.

https://git.libreoffice.org/core/commit/b3363960f97dcb7eaa10dfa708d71198a345924c

fix Korean Hangul Syllable Character order tdf#130067

It will be available in 7.0.0.