Bug 168225 - Cannot sort in Tibetan / Dzongkha alphabetical order
Status: NEEDINFO
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Localization
Version: 25.8.1.1 release (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Sorting China-Minority-Scripts CTL
Reported: 2025-09-01 09:34 UTC by Elie Roux
Modified: 2025-09-05 19:18 UTC (History)

See Also:
Crash report or crash signature:


Attachments
An example showing the bug (7.75 KB, application/vnd.oasis.opendocument.spreadsheet)
2025-09-01 09:34 UTC, Elie Roux
screenshot of ICU app for Tibetan (77.48 KB, image/png)
2025-09-03 04:27 UTC, Elie Roux

Description Elie Roux 2025-09-01 09:34:31 UTC
Created attachment 202635 [details]
An example showing the bug

In a spreadsheet, I copy/paste these three strings in the correct Tibetan / Dzongkha alphabetical order:

ཀ་
སྐ་
ས་

but when I set the document's language to Dzongkha or Tibetan and sort the data in ascending order, the result is

ཀ་
ས་
སྐ་

See the attached example.

CLDR has collation rules for Tibetan and Dzongkha: https://github.com/unicode-org/cldr/blob/main/common/collation/ (bo.xml and dz.xml)

LibreOffice has collation rules for Dzongkha: https://github.com/LibreOffice/core/blob/4efd03d69ac7f6ae463aa56cea6f0e80f289f6e3/i18npool/source/collator/data/dz_charset.txt

The GLibC also has implemented the rules: https://sourceware.org/bugzilla/show_bug.cgi?id=21547
Comment 1 Regina Henschel 2025-09-01 11:14:43 UTC
I don't think I can help here. This is more something for Eike or Jonathan.

I don't know whether the expected order is the correct one. Excel and OnlyOffice sort the data in the same order as LibreOffice.
Comment 2 Elie Roux 2025-09-01 11:18:27 UTC
I can answer any question on the order if needed. I think there's only one peer-reviewed paper about the Tibetan alphabetical order (although more focused on the historical aspect): 

https://d1i1jdw69xsqx0.cloudfront.net/digitalhimalaya/collections/journals/ret/pdf/ret_63_02.pdf
Comment 3 Ming Hua 2025-09-01 14:04:45 UTC
(In reply to Elie Roux from comment #2)
> I can answer any question on the order if needed.
I don't think we have any Tibetan script expert here on Bugzilla.  So I'll ask a few questions that may seem obvious to you, but are actually hard for me as a non-user, and I hope your answer would help other QA people and developers as well.

You listed three collation rules in comment #0:

> CLDR has collation rules for Tibetan and Dzongkha:
> https://github.com/unicode-org/cldr/blob/main/common/collation/ (bo.xml and
> dz.xml)
> 
> LibreOffice has collation rules for Dzongkha:
> https://github.com/LibreOffice/core/blob/
> 4efd03d69ac7f6ae463aa56cea6f0e80f289f6e3/i18npool/source/collator/data/
> dz_charset.txt
> 
> The GLibC also has implemented the rules:
> https://sourceware.org/bugzilla/show_bug.cgi?id=21547

Are the character orders in them correct from your perspective? Do they give the same collation orders?

Regarding the three strings in your example (I added their Unicode codepoints):

ཀ་ (U+0F40 U+0F0B)
སྐ་ (U+0F66 U+0F90 U+0F0B)
ས་ (U+0F66 U+0F0B)

I see that both ཀ and ས are in the 30-consonant list for Tibetan (I am Chinese, so it's easy for me to search for information about Tibetan; if Dzongkha is somehow different, let me know), but སྐ is not on the list, and it involves the U+0F90 (SUBJOINED LETTER KA) character.  Is there some general sorting rule for strings with subjoined letters?
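
For readers following along, here is a minimal Python sketch (not LibreOffice code) that prints the code points listed above and sorts the three strings by raw code point value. The helper name `code_points` is made up for illustration. Plain code point comparison happens to give the order reported in comment #0; that is only an observation about these three strings, not a claim about how LibreOffice compares them internally.

```python
# The three example strings, written as escapes so the code points are explicit.
strings = [
    "\u0F40\u0F0B",        # KA + TSHEG
    "\u0F66\u0F90\u0F0B",  # SA + SUBJOINED KA + TSHEG
    "\u0F66\u0F0B",        # SA + TSHEG
]


def code_points(s: str) -> str:
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


# Python compares str values code point by code point, so this is a
# plain code point sort with no language-specific tailoring.
by_code_point = sorted(strings)

for s in by_code_point:
    print(s, code_points(s))
```

SA + TSHEG sorts before SA + SUBJOINED KA + TSHEG here because U+0F0B is smaller than U+0F90, so a comparison with no Tibetan tailoring places the stacked syllable last.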
Comment 4 Elie Roux 2025-09-01 14:49:25 UTC
Thanks for your questions!

> Are the character orders in them correct from your perspective? Do they give the same collation orders?

The rules for Dzongkha are very slightly outdated; the most up-to-date collation rules are:

https://github.com/unicode-org/cldr/blob/main/common/collation/bo.xml

But the differences only show up in very niche cases, not in the very simple example I gave.

> Is there some general sorting rule for strings with subjoined letters?

In Tibetan there is the notion of a root letter, which is the primary letter to sort on. In the example of སྐ, the root letter is ཀ and the superscript letter is ས, so སྐ is sorted with ཀ, not ས.

A few resources:
- https://web.archive.org/web/20220709105007/http://www.dit.gov.bt/sites/default/files/Collation_in_Dzongkha.pdf
- https://download.mimer.com/pub/developer/charts/Chilton_slides.pdf
- https://download.mimer.com/pub/developer/charts/tibetan.htm
- http://cjc.ict.ac.cn/eng/qwjse/view.asp?id=1502
- https://doi.org/10.5070/H917135529
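
The root-letter idea can be illustrated with a deliberately simplified Python sketch. It is not the CLDR tailoring and not what LibreOffice does; the names `SUPERSCRIPTS` and `root_key` are hypothetical. It assumes that a syllable starting with one of the superscript letters ར, ལ or ས, immediately followed by a subjoined consonant in the range U+0F90..U+0FAC, has that subjoined consonant as its root. It ignores prefixes, suffixes, vowels and every other case covered by the real rules in bo.xml.

```python
# Superscript (head) letters: RA, LA, SA. Simplifying assumption for this sketch.
SUPERSCRIPTS = {"\u0F62", "\u0F63", "\u0F66"}

# In the Tibetan block, subjoined consonants sit 0x50 above their base forms
# (for example U+0F90 SUBJOINED KA versus U+0F40 KA).
SUBJOINED_OFFSET = 0x50


def root_key(syllable: str):
    """Return a sort key of (root letter, superscript, remainder).

    Only handles "superscript + subjoined consonant" stacks; anything
    else is keyed on its first character.
    """
    if (
        len(syllable) >= 2
        and syllable[0] in SUPERSCRIPTS
        and 0x0F90 <= ord(syllable[1]) <= 0x0FAC
    ):
        root = chr(ord(syllable[1]) - SUBJOINED_OFFSET)
        return (root, syllable[0], syllable[2:])
    # No stack detected: the first letter is treated as the root.
    return (syllable[:1], "", syllable[1:])


words = ["\u0F40\u0F0B", "\u0F66\u0F0B", "\u0F66\u0F90\u0F0B"]
sorted_words = sorted(words, key=root_key)
print(sorted_words)
```

With this key, སྐ་ is grouped under ཀ and placed after the bare ཀ་, ahead of ས་, which is the order given in comment #0. For real use, an ICU collator loaded with the bo tailoring is the appropriate tool.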
Comment 5 Regina Henschel 2025-09-02 21:18:54 UTC
I have found an answer from Eike to a similar question on Ask: https://ask.libreoffice.org/t/calc-data-sorting-not-following-ascii-unicode-order/87503/12
The "tailoring" folder mentioned there for zh-TW contains a file for dz too.

I have put Eike
Comment 6 Regina Henschel 2025-09-02 21:20:12 UTC
... I have put Eike in CC. He can likely tell whether it is a bug in LibreOffice or not.
Comment 7 Elie Roux 2025-09-03 04:27:20 UTC
Yes, one way to test these tailoring / collation rules is to use the online ICU app:

https://icu4c-demos.unicode.org/icu-bin/collation.html

If you select "bo" in the list in the top-left corner (which is set to "und" by default), you will see that it sorts

ཀ
སྐ
ས

in the same order. I'll add a screenshot as an attachment.
Comment 8 Elie Roux 2025-09-03 04:27:50 UTC
Created attachment 202669 [details]
screenshot of ICU app for Tibetan
Comment 9 Regina Henschel 2025-09-05 18:06:48 UTC
I get the result you expect when I use the language "Tibetan (PR China)" instead of "Dzongkha". Can you please test that language setting?
Comment 10 Elie Roux 2025-09-05 19:18:33 UTC
Ah yes, that works, thanks!