161744 – Implement Cyrillic Mongolian numbering for headings

Status:	NEW

Product:	LibreOffice
Classification:	Unclassified
Component:	Localization
Version: (earliest affected)	unspecified
Hardware:	All All

Importance:	medium enhancement
Assignee:	Not Assigned

Blocks:	Heading-Numbering-Dialog

Reported:	2024-06-22 16:42 UTC by Fiable.biz
Modified:	2024-07-18 20:21 UTC

See Also:	115189

Description Fiable.biz 2024-06-22 16:42:50 UTC

Several Cyrillic alphabets are available for numbering headings in Tools→"Heading numbering"→Numbering→"Number:", but Cyrillic Mongolian is not among them. The Mongolian Cyrillic alphabet is described here: https://en.wikipedia.org/wiki/Mongolian_Cyrillic_alphabet#Description . It is the official alphabet of (Outer) Mongolia. I guess it would not be difficult to add it.

An alternative would be to allow the user to define an alphabet he/she wants to use for such a purpose.

(See also bug#115189 for the Mongol-Uyghur alphabet)

Comment 1 Fiable.biz 2024-07-07 14:57:19 UTC

QA administrators wrote: "Whiteboard: QA:needsComment". I don't understand why you need comments or what kind of comments you need. As an example of the use of the Mongolian Cyrillic alphabet for numbering, see pages iii and iv of the document https://www.osce.org/files/f/documents/1/7/454269.pdf on the OSCE (Organization for Security and Co-operation in Europe) website. Mongolia is a member of the OSCE.

Comment 2 V Stuart Foote 2024-07-07 15:34:52 UTC

(In reply to Fiable.biz from comment #0)
> 
> An alternative would be to allow the user to define an alphabet he/she wants
> to use for such a purpose.
> 
> (See also bug#115189 for the Mongol-Uyghur alphabet)

This can already be achieved in the UI of the Heading-Numbering dialog, just not with the convenience of a predefined alphabet sequence.

1. Customize via dialog from the 'Unordered' or the 'Ordered' list split button
2. in Bullets and Numbering... dialog, 'Customize' tab
3. in 'Number' listbox, select "Bullet"
4. in 'Character style' list box, select "Numbering Symbols"
5. use the 'Select' button to launch the Special Character Dialog, pick a font with coverage of Cyrillic
6. for each level, select the character you'd like to use
7. then save as a template document for reuse (or you'll be entering it again)

Otherwise, it seems the Mongolian Cyrillic alphabetization list sequence could be a useful enhancement. But it should also include a comparable list sequence in the Unicode Mongolian block U+1800-U+18AF, not the full sort logic that bug 115189 requires.

@Vort, @Justin -- thoughts?

Comment 3 V Stuart Foote 2024-07-07 15:38:59 UTC

s/@Vort/@Volga sorry...

Comment 4 Fiable.biz 2024-07-08 02:13:45 UTC

(In reply to V Stuart Foote from comment #2)

> 1. Customize via dialog from the 'Unordered' or the 'Ordered' list split
> button
> 2. in Bullets and Numbering... dialog, 'Customize' tab
(...)
> 6. for each level, select the character you'd like to use

Thank you, but this just provides one character per level, while the aim is to use the alphabet to "number" the elements of a list, or the headings of one level: "а." for the first element, "б." for the 2nd one, etc., as in the example I gave above.
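
To illustrate the requested behaviour, here is a minimal Python sketch of alphabetic "numbering" with the Mongolian Cyrillic alphabet. It is not LibreOffice code. Two things are assumptions made only for illustration: that all 35 letters take part in the sequence (an actual implementation might skip some letters), and that labels continue as "аа", "аб", ... once the alphabet is exhausted. The name alpha_label is hypothetical.

```python
# Mongolian Cyrillic alphabet: the 33 Russian letters plus ө and ү.
# Assumption: every letter is used for numbering, in alphabetical order.
MONGOLIAN_CYRILLIC = "абвгдеёжзийклмноөпрстуүфхцчшщъыьэюя"


def alpha_label(n: int, alphabet: str = MONGOLIAN_CYRILLIC) -> str:
    """Return the label of the n-th item (1-based): а, б, ..., я, аа, аб, ..."""
    if n < 1:
        raise ValueError("item numbers start at 1")
    base = len(alphabet)
    letters = []
    while n > 0:
        # Bijective base-N conversion: there is no "zero" letter.
        n, rem = divmod(n - 1, base)
        letters.append(alphabet[rem])
    return "".join(reversed(letters))


# Labels for a few list items, with the trailing dot used in the example above.
for i in (1, 2, 3, 35, 36):
    print(f"{alpha_label(i)}.")
```

The alphabet is passed as a parameter, so the same function would also cover the user-defined alphabet suggested in the description.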

Comment 5 Heiko Tietze 2024-07-08 08:07:41 UTC

We list four types of Cyrillic numbers for Bulgarian, Russian, Serbian, and Ukrainian. While I support adding Mongolian, the list is hard to use right now. How about two entries for letters, one "A,B,C (Latin)" the other "A,B,C (Localized)" which then becomes either Russian Cyrillic or Mongolian, depending on the chosen language.

Or, to keep the list more clean, add an option below "[ ] Localize".

Comment 6 Mike Kaganski 2024-07-08 09:29:35 UTC

(In reply to Heiko Tietze from comment #5)

The dropdown is really getting unmanageable. On the other hand, we definitely need more entries available. We already had this issue elsewhere (I remember a discussion with erAck, something around the "native numbering"); and of course, it will return again, unless we provide a good solution. Note that we can't simply rely on "current language's native numbering", both because there are legitimate cases of using a numbering "native" to another language here, and because there may be more than one "native" numbering scheme for a language.

The drop-down has an upside of being very convenient for the often-used elements. But we need a dialog for the full range of options. I don't know if it's best to use the dialog instead of the drop-down unconditionally, because it would be unified; or to have a drop-down with a handful of most-used options, plus a "more..." element opening the dialog.

Comment 7 Fiable.biz 2024-07-08 10:35:56 UTC

(In reply to Mike Kaganski from comment #6)
> to have a drop-down with a handful most-used
> options, plus a "more..." element opening the dialog.

This option seems the more convenient to me. In practice, most people use one or very few languages, and few numbering styles. I would automatically add to the "handful of most-used options" any element once it has been used by the user, and automatically remove any element not used (i.e. not assigned to any heading level) for one year or more, unless no more than, let's say, 7 elements are left, in which case the stale numbering style would be automatically replaced by one from a default short list (the most common language-independent "numberings" plus the default language's alphabet). This way, a user who hasn't used any numbering style for one year would automatically get the default list back. The user should also be able to add or remove elements manually in their list of most-used numberings.
   The following is out of scope here, but the very long list of languages of the paragraph and character styles, when editing such styles: "Font → Language", is also difficult to manage. I'd like to get quickly the choice between the few languages I use.

Comment 8 Heiko Tietze 2024-07-09 10:42:17 UTC

(In reply to Mike Kaganski from comment #6)
> Note that we can't simply rely on "current language's native
> numbering", both because there are legitimate cases of using a numbering
> "native" to another language here, and because there may be more than one
> "native" numbering scheme for a language.

What is wrong with "A,B,C (Latin)" becoming "А,Б,В (Native)" when the paragraph language is set to Russian or "ᠠ,ᠡ,ᠢ" in case of Mongolian? If you want the Cyrillic alphabet it should be possible to switch to another language. I struggle a bit with the idea of an extra dialog that offers more but all just hard-coded number schemes.

Comment 9 Mike Kaganski 2024-07-09 11:09:56 UTC

(In reply to Heiko Tietze from comment #8)
> What is wrong with "A,B,C (Latin)" becoming "А,Б,В (Native)" when the
> paragraph language is set to Russian or "ᠠ,ᠡ,ᠢ" in case of Mongolian? If you
> want the Cyrillic alphabet it should be possible to switch to another
> language.

One may need to use e.g. Mongolian language, and A-B-C numbering, as well as a-b-c, or i-ii-iii, or Arabic, if needed, not only Mongolian.

> I struggle a bit with the idea of an extra dialog that offers more
> but all just hard-coded number schemes.

It is a dangerous attitude. I can understand a desire to keep the prominent parts of the UI clean. But the idea that we must make some functionality simply inaccessible at all (except via the API) is really hurting. I don't propose any complexity in the default-visible things. But elements like "More..." should be introduced, as many as needed, to provide access to advanced features to whoever needs them. Basic users may limit themselves to the basic UI.

Comment 10 Heiko Tietze 2024-07-09 12:20:09 UTC

(In reply to Mike Kaganski from comment #9)
> One may need to use e.g. Mongolian language, and A-B-C numbering, as well as
> a-b-c, or i-ii-iii, or Arabic, if needed, not only Mongolian.
The Latin variants would be available too, of course. The only shortcoming I see is Mongolian text with non-Latin and non-Mongolian numbers. And for this purpose I suggest an attribute on the numbers, eg. Numbering Symbols with Russian language. Admittedly a bit difficult to find. 

> > I struggle a bit with the idea...
> It is a dangerous attitude.
In other words I am not strictly against your workflow but if there is something more clean and suited well for everybody... We are still pondering.

Comment 11 Mike Kaganski 2024-07-09 12:27:59 UTC

(In reply to Heiko Tietze from comment #10)
> (In reply to Mike Kaganski from comment #9)
> > One may need to use e.g. Mongolian language, and A-B-C numbering, as well as
> > a-b-c, or i-ii-iii, or Arabic, if needed, not only Mongolian.
> The Latin variants would be available too, of course. The only shortcoming I
> see is Mongolian text with non-Latin and non-Mongolian numbers. And for this
> purpose I suggest an attribute on the numbers, eg. Numbering Symbols with
> Russian language. Admittedly a bit difficult to find. 

I suppose that, in order to implement your idea, we could convert the current drop-down into something similar to what we have in Calc's number format. We would need a list of available formats, plus a selector for *locale*. Then one would be able to select the locale, and see the numbering formats available for it (or use the "Automatic" locale to allow it to update depending on the text language).

But note that we should do that everywhere where we have the numbering selector: lists, page numbering, outline/chapter/heading numbering, fields, maybe somewhere else. Or we could still make this advanced selector available as a separate dialog, activated by the "More..." / "Advanced..." item in the currently available selectors.

Comment 12 Fiable.biz 2024-07-09 13:58:44 UTC

Let me clarify a bit. Mongolian has historically been written with 10 different kinds of characters, and is currently still written with 3 different alphabets: Cyrillic in Outer Mongolia; Mongol-Uyghur (a vertical script) in Inner Mongolia and a bit in Outer Mongolia, where it is taught to all middle schoolers; and Latin in many SMSs, though this usage, still widely followed, is decreasing. It's not the only language written with different scripts: Chinese and Serbian are too. Mongol-Uyghur has its own digits. Outer Mongolians also use Roman numerals with the meaning of an ordinal number ("III хурал" means "3rd meeting"), though they also use "-р" (the abbreviation of "дугаар"/"дүгээр", chosen according to the number, to follow "vowel harmony") for the same purpose ("Гуравдугаар хурал" or "3 дугаар хурал" mean "Third meeting" and "3-р хурал" means "3rd meeting"), and "№" is used for "number" ("Хурал №3" means "Meeting number 3"). All these except the last one are frequently used to number list or summary items. There are different Cyrillic alphabets, the Mongolian one having 2 more letters than the Russian one.
Arabic and Roman numerals, as well as the Latin alphabet, have a special role on this planet, being used much more widely than in Arabic and Latin texts. ;-)
Calc's number format dialogue is quite good, including a "user-defined" category, but the dialogue depends on an overly long list of languages/scripts, while I'd like to access the few I use more quickly.

Comment 13 V Stuart Foote 2024-07-09 15:23:07 UTC

(In reply to Heiko Tietze from comment #5)
> We list four types of Cyrillic numbers for Bulgarian, Russian, Serbian, and
> Ukrainian. While I support adding Mongolian, the list is hard to use right
> now.

Does our VCL layout even support non-Cyrillic Mongolian? 
 
> How about two entries for letters, one "A,B,C (Latin)" the other "A,B,C
> (Localized)" which then becomes either Russian Cyrillic or Mongolian,
> depending on the chosen language.

In reality the Soviet era adoption of Cyrillic based alphabets was quite broad: https://en.wikipedia.org/wiki/Cyrillic_alphabets so not just here.

So for any of the Cyrillic using locales we support with an i18n CLDR record we could/should hard code both numeric and alphabetic sequence for each locale, expanding the drop list.


Otherwise, IIUC, between Mongolia and Inner Mongolia (the Autonomous Region of the PRC) we have two different alphabets and multiple 'dialects' of Mongolian.

And only the state of Mongolia officially uses the Cyrillic forms. But that is being supplemented now, with Mongolia *officially* adopting the traditional forms (so somehow adopting use of Unicode 1800-18AF) and preparing documents in both renderings mandated by Jan 2025.

> 
> Or, to keep the list more clean, add an option below "[ ] Localize".

Regards doing a "[ ] Localize"--wouldn't the CLDR details be found in the LC_COLLATION  and LC_INDEX record?

Here we don't maintain a mn_MN or mn_CN locale, just the mn_Cyrl_MN.xml. 

Though I noticed we don't update select CLDR automatically and the locale/lang codes are odd reusing generic 'qlt' "a locale language" for many in addition to the country codes. Meaning we can currently automate?

Also, Unicode is struggling with the "Mongolian text model" https://www.unicode.org/reports/tr54/
https://www.unicode.org/mwg/mwg3docs/mwg3-2UnicodeV12MongolianBlockR.pdf

Think Eike is going to need to comment/participate here on how/if this could be automated and what would need be done in our i18n CLDR data stores.

Comment 14 Heiko Tietze 2024-07-18 09:58:01 UTC

We briefly discussed the topic in the design meeting.

To repeat my idea first: We move Bulgarian, Serbian, and Russian out of the pre-defined list into an extra language dropdown that offers languages for all possible character sets; ideally pre-selected by the locale defined in Tools > Options.

However, this ticket is specifically about Mongolian. And we suggest to support this request with some higher priority than a redesign requires.

Comment 15 Eike Rathke 2024-07-18 12:16:40 UTC

(In reply to V Stuart Foote from comment #13)
> (In reply to Heiko Tietze from comment #5)
> > Or, to keep the list more clean, add an option below "[ ] Localize".
> 
> Regards doing a "[ ] Localize"--wouldn't the CLDR details be found in the
> LC_COLLATION  and LC_INDEX record?
No. Collation is sorting/ordering and may follow the standard Unicode collation order (in fact our mn-Cyrl-MN locale data simply refers to <LC_COLLATION ref="en_US"/>), while alphabetic numbering may be defined differently. LC_INDEX is close, but may differ from numbering and is also not sufficient, as we need a persistent unique identifier to store the numbering in ODF, one that does not change if the data changes for some reason. That can be auto-generated for distinct numbering sequences using their first three letters, but for Cyrillic (and not only there) it is locale-dependent if the first three letters are equal among some locales but other letters are not. See for example the Bulgarian, Russian, Serbian, Ukrainian cases in https://opengrok.libreoffice.org/xref/core/i18npool/source/defaultnumberingprovider/defaultnumberingprovider.cxx?r=be1c9ee5#156 and their identifiers in the table at https://opengrok.libreoffice.org/xref/core/i18npool/source/defaultnumberingprovider/defaultnumberingprovider.cxx?r=be1c9ee5#1132 that DefaultNumberingProvider::makeNumberingIdentifier() https://opengrok.libreoffice.org/xref/core/i18npool/source/defaultnumberingprovider/defaultnumberingprovider.cxx?r=be1c9ee5#1160 uses.
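
The identifier problem can be sketched in a few lines of Python. This is not the logic of DefaultNumberingProvider::makeNumberingIdentifier(); the function names and the "(locale)" suffix format are hypothetical, and the full alphabets stand in for the real numbering sequences, which may omit letters. It only shows that an identifier built from the first three letters cannot tell several Cyrillic sequences apart, so a locale qualifier is needed.

```python
from collections import defaultdict

# Full alphabets, used here only as illustrative numbering sequences.
SEQUENCES = {
    "ru": "абвгдеёжзийклмнопрстуфхцчшщъыьэюя",
    "uk": "абвгґдеєжзиіїйклмнопрстуфхцчшщьюя",
    "mn-Cyrl-MN": "абвгдеёжзийклмноөпрстуүфхцчшщъыьэюя",
}


def naive_identifier(sequence: str) -> str:
    """Identifier built from the first three letters only."""
    return ", ".join(sequence[:3]) + ", ..."


def qualified_identifier(sequence: str, locale: str) -> str:
    """Hypothetical disambiguated form: the same prefix plus the locale tag."""
    return f"{naive_identifier(sequence)} ({locale})"


def find_collisions(sequences: dict[str, str]) -> dict[str, list[str]]:
    """Map each naive identifier shared by more than one locale to those locales."""
    by_id = defaultdict(list)
    for locale, seq in sequences.items():
        by_id[naive_identifier(seq)].append(locale)
    return {ident: locales for ident, locales in by_id.items() if len(locales) > 1}


print(find_collisions(SEQUENCES))
for locale, seq in SEQUENCES.items():
    print(qualified_identifier(seq, locale))
```

All three sequences start with а, б, в, so the naive identifier is the same for each of them, while the qualified form stays unique as long as the locale tag is.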


> the
> locale/lang codes are odd reusing generic 'qlt' "a locale language" for many
> in addition to the country codes.
That is a convention needed for the UNO API to be able to still transport a full BCP 47 language tag in the fixed inflexible Java Locale struct that has only Language, Country and Variant fields, where Language and Country can only hold the ISO codes. If a language tag does not consist of purely language and country alphabetic ISO codes, then we use the 'qlt' "Reserved for local use" language code (mnemonic Q Language Tag) and the actual language tag is in the Variant field, Country may be filled if it fits. Here for Mongolian Cyrillic it's
Language = "qlt"
Country = "MN"
Variant = "mn-Cyrl-MN"

The 'qlt' never faces document storage.
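
As a rough illustration of the convention described above (a Python sketch, not the actual LibreOffice implementation): the Locale tuple and the to_locale function below are hypothetical names, and the tag handling is simplified. It assumes canonical casing (lowercase language, uppercase two-letter country) and ignores numeric region codes.

```python
import re
from typing import NamedTuple


class Locale(NamedTuple):
    """Three-field locale, mirroring the Language/Country/Variant struct."""
    language: str
    country: str
    variant: str


def to_locale(tag: str) -> Locale:
    """Map a BCP 47 tag to the three fields, using 'qlt' when it does not fit."""
    parts = tag.split("-")
    if re.fullmatch(r"[a-z]{2,3}", parts[0]):
        # Plain cases: language only, or language plus alphabetic country code.
        if len(parts) == 1:
            return Locale(parts[0], "", "")
        if len(parts) == 2 and re.fullmatch(r"[A-Z]{2}", parts[1]):
            return Locale(parts[0], parts[1], "")
    # Everything else: 'qlt' as language, the full tag in Variant,
    # and the country filled in if the tag contains one.
    country = next((p for p in parts[1:] if re.fullmatch(r"[A-Z]{2}", p)), "")
    return Locale("qlt", country, tag)


print(to_locale("en-US"))
print(to_locale("mn-Cyrl-MN"))
```

For "mn-Cyrl-MN" the sketch yields the same three values as listed above, because the script subtag "Cyrl" prevents the tag from fitting into Language and Country alone.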

> Meaning we can currently automate?
What do you mean?


> Think Eike is going to need to comment/participate here on how/if this could
> be automated and what would need be done in our i18n CLDR data stores.
Please do not confuse Unicode CLDR and our LO locale data. Initially OOo contributed its locale data to CLDR and now we may align our data if CLDR changes are desired, but don't do it for every case.

Comment 16 V Stuart Foote 2024-07-18 20:21:42 UTC

(In reply to Eike Rathke from comment #15)
> ...
> > Meaning we can currently automate?
> What do you mean?
> 

Just that could we synthesize *any* language/locale we'd be handling? Where I'd assumed the source to do so would be Unicode CLDR. 

But seems it requires a dev action to create each locale in our LO i18n records. And so can not currently be automated. 

> 
> > Think Eike is going to need to comment/participate here on how/if this could
> > be automated and what would need be done in our i18n CLDR data stores.
> Please do not confuse Unicode CLDR and our LO locale data. Initially OOo
> contributed its locale data to CLDR and now we may align our data if CLDR
> changes are desired, but don't do it for every case.

OK. But if we were to pull directly from Unicode CLDR (dynamically), are the needed details there to work against--per locale? Or is our LO i18n locale data heavily customized to meet our project l10n needs? And LO requirements can't be derived other than by manual curation and developer actions?