Bug 81714 - Other: The new "Language Tag" feature in LO 4.3 works only with non-CTL/CJK scripts
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Localization
Version (earliest affected): 4.3.0.3 rc
Hardware: All All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard: BSA
Keywords:
Depends on:
Blocks: RTL-CTL
Reported: 2014-07-24 12:05 UTC by EricP
Modified: 2023-02-15 06:38 UTC (History)

Attachments
ZIP containing test ODT and screenshots from LOv4304. (145.85 KB, application/zip)
2014-08-24 08:43 UTC, Owen Genat (retired)
Screenshots demonstrating red underlines on Mnong text marked as Khmer (132.24 KB, application/zip)
2014-08-25 08:18 UTC, EricP

Description EricP 2014-07-24 12:05:53 UTC
The new "Language Tag" feature introduced in LibreOffice 4.3 does not work with CTL scripts.

For example, Central Mnong [cmo] is written in Vietnam with a Latin script and in Cambodia with a Khmer-based script.

I can mark a selection of text in Latin script as cmo-Latn-VN and it appears to work fine. An attempt to mark a section of text in Khmer script as cmo-Khmr-KH has no effect. Instead it remains stubbornly marked as "Khmer."

The release notes may indicate that this feature does not yet support CTL languages, although the wording is somewhat ambiguous:
https://wiki.documentfoundation.org/ReleaseNotes/4.3#Adding_a_new_language_tag

Operating System: All
Version: 4.3.0.3 rc
Comment 1 Owen Genat (retired) 2014-08-24 08:43:58 UTC
Created attachment 105187 [details]
ZIP containing test ODT and screenshots from LOv4304.

Confirmed under GNU/Linux using:

- v4.3.0.4 Build ID: 62ad5818884a2fc2e5780dd45466868d41009ec0
- v4.4.0.0.alpha0+ Build ID: e379401618268ed7f7f5885a36b90e1f4f6cd4af TinderBox: Linux-rpm_deb-x86_64@46-TDF, Branch:master, Time: 2014-08-18_05:51:03

As I understand this report, the BCP47 language tag (cmo-Khmr-KH) applied to Khmer-script text is not reflected in the status bar for the current selection (it shows "Khmer" instead). For Latin-script text marked "cmo-Latn-VN", the tag is reflected in the status bar. Refer to the attached screenshots.

The SIL Khmer Mondulkiri font used in the sample is available at:

http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=mondulkiri
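For readers unfamiliar with the tag layout, the tags in this report follow the common BCP 47 pattern language-Script-REGION. The following is only a minimal C++17 sketch of splitting such a tag; the names SimpleTag and parseSimpleTag are hypothetical, this is not LibreOffice's LanguageTag implementation, and it deliberately ignores extlang, variant, extension and private-use subtags.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper type for illustration; not a LibreOffice class.
struct SimpleTag {
    std::string language;  // e.g. "cmo"
    std::string script;    // e.g. "Khmr"; empty when absent
    std::string region;    // e.g. "KH"; empty when absent
};

// True if s has exactly len characters and all of them are ASCII letters.
static bool isAlphaRun(const std::string& s, std::size_t len) {
    if (s.size() != len) return false;
    for (unsigned char c : s)
        if (!std::isalpha(c)) return false;
    return true;
}

// True if s has exactly len characters and all of them are digits.
static bool isDigitRun(const std::string& s, std::size_t len) {
    if (s.size() != len) return false;
    for (unsigned char c : s)
        if (!std::isdigit(c)) return false;
    return true;
}

// Parses only the language[-Script][-REGION] subset of BCP 47.
// Returns std::nullopt for anything outside that subset.
std::optional<SimpleTag> parseSimpleTag(const std::string& tag) {
    if (tag.empty() || tag.back() == '-') return std::nullopt;

    std::vector<std::string> parts;
    std::istringstream in(tag);
    for (std::string part; std::getline(in, part, '-');) parts.push_back(part);

    // Language subtag: two or three letters (simplification).
    if (!isAlphaRun(parts[0], 2) && !isAlphaRun(parts[0], 3)) return std::nullopt;

    SimpleTag result;
    std::size_t i = 0;
    result.language = parts[i++];
    // Optional script subtag: four letters, e.g. "Khmr" or "Latn".
    if (i < parts.size() && isAlphaRun(parts[i], 4)) result.script = parts[i++];
    // Optional region subtag: two letters or three digits, e.g. "KH".
    if (i < parts.size() && (isAlphaRun(parts[i], 2) || isDigitRun(parts[i], 3)))
        result.region = parts[i++];
    // Anything left over is outside the subset handled here.
    if (i != parts.size()) return std::nullopt;
    return result;
}
```

With this sketch, cmo-Khmr-KH splits into language "cmo", script "Khmr" and region "KH", while a tag such as ar-PS has no script subtag at all.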
Comment 2 Owen Genat (retired) 2014-08-24 08:45:24 UTC
As per comment 1, status set to NEW.
Comment 3 EricP 2014-08-25 08:16:30 UTC
Yes, Owen, you understood my report correctly, and you encouraged me to investigate a bit further (OS X 10.9.4 / LO 4.3.0.4).

It appears that for any text written in a CTL script, the status bar always displays the name of whichever language is selected in the CTL Font section of the Character Format dialog.

So, if a section of text in Khmer script is marked as cmo-Khmr-KH, but the CTL Font is specified as Arabic, then the status bar indicates “Arabic”.

This does not seem to be simply a status bar issue. If a Khmer spell checker is installed, red underlines appear beneath Central Mnong text that is marked as cmo-Khmr-KH but has Khmer selected as the CTL Font (Khmer.png). If, however, Arabic (for which no spelling dictionary is installed) is selected, the red underlines are not displayed (Arabic.png).
Comment 4 EricP 2014-08-25 08:18:32 UTC
Created attachment 105223 [details]
Screenshots demonstrating red underlines on Mnong text marked as Khmer
Comment 5 martin_hosken 2014-11-07 09:30:18 UTC
I think what is needed is to change cui/source/inc/chardlg.hxx such that

  SvxLanguageBox*     m_pEastFontLanguageLB;
  SvxLanguageBox*     m_pCTLFontLanguageLB;

become:

  SvxLanguageComboBox*     m_pEastFontLanguageLB;
  SvxLanguageComboBox*     m_pCTLFontLanguageLB;
Comment 6 Eike Rathke 2014-11-07 12:09:03 UTC
There is no way yet to flag an arbitrary language tag as CTL or CJK. The existing predefined CTL/CJK tags, or rather their corresponding LCID values, occur in various switch cases so that they are handled differently. Merely changing the mentioned language boxes to SvxLanguageComboBox will not help. This needs further implementation.
Comment 7 martin_hosken 2016-10-21 14:07:52 UTC
Where is the flagging needed?

I don't see anywhere in the code that uses MSLangId to decide whether text is CTL or not. All calls to getScriptType seem to be to break iterators now, and they all use Unicode code points instead. Of course if we did have such instances, perhaps we could move the call over to the languageTag instead and then we could use the script component and give a useful answer there too.
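To illustrate what classifying text by Unicode code point (rather than by language ID) can look like, here is a rough C++ sketch. It is not the LibreOffice break iterator code: ScriptGroup, BlockRange and groupForCodePoint are hypothetical names, and the table covers only a handful of Unicode blocks, whereas a real implementation would rely on the full Unicode script data (for example through ICU).

```cpp
// Hypothetical three-way grouping mirroring the Western/CTL/CJK split.
enum class ScriptGroup { Western, CTL, CJK };

// One Unicode block and the group assumed for it in this sketch.
struct BlockRange {
    char32_t first;
    char32_t last;
    ScriptGroup group;
};

// Classifies a single code point using a small, incomplete block table.
// Anything not listed is treated as Western (an assumption of this sketch).
ScriptGroup groupForCodePoint(char32_t cp) {
    static const BlockRange ranges[] = {
        {0x0590, 0x05FF, ScriptGroup::CTL},  // Hebrew
        {0x0600, 0x06FF, ScriptGroup::CTL},  // Arabic
        {0x0900, 0x097F, ScriptGroup::CTL},  // Devanagari
        {0x0E00, 0x0E7F, ScriptGroup::CTL},  // Thai
        {0x1780, 0x17FF, ScriptGroup::CTL},  // Khmer
        {0x3040, 0x309F, ScriptGroup::CJK},  // Hiragana
        {0x30A0, 0x30FF, ScriptGroup::CJK},  // Katakana
        {0x4E00, 0x9FFF, ScriptGroup::CJK},  // CJK Unified Ideographs
        {0xAC00, 0xD7AF, ScriptGroup::CJK},  // Hangul Syllables
    };
    for (const BlockRange& r : ranges)
        if (cp >= r.first && cp <= r.last) return r.group;
    return ScriptGroup::Western;
}
```

Under this approach, Khmer-script Mnong text is recognized as CTL from its characters alone, whatever language tag is attached to it.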
Comment 8 Eyal Rozenberg 2018-09-30 21:41:41 UTC
>  mark ... cmo-Latn-VN ... cmo-Khmr-KH ...

None of these strings are options available in the "Language" combo box (as of v6.2.0.0-alpha). How are we supposed to (try and) make these markings?
Comment 9 EricP 2018-10-01 08:17:07 UTC
(In reply to Eyal Rozenberg from comment #8) 
> None of these strings are options available in the "Language" combo box (as
> of v6.2.0.0-alpha). How are we supposed to (try and) make these markings?

You should be able to type directly into the "Language" combo box, as shown at 
https://wiki.documentfoundation.org/ReleaseNotes/4.3#Adding_a_new_language_tag.

This bug still exists in LO v6.0.
Comment 10 Eyal Rozenberg 2023-02-10 15:32:29 UTC
Not experiencing this with:

Version: 7.6.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: ad387d5b984c6666906505d25685065f710ed55d
CPU threads: 4; OS: Linux 6.1; UI render: default; VCL: gtk3
Locale: fa-IR (en_IL); UI: en-US

What I did:

1. Created a new document
2. Typed in some Hebrew text
3. Selected part of the text
4. Chose Format | Character... from the menu
5. Typed in "Arabic (Palestine)"
6. Accepted
7. Moved around the line I had typed with the cursor

The area I had marked as "Arabic (Palestine)" showed up as such on the toolbar, and the rest showed as Hebrew.

OP, can you provide up-to-date reproduction instructions? If not, is it possible that this is a CJK-only issue? Or that it has been fixed at some point?
Comment 11 Eike Rathke 2023-02-13 18:48:25 UTC
Arabic (Palestine) {ar-PS} is available as predefined language tag and categorized as CTL already anyway. That ~always worked. To try the actual case here type in the language tag example of comment 0: cmo-Khmr-KH

Anyhow, things have changed in the meantime: all three language lists (Western, CTL, CJK) now have comboboxes, and since 7.5, entering a language tag adds it to the proper list (which may actually mean it is added to a different list than the one it was entered in, and thus can be confusing, but shrug). This is a by-product of the follow-ups to the 7.5 changes that permanently add language tags, as noted in https://wiki.documentfoundation.org/ReleaseNotes/7.5#Language_tags

So once cmo-Khmr-KH was typed into any of those boxes and the dialog closed with OK, when reopening it appears in the CTL list as
"cmo (Khmer, Cambodia) {cmo-Khmr-KH}"
(apparently current ICU has no language word for 'cmo')
because Khmr is a CTL script as determined via the ICU Unicode properties and our classification table.

I guess we can close this as fixed?
Comment 12 EricP 2023-02-15 06:38:50 UTC
Yes, I think this is fixed. 
It was actually working two months ago in 7.4 already but I think the changes in 7.5 make it easier.

Am I correct to assume that Mnong [cmo] needs to be added to the CLDR in order for LO to know its name and default script? Mnong is actually written with either Latin or Khmer script, but I tried another language which is only written in Khmer script (tpu) and was disappointed to see LO add it as a Western text language. It was only when I specified the script code with tpu-Khmr that LO correctly added it to the CTL list. If the default script code is specified in CLDR then will LO automatically assign it to the correct list?
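The behaviour described in comments 11 and 12 (a tag lands in the CTL list when its script subtag names a CTL script, and in the Western list when no script is given) can be sketched in C++ as below. This is an illustrative assumption only: LanguageList, scriptSubtag and listForTag are hypothetical names, the script table is a tiny hand-written sample rather than LibreOffice's classification table, and no default-script lookup is attempted.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the three language lists in the dialog.
enum class LanguageList { Western, CTL, CJK };

// Returns the script subtag if it directly follows the language subtag
// (four letters, normalized to title case), or "" if there is none.
std::string scriptSubtag(const std::string& tag) {
    std::istringstream in(tag);
    std::string language, second;
    if (!std::getline(in, language, '-') || !std::getline(in, second, '-'))
        return "";
    if (second.size() != 4) return "";
    for (unsigned char c : second)
        if (!std::isalpha(c)) return "";
    // Normalize case so that "khmr" and "KHMR" both become "Khmr".
    second[0] = static_cast<char>(std::toupper(static_cast<unsigned char>(second[0])));
    for (std::size_t i = 1; i < second.size(); ++i)
        second[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(second[i])));
    return second;
}

// Picks a list from the script subtag alone. Tags without a script
// subtag fall back to Western in this sketch.
LanguageList listForTag(const std::string& tag) {
    // Small sample table; the ISO 15924 codes are real, the mapping is
    // an assumption made for illustration.
    static const std::unordered_map<std::string, LanguageList> table = {
        {"Khmr", LanguageList::CTL}, {"Arab", LanguageList::CTL},
        {"Hebr", LanguageList::CTL}, {"Thai", LanguageList::CTL},
        {"Deva", LanguageList::CTL}, {"Hani", LanguageList::CJK},
        {"Hans", LanguageList::CJK}, {"Hant", LanguageList::CJK},
        {"Jpan", LanguageList::CJK}, {"Kore", LanguageList::CJK},
    };
    const auto it = table.find(scriptSubtag(tag));
    return it == table.end() ? LanguageList::Western : it->second;
}
```

In this sketch, tpu-Khmr and cmo-Khmr-KH map to the CTL list, while a bare tpu maps to Western because nothing in the tag itself says which script is meant. Avoiding that fallback would require some source of default-script data for the language, which the sketch does not model.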