Bug 133918 - CHARACTER DIALOG: Typing the name of a language in the Font tab of the Character style dialog does not work correctly sometimes
Status: ASSIGNED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: framework
Version (earliest affected): 5.2.7.2 release
Hardware: All All
Importance: medium minor
Assignee: Eike Rathke
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Character-Dialog
 
Reported: 2020-06-11 22:50 UTC by andvaranaut@gmail.com
Modified: 2025-03-23 08:02 UTC
CC List: 4 users

See Also:
Crash report or crash signature:


Description andvaranaut@gmail.com 2020-06-11 22:50:57 UTC
Description:
I'm using LibreOffice 6.4.3.2 (latest Ubuntu stable version) in the Spanish/Spain (es-ES) locale and finding a very strange bug when changing language. I'll try to be as clear as possible in the description.

To change language you have to open the Character dialog in the Font tab (hopefully I'm guessing the names right), then there's a combobox where you can select the language.

It used to be that one could type the first few letters of the desired language, and it would autocomplete to the first matching language in the list. For the most part, it still works, but there is a catch: if you type "in" as the first two letters, it gets changed to "id" for some reason. Since English is written "Inglés" in Spanish, this is rather annoying.

The change to "id" happens regardless of case. By experimenting I have found another weird change - "iw" gets changed to "he". 

In practical terms, this means that you can't pick English as a language when using the Spanish locale by using the keyboard to search the list. It works OK when using the mouse, though.

This is a fairly recent regression (probably introduced with the upgrade from Ubuntu 18.04 LTS to 20.04).

I have reproduced the bug by restarting in safe mode, as well as using LC_ALL=C to reset LO to the default English locale. The same substitutions happen.

Steps to Reproduce:
1. Select some text 
2. Right click > Character > Font tab and focus the language selector
3. Type "IN" or "IW"

Actual Results:
The "IN"/"IW" input gets changed to "id"/"he" respectively

Expected Results:
The "IN"/"IW" input remains and auto-selects the first matching language


Reproducible: Always


User Profile Reset: Yes



Additional Info:
Version: 6.4.3.2
Build ID: 1:6.4.3-0ubuntu0.20.04.1
CPU threads: 8; OS: Linux 5.4; UI render: default; VCL: gtk3;
Locale: es-ES (es_ES.UTF-8); UI-Language: es-ES
Calc: threaded
Comment 1 Dieter 2020-12-10 17:12:53 UTC
Xisco, is it possible for you to check this? I don't have a Spanish UI.
Comment 2 andvaranaut@gmail.com 2020-12-10 17:25:30 UTC
Dieter,

The bug was also triggered with LC_ALL=C, apparently it is not locale-dependent.

Just select some text > right click > Character > Character... then click on the Language combobox entry area. Delete the contents and try to write "IN" or "IW". That triggers the bug for me.
Comment 3 Dieter 2020-12-11 07:41:39 UTC
(In reply to andvaranaut@gmail.com from comment #2)
> Dieter,
> 
> The bug was also triggered with LC_ALL=C, apparently it is not
> locale-dependent.
> 
> Just select some text > right click > Character > Character... then click on
> the Language combobox entry area. Delete the contents and try to write "IN"
> or "IW". That triggers the bug for me.

Thanks for clarification. I tested with the wrong dialog. My results
in = Indonesian (expected)
iw = he (not expected)

Expected result:
Perhaps like the behaviour in Tools => Options => Language Settings => Languages: typing "iw" gives Walloon as the result (because there is no language beginning with "iw").
Comment 4 Ming Hua 2020-12-11 08:11:22 UTC
(In reply to andvaranaut@gmail.com from comment #0)
> This is a fairly recent regression (probably on the update from Ubuntu 18.04
> LTS to 20.04).

I can (mostly) reproduce with 6.2.8 on Windows:
Version: 6.2.8.2 (x64)
Build ID: f82ddfca21ebc1e222a662a32b25c0c9d20169ee
CPU threads: 2; OS: Windows 10.0; UI render: default; VCL: win; 
Locale: zh-CN (zh_CN); UI-Language: en-US
Calc: threaded

Will test earlier versions later.

> Actual Results:
> The "IN"/"IW" input gets changed to "id"/"he" respectively
> 
> Expected Results:
> The "IN"/"IW" input remains and auto-selects the first matching language

Like Dieter, when I type the two letters "in" it autocompletes to "Indonesian"; however, if I then press the Backspace key, it deletes the "donesian" part and the text changes to "id". Since Indonesian's language code (BCP 47 or something similar) is "id", I suspect there is some mix-up between the language names and their ISO codes.

For "iw" I reproduce the reported, changing to "he" behavior.
Comment 5 Ming Hua 2020-12-11 09:28:29 UTC
Also reproducible with 5.2.7 (the oldest version I have):
Version: 5.2.7.2 (x64)
Build ID: 2b7f1e640c46ceb28adf43ee075a6e8b8439ed10
CPU Threads: 2; OS Version: Windows 6.19; UI Render: default; 
Locale: zh-CN (zh_CN); Calc: group

Since 5.2.7 predates Ubuntu 18.04, I doubt this is a regression as the reporter claimed.

It's also not limited to the Format Character dialog; the Font tab of the Paragraph Style dialog has the same problem.
Comment 6 andvaranaut@gmail.com 2020-12-11 10:12:35 UTC
Ming Hua,

While in my experience the behavior is a regression (I noticed it because something I used to be able to do no longer worked), the underlying cause might not be. If your hunch about the confusion between codes and names is correct, then there is some chance that the locale does play into it.

I have rechecked, however, and I'm definitely seeing the IN->ID change with both es_ES and C locale (set by invoking LC_ALL=C lowriter) in my current version (6.4.6.2 from Ubuntu 20.04), even though "Indonesian" is an option in the dropdown.
Comment 7 Dieter 2022-12-11 07:42:17 UTC
Still present in

Version: 7.5.0.0.alpha1+ (X86_64) / LibreOffice Community
Build ID: 52c75986adc2b370eb55ce918ab1db0a95831c83
CPU threads: 4; OS: Windows 10.0 Build 19045; UI render: Skia/Raster; VCL: win
Locale: en-US (de_DE); UI: en-GB
Calc: CL threaded

Steps
1. Format -> Character -> Font tab
2. In language field type "iw"

Actual result
Change to "he"

Expected result
Perhaps like the behaviour in Tools => Options => Language Settings => Languages: typing "iw" gives Walloon as the result (because there is no language beginning with "iw").
Comment 8 QA Administrators 2024-12-11 03:12:55 UTC Comment hidden (obsolete)
Comment 9 Andreas Heinisch 2025-01-20 12:38:52 UTC
This is done on purpose in https://github.com/LibreOffice/core/blame/master/svx/source/dialog/langbox.cxx#L515
Comment 10 andvaranaut@gmail.com 2025-01-20 14:53:00 UTC
(In reply to Andreas Heinisch from comment #9)
> It is done on purpose in
> https://github.com/LibreOffice/core/blame/master/svx/source/dialog/langbox.cxx#L515

I see. Many thanks for figuring this out.

I can see the idea behind the substitution (allowing users to choose a language by entering its BCP47 abbreviation, while always using the preferred code); however, I would say that the user experience is dismal. Besides not being easily (or at all) discoverable, changing user input in surprising ways is a big no-no, particularly while text is being entered.

I would suggest not making the substitution right away, but once the user has finished entering text, meaning either on blur or on save. Given that the functionality is pretty obscure, I think that making it on save would be the best option. That would mean removing lines 518-523 and moving them to the dialog saving (I guess somewhere in SvxLanguageBox::SaveEditedAsEntry, perhaps right after the initial m_eEditedAndValid check), along with a call to LanguageTag::isValidBcp47 like the one in line 516. The current validity checks should stay as they are to give the user visual feedback regarding the validity (or not) of what is entered in the combobox, but the substitution would not happen until saving.
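A minimal sketch of this deferred approach, using a hypothetical DeferredLanguageEntry class (not the SvxLanguageBox API): the change handler only records validity for visual feedback, and canonicalization runs once on save. The validator and canonicalizer are passed in as placeholders for whatever LanguageTag-based checks would actually be used.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical model of "validate while typing, canonicalize on save".
// Names are illustrative only; this is not LibreOffice code.
class DeferredLanguageEntry
{
public:
    using Validator = std::function<bool(const std::string&)>;
    using Canonicalizer = std::function<std::string(const std::string&)>;

    DeferredLanguageEntry(Validator isValid, Canonicalizer canonicalize)
        : m_isValid(std::move(isValid)), m_canonicalize(std::move(canonicalize)) {}

    // Called on every edit: record the text and its validity (for visual
    // feedback) but never rewrite what the user typed.
    void onTextChanged(const std::string& text)
    {
        m_text = text;
        m_valid = m_isValid(text);
    }

    const std::string& text() const { return m_text; }
    bool valid() const { return m_valid; }

    // Called once when the dialog is saved: only now is the tag canonicalized.
    std::string onSave() const { return m_valid ? m_canonicalize(m_text) : m_text; }

private:
    Validator m_isValid;
    Canonicalizer m_canonicalize;
    std::string m_text;
    bool m_valid = false;
};
```

The trade-off of this design is that the entry keeps showing "in" until save, and the rewrite to "id" then happens without the user seeing it.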
Comment 11 Eike Rathke 2025-01-20 16:12:54 UTC
I'm against moving the language tag canonicalization/substitution to Save because it would make it even more obscure, changing the input without any visible feedback. Also, the user may continue entering a more complex language tag string, which might end up wrong if canonicalization did not kick in early.

Also note that this bug talks about two different things. One is typing "in", which should match "Inglés" in a Spanish UI but for some reason did not and was changed to "id" instead; I have no immediate idea why, because BCP47 language tag recognition only kicks in when no matching language list entry was found. Once language tag recognition is hit, though, the old ISO 639-1 code "in" is correctly changed to "id" for Indonesian; there is no "in" code anymore.

The other is the "iw" -> "he" Hebrew substitution, which is expected because ISO 639-1 changed that code and there is no "iw" code anymore.
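For reference, the withdrawn ISO 639 codes involved here, plus "ji" -> "yi" (Yiddish), which was changed in the same revision, fit in a small lookup. This is an illustrative sketch with a hypothetical function name; LibreOffice's LanguageTag handles this (and much more) internally.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

// Maps a withdrawn ISO 639 primary language code to its current replacement,
// case-insensitively; any other input is returned lowercased but otherwise unchanged.
std::string replaceWithdrawnCode(std::string code)
{
    std::transform(code.begin(), code.end(), code.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    static const std::map<std::string, std::string> kReplacements{
        {"in", "id"},  // Indonesian
        {"iw", "he"},  // Hebrew
        {"ji", "yi"},  // Yiddish
    };
    const auto it = kReplacements.find(code);
    return it != kReplacements.end() ? it->second : code;
}
```

The case-insensitive lookup mirrors the report that "IN" and "in" are both turned into "id".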
Comment 12 Andreas Heinisch 2025-01-20 16:25:41 UTC
> Expected result
> Perhaps like behaviour in Tools => Options => Language Settings =>
> Languages: typing iw gives result Walloon (because there is no language with
> iw at the beginning

Should we stick to this behaviour?
Comment 13 andvaranaut@gmail.com 2025-01-20 17:13:56 UTC
(In reply to Eike Rathke from comment #11)

> Also note that this bug talked about two different things, one is typing
> "in" that should match "Inglés" in Spanish UI and for some reason did not
> and was changed to "id" instead; no adhoc idea why because BCP47 language
> tag recognition only kicks in when no matching language list entry was
> found. Once language tag recognition was hit though then the old ISO 639-1
> code "in" is correctly changed to "id" for Indonesian, there is no "id" code.

I can see how what you describe would solve all issues - however, the behavior you described does not match what is actually happening.

I'm not familiar with the inner workings of the text widget, but I would assume that the problem is either that the BCP47 tag recognition is triggered before checking for a matching language in the list box, or that the text added by the list box matching (e.g. if I type "I" it completes to "Ilocano" with "locano" selected) is somehow not taken into account at some point in the process.

I'm assuming there's no way that LanguageTag::isValidBcp47 could convert "Inglés (Australia)" (the first matching entry for "in") to "id", right? (I would expect the function to extract everything up to the first hyphen, if any, and then try to match it against the BCP47 codes.) In that case, the only reasonable explanation is that aStr does not contain the completed language name.
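To illustrate why a completed entry such as "Inglés (Australia)" cannot pass as a language tag, here is a much simplified syntactic check loosely based on RFC 5646 (BCP 47). The function name is hypothetical and this is not how LanguageTag::isValidBcp47 is implemented; a purely syntactic check like this would still accept some names such as "Hebrew", so real validation also has to consult the registered subtags.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Simplified check: '-'-separated subtags, the first made of 2 to 8 ASCII
// letters, the rest of 1 to 8 ASCII letters or digits. Ignores the subtag
// registry and many grammar details (private use, grandfathered tags, ...).
bool looksLikeBcp47(const std::string& tag)
{
    std::size_t start = 0;
    bool first = true;
    while (true)
    {
        const std::size_t dash = tag.find('-', start);
        const std::size_t end = (dash == std::string::npos) ? tag.size() : dash;
        const std::size_t len = end - start;
        if (len < (first ? 2u : 1u) || len > 8)
            return false;
        for (std::size_t i = start; i < end; ++i)
        {
            const unsigned char c = static_cast<unsigned char>(tag[i]);
            // Reject non-ASCII bytes (e.g. UTF-8 'é') before asking <cctype>.
            const bool ok = c < 0x80 && (first ? std::isalpha(c) : std::isalnum(c));
            if (!ok)
                return false;
        }
        if (dash == std::string::npos)
            return true;
        start = dash + 1;
        first = false;
    }
}
```

"in" and "en-AU" pass, while "Inglés (Australia)" fails (too long, and it contains a space, parentheses and a non-ASCII letter), which supports the idea that aStr only held the typed "in".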
Comment 14 andvaranaut@gmail.com 2025-01-20 19:00:08 UTC
(In reply to Eike Rathke from comment #11)

Ok, I have played some more with the widget and have a theory, although I would need somebody more versed in the LO codebase to weigh in.

> "in" that should match "Inglés" in Spanish UI and for some reason did not
> and was changed to "id" instead; no adhoc idea why because BCP47 language

I started by checking the behavior of the language selector with the English locale (LC_ALL=C) and it's the same: typing 'in' immediately gets changed to 'id' even though it should match "Indonesian" by the same logic. But if you manage to enter "Ind" in one go (e.g. by copying and pasting it), it does get autocompleted to "Indonesian". This means that the Spanish locale is not part of the problem.

I then tried to follow the overall flow of the code, but sadly I'm nowhere near proficient enough in C++ to understand where the combobox logic is defined (in particular, I would think that rControl.find_text(aStr) has to return -1 for the BCP47 substitution to happen, but I can't find the definition of find_text anywhere).

However, that gave me an idea: What might be happening is that the modification of the text that is part of the autofilling is triggering two different edit events, one with the full text and one without. So when you enter "in" it does get autocompleted to "Indonesian", "Inglés" or whatever, but internally that triggers another change event with just "in" as the contents which then gets changed to "id".

You can reproduce a very similar behavior by doing the following:

1) Type _in in the combobox (does not exist)
2) Select the underscore
3) Type anything, even another underscore
4) Whatever you typed gets moved to the end of the string and 'in' gets again changed to 'id'. So if you typed another underscore you would see 'id_'.

To me, that implies that the substitution is not atomic - there is a moment where the selected text is deleted before being replaced with whatever you type next, and that triggers a change event where the corresponding logic sees 'in' and changes it to 'id'. Maybe the autofilling is working in a similar way?
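The non-atomic replacement theory can be modelled with a toy entry (the ToyEntry type is hypothetical, not the GTK or VCL widget). It assumes that replacing a selection first deletes it and fires a change event, then inserts the typed text, and that a handler rewriting the text leaves the cursor at the end. Under those assumptions, typing "_" over the leading "_" of "_in" yields "id_", as in the steps described.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Toy entry whose "replace selection" happens in two observable steps.
struct ToyEntry
{
    std::string text;
    std::size_t cursor = 0;
    std::function<void(ToyEntry&)> onChanged;
    std::vector<std::string> seenByHandler;  // every text the handler observed

    void fireChanged()
    {
        seenByHandler.push_back(text);
        if (onChanged)
            onChanged(*this);
    }

    // Assumption: a programmatic rewrite moves the cursor to the end and
    // does not fire another change event (avoids recursion in the model).
    void setTextFromHandler(const std::string& s)
    {
        text = s;
        cursor = text.size();
    }

    // Replace [selStart, selEnd) with 'typed': delete, notify, insert, notify.
    void typeOverSelection(std::size_t selStart, std::size_t selEnd, const std::string& typed)
    {
        text.erase(selStart, selEnd - selStart);
        cursor = selStart;
        fireChanged();  // step 1: the handler sees the intermediate text, e.g. "in"
        text.insert(cursor, typed);
        cursor += typed.size();
        fireChanged();  // step 2: the handler sees the final text
    }
};
```

Usage: with a handler that rewrites "in" to "id", the first event sees "in" and rewrites it; the typed "_" is then inserted at the (moved) cursor, giving "id_".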

If that is a (relatively) recent change/regression in the widget behavior, that would explain why I had the impression that the change was recent, even though the underlying code seems to have had no significant changes in 7+ years.

(PS: You can use anything else instead of an underscore. I used the underscore because a surprising number of letters form a valid three-letter BCP47 code when followed by 'in', and I didn't want the code to be valid just in case; but you can put anything you want and the same happens.)
Comment 15 Eike Rathke 2025-01-21 18:05:48 UTC
SvxLanguageBox::ChangeHdl() is called for the first character typed (here "i"). rControl.find_text(aStr) returns -1 because (anonymous namespace)::GtkInstanceComboBox::find_text(), via find_text_including_mru(), searches for an exact match, not a match at the start of the string. The following LanguageTag::isValidBcp47() call then finds that "i" is not valid, and the function returns.

Then (anonymous namespace)::GtkInstanceComboBox::auto_complete() via idleAutoComplete() selects the first matching "Icelandic" entry and through signal SvxLanguageBox::ChangeHdl() is called again, this time GtkInstanceComboBox::find_text() finding the exact match.

Same first call of SvxLanguageBox::ChangeHdl() happens if "in" was typed, but then in that round (no exact match) LanguageTag::isValidBcp47() does the canonicalization.

At that point we would additionally need a partial starting (prefix) match when no full match was found, and select that entry.
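A simplified model of the handler flow and the proposed prefix match. All names are hypothetical stand-ins (not the SvxLanguageBox or weld::ComboBox API), tag recognition is reduced to a two-entry table, and the model ignores the widget's own idle autocompletion and event ordering.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>
#include <vector>

namespace {

std::string toLowerAscii(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

bool startsWithIgnoreCase(const std::string& s, const std::string& prefix)
{
    return prefix.size() <= s.size() &&
           toLowerAscii(s.substr(0, prefix.size())) == toLowerAscii(prefix);
}

// Placeholder for LanguageTag::isValidBcp47() plus canonicalization: only two
// withdrawn codes are recognized here (assumption for illustration).
bool recognizeTag(const std::string& typed, std::string& canonical)
{
    static const std::map<std::string, std::string> kKnown{{"in", "id"}, {"iw", "he"}};
    const auto it = kKnown.find(toLowerAscii(typed));
    if (it == kKnown.end())
        return false;
    canonical = it->second;
    return true;
}

} // namespace

// Returns the text the entry would show after the change handler ran.
// usePrefixMatch == false models the current flow; true adds the prefix match.
std::string resolveTyped(const std::vector<std::string>& entries,
                         const std::string& typed, bool usePrefixMatch)
{
    // 1. Exact match only, like find_text().
    if (std::find(entries.begin(), entries.end(), typed) != entries.end())
        return typed;

    // 2. Proposed: first entry that starts with the typed text.
    if (usePrefixMatch && !typed.empty())
    {
        const auto it = std::find_if(entries.begin(), entries.end(),
            [&typed](const std::string& e) { return startsWithIgnoreCase(e, typed); });
        if (it != entries.end())
            return *it;
    }

    // 3. Otherwise fall back to language tag recognition and canonicalization.
    std::string canonical;
    return recognizeTag(typed, canonical) ? canonical : typed;
}
```

With usePrefixMatch set to false, "in" turns into "id"; with it set to true, "in" resolves to the first entry starting with "in" ("Indonesian" in an English UI, an "Inglés" entry in a Spanish one), while "iw", which matches no entry, still becomes "he".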
Comment 16 Eike Rathke 2025-01-30 19:20:03 UTC
This turned out to be more complicated. My current attempt partially works, but not fully; I pushed it to Gerrit so as not to lose it. Maybe someone has an idea (the quirks are described in the commit message); if interested, see https://gerrit.libreoffice.org/c/core/+/180966
Comment 17 andvaranaut@gmail.com 2025-01-30 21:01:02 UTC
Many thanks for tackling the bug, sorry it's taking longer than anticipated.

Since you're asking for suggestions... I had one, but I refrained from mentioning it earlier because I don't really know what I'm talking about :)

In a nutshell: would it be possible to call (anonymous namespace)::GtkInstanceComboBox::auto_complete() (in other words, force an autocompletion) before attempting to perform the substitution?

The cleanest way I can see it working is forcing an autocompletion either before the rControl.find_text(aStr); call in line 474, or right after it and then fetching the value of rControl.get_active_text() again. 

Calling get_active_text() both before and after would allow you to check whether the new value is different to the previous value of aStr to know if an autocompletion has taken place, in case you need it to avoid infinite loops and such. At first glance, it probably isn't necessary - I think the worst that could happen if you don't account for this is that the validation logic would be run twice with the autocompleted text. But, if all autocompletions trigger a change event, you could just return if the new and previous values for get_active_text() differ - the new change event with the fully autocompleted text will do its thing.
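A sketch of the early-return idea, assuming a synchronous autocompletion call exists (forceAutoComplete below is a hypothetical placeholder; whether auto_complete() can be called like this is the open question). If the text changed, the handler returns and leaves validation to the change event fired by the completion.

```cpp
#include <functional>
#include <string>

enum class HandlerAction { DeferredToNextEvent, RanValidation };

// Hypothetical change handler: compare the text before and after forcing a
// completion; only run validation/canonicalization when nothing was completed.
HandlerAction onChangedModel(
    std::string& entryText,
    const std::function<std::string(const std::string&)>& forceAutoComplete,
    const std::function<void(std::string&)>& validateAndCanonicalize)
{
    const std::string before = entryText;
    entryText = forceAutoComplete(before);
    if (entryText != before)
        return HandlerAction::DeferredToNextEvent;  // completion fires its own change event
    validateAndCanonicalize(entryText);
    return HandlerAction::RanValidation;
}
```

Usage: if "in" completes to "Indonesian", the handler returns without touching the text; "iw", which completes to nothing, is validated and canonicalized to "he".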

This is dependent on a number of things which I'm not sure apply, most notably auto_complete being synchronous and callable in that situation. But maybe it can help you work around the issue?

The main issue I can find with the idea as explained is that I don't see how it would be possible to enter 'in' by itself: it would always get autocompleted if the autocompletion were unconditional. But since idleAutoComplete() can apparently distinguish whether to trigger an autocompletion or not (perhaps skipping it when the user is deleting text, or something similar), maybe the same logic can be used here.
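One possible heuristic for deciding when to autocomplete (an assumption, not what idleAutoComplete() actually does): autocomplete only when the user-typed text grew by appending characters, comparing against the typed part before any autocompleted, selected tail. Deleting that selected tail with Backspace then leaves "in" as typed.

```cpp
#include <string>

// Hypothetical rule: typedBefore is what the user had typed (excluding any
// autocompleted, selected tail); textNow is the entry text after the edit.
// Autocomplete only if the edit appended characters to typedBefore.
bool shouldAutoComplete(const std::string& typedBefore, const std::string& textNow)
{
    return textNow.size() > typedBefore.size() &&
           textNow.compare(0, typedBefore.size(), typedBefore) == 0;
}
```

For example, going from "i" to "in" triggers completion, while Backspace on "in" + selected "donesian" produces "in" again, which is not an extension, so no completion happens.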