LibreOffice correctly detects CTL text and sets its text language to whatever is selected in the complex text language drop-down list box in Tools > Options > Language Settings > Languages (by default it is Hindi). The problem is that I could be writing in multiple CTL languages in one sentence, so falling back on a single preset CTL language isn't useful. I believe it is possible to detect the user's keyboard layout; if so, why not use that to set the text language accurately?
I assume this same principle could be used for other languages as well.
IIRC we support this under Windows because the IM there has a property to indicate the language the IM is for, while under Linux we don't cause it doesn't.
E.g. WinSalFrame::GetInputLanguage on Windows, which has the feature, vs. GtkSalFrame::GetInputLanguage, which can only return LANGUAGE_DONTKNOW.
I think this is a duplicate of bug 108151
(In reply to Caolán McNamara from comment #1)
> IIRC we support this under Windows because the IM there has a property to
> indicate the language the IM is for, while under Linux we don't cause it
> doesn't.
So with this mechanism not available on Linux, would it be possible to use your libexttextcat library to detect the language and change it accordingly? Or alternatively, extend the current CTL detection to detect CTL languages based on the Unicode character range being typed?
I imagine using libexttextcat would just introduce a pile of "my language was guessed wrong" bugs. Especially for short sequences of text which won't be long enough for the statistical efforts of libexttextcat to guess it right.
Unicode char range folds this bunch of languages https://en.wikipedia.org/wiki/Arabic_script#Languages_currently_written_with_the_Arabic_alphabet to Arabic, while Hebrew script munges Yiddish and Hebrew together, which is maybe acceptable loss and probably happens on Windows already.
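To illustrate the folding described above, here is a minimal sketch of script detection by Unicode block. It is not LibreOffice code: the helper names `script_of` and `script_runs` are hypothetical, and only the basic Latin, Hebrew, Arabic and Devanagari blocks are covered.

```python
# Hypothetical sketch: classify characters by Unicode block and split text
# into script runs. Only a few basic blocks are listed; real CTL detection
# would need many more ranges.
SCRIPT_RANGES = [
    (0x0041, 0x005A, "Latin"),       # A-Z
    (0x0061, 0x007A, "Latin"),       # a-z
    (0x0590, 0x05FF, "Hebrew"),      # shared by Hebrew and Yiddish
    (0x0600, 0x06FF, "Arabic"),      # shared by Arabic, Persian, Urdu, ...
    (0x0900, 0x097F, "Devanagari"),
]


def script_of(ch):
    """Return the script name for one character, or None if it is neutral."""
    cp = ord(ch)
    for low, high, name in SCRIPT_RANGES:
        if low <= cp <= high:
            return name
    return None


def script_runs(text):
    """Split text into (script, substring) runs.

    Neutral characters (spaces, digits, punctuation) stay in the current run.
    """
    runs = []
    current = None
    buf = ""
    for ch in text:
        script = script_of(ch)
        if script is None or current is None or script == current:
            if current is None:
                current = script
            buf += ch
        else:
            runs.append((current, buf))
            current, buf = script, ch
    if buf:
        runs.append((current, buf))
    return runs


# English, Hebrew and Arabic words on one line, written as escapes.
SAMPLE = "abc \u05e9\u05dc\u05d5\u05dd \u0645\u0631\u062d\u0628\u0627"
for script, run in script_runs(SAMPLE):
    print(script, repr(run))
```

Note that a Persian-only letter such as U+067E still comes back as "Arabic" here, which is exactly the loss of precision mentioned above: the block identifies the script, not the language.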
There are some hints in bug 108151 about some available fields in the gtk integration with the IBUS IM that might be of some use to pick an acceptable value to set for the language.
(In reply to Caolán McNamara from comment #3)
> I imagine using libexttextcat would just introduce a pile of "my language
> was guessed wrong" bugs. Especially for short sequences of text which won't
> be long enough for the statistical efforts of libexttextcat to guess it
> right.
Have you seen this library? https://github.com/CLD2Owners/cld2
> Unicode char range folds this bunch of languages
> https://en.wikipedia.org/wiki/Arabic_script#Languages_currently_written_with_the_Arabic_alphabet to
> Arabic, while Hebrew script munges Yiddish and Hebrew together, which is
> maybe acceptable loss and probably happens on Windows already.
For Arabic-alphabet languages, LO only lists Persian, Uyghur, Punjabi and Urdu under CTL, and there are Unicode characters that are unique to most of these languages.
@Lior: what is your take on Hebrew detection?
> There are some hints in bug 108151 about some available fields in the gtk
> integration with the IBUS IM that might be of some use to pick an acceptable
> value to set for the language.
Guessing based on locale is definitely helpful to some degree, if the user lives in a country where a particular language is widely used.
Created attachment 145279 [details]
Document with text in English, Hebrew and Arabic for reproducing this issue
Reproduction instructions for this issue using the attached document:
1. Set your LO CTL language to Hebrew
2. Open the document (tri-lingual.odt)
3. Walk the cursor along the single line of text. You should see the status bar indicate the language as English (or "(en)"), then Hebrew, then Arabic (or "Arabic (Saudi Arabia)").
4. Copy the full line of text
5. Close the document
6. Open a new Writer document
7. Paste-Special the text you've copied, as unformatted text
8. Walk the cursor through the line again
Expected result: Language will again change from English, to Hebrew, to Arabic.
Actual result: Language will change from English to Hebrew, and be reported as Hebrew for the Arabic text as well.
Oh, I should mention I tested with:
Build ID: ad6adb1bfadf49af3187a0bb3ceffbf355e9eed1
CPU threads: 4; OS: Linux 4.9; UI render: default; VCL: gtk2;
TinderBox: Linux-rpm_deb-x86_64@70-TDF, Branch:master, Time: 2018-09-29_02:45:20
Locale: en-US (en_IL); Calc: threaded
Still the same behavior with:
Version: 22.214.171.124.alpha0+ / LibreOffice Community
Build ID: 5c68399e6bea3aa18477487400f8bb143d6ed84e
CPU threads: 4; OS: Linux 5.18; UI render: default; VCL: gtk3
Locale: en-IL (en_IL); UI: en-US
See also bug 139185 (and its See Also list) about language-guessing problems in libexttextcat, in particular bug 139185 comment 4.
*** Bug 154495 has been marked as a duplicate of this bug. ***
Accessing the current keyboard layout appears to be distro-specific, so we'd need to (a) get the current distro and then (b) implement a bunch of hacks to get the current keyboard layout.
FWIW localectl seems to be the most widely available command to extract this info.
Basically, this is a dupe of bug 108151, which also has some discussion about API availability, and an implementation of this for Qt5 from Jan-Marek.
*** This bug has been marked as a duplicate of bug 108151 ***
(In reply to Noel Grandin from comment #10)
> Accessing the current keyboard layout appears to be distro-specific
Isn't there some X-related mechanism/protocol for doing this?
> FWIW localectl seems to be the most widely available command to extract this
> info.
That's a systemd-based abomination, you definitely don't want LibreOffice to depend on systemd.
(In reply to Eyal Rozenberg from comment #13)
> (In reply to Noel Grandin from comment #10)
> > Accessing the current keyboard layout appears to be distro-specific
> Isn't there some X-related mechanism/protocol for doing this?
Looks like it is also used on Wayland:
"while Wayland doesn't have any official way to handle keyboards and keymaps, XKB is what they suggest to use, and in particular, all of the Wayland implementations I've seen tend to use xkbcommon, a quite modern implementation which is reasonably compatible: it uses the same keyboard data distribution that comes with X11."
A verbose layout query would be:
setxkbmap -query -verbose 10
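For illustration only, a rough sketch of how the output of such a query could be turned into language candidates. Several things here are assumptions: the sample output is typed by hand in the usual `key: value` shape, the function names and the layout-to-language table are hypothetical, and as far as I can tell the query lists the configured layouts rather than the group that is active at the moment, so this can only narrow down the candidates.

```python
import subprocess

# Field names that "setxkbmap -query" normally prints; other lines
# (e.g. extra chatter from -verbose) are ignored.
KNOWN_KEYS = {"rules", "model", "layout", "variant", "options"}

# Hand-written sample in the assumed output format, not captured output.
SAMPLE_OUTPUT = """\
rules:      evdev
model:      pc105
layout:     us,il,ara
options:    grp:alt_shift_toggle
"""

# Hypothetical, deliberately tiny mapping from XKB layout names to
# language codes; a real table would be much larger.
LAYOUT_TO_LANG = {"us": "en", "il": "he", "ara": "ar"}


def parse_setxkbmap_query(output):
    """Parse 'key: value' lines from setxkbmap -query into a dict."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in KNOWN_KEYS:
            fields[key.strip()] = value.strip()
    return fields


def configured_layouts(fields):
    """Return the configured layout names in order, e.g. ['us', 'il']."""
    return [name for name in fields.get("layout", "").split(",") if name]


def candidate_languages(fields):
    """Map configured layouts to language codes, skipping unknown ones."""
    return [LAYOUT_TO_LANG[name] for name in configured_layouts(fields)
            if name in LAYOUT_TO_LANG]


def query_system():
    """Run setxkbmap on the local machine (needs X or XWayland)."""
    result = subprocess.run(["setxkbmap", "-query"],
                            capture_output=True, text=True, check=True)
    return parse_setxkbmap_query(result.stdout)


print(candidate_languages(parse_setxkbmap_query(SAMPLE_OUTPUT)))
```

Shelling out like this is only meant to show the shape of the data; inside LibreOffice one would presumably talk to XKB or the toolkit directly rather than spawn a process.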
Please note that if you agree that this is a dupe, discussions that provide valuable info should be held in the main bug, to keep all the relevant bits together :)