Bug 95274 - Wrong editing languages offered
Summary: Wrong editing languages offered
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 5.0.0.2 rc
Hardware: Other All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: difficultyInteresting, easyHack, skillCpp, topicUI
Duplicates: 162875
Depends on:
Blocks: Language-Detection
Reported: 2015-10-23 10:29 UTC by Michael Bauer
Modified: 2024-09-08 21:55 UTC (History)
CC List: 13 users

See Also:
Crash report or crash signature:


Attachments
Screenshot showing the language dropdown (58.58 KB, image/jpeg)
2015-10-25 21:04 UTC, Buovjaga
screenshot showing unexpected language offering (12.25 KB, image/jpeg)
2015-10-25 21:16 UTC, Michael Bauer
Spellcheck example (29.51 KB, image/png)
2015-10-26 13:01 UTC, Heiko Tietze
Screenshot of LO still offerent anything but what I expect (44.56 KB, image/jpeg)
2020-02-25 21:19 UTC, Michael Bauer
MS Word languge window (58.43 KB, image/png)
2023-06-23 19:50 UTC, ⁨خالد حسني⁩

Description Michael Bauer 2015-10-23 10:29:39 UTC
Using
Tionndadh: 5.0.2.2
Build ID: 37b43f919e4de5eeaca9b9755ed688758a8251fe
Sgeama ionadail: gd-GB (gd_GB)

with the gd UI on Win8. I have installed the gd spellchecker and the en-GB spellchecker (I also selected en-GB and gd-GB as available UI languages during install), but when I attempt to change the language of a document (the entire document, a paragraph, or any part of it), it offers me gd (selected by default, which is correct) and Welsh for some arcane reason.

I have noticed (but not reported) erratic behaviour in this section before, I do not know where LO pulls the languages it offers from, I certainly have nothing either in Windows or LO which has anything with Welsh in it. So not sure why I'm offered Welsh and Gaelic but not en-GB.

To my mind, this feature is not user friendly overall - it seems to assume that there is a dictionary for every language a user might be working in. What if there isn't, say if I'm editing in ktz-NA? I can switch off the spell/grammar checking but it will still tell me it's a document in English or Russian or whatever my default language is. There ought to be an "Other" or "Set ISO manually" selector by default in that menu.
Comment 1 Buovjaga 2015-10-25 16:58:52 UTC
But "Tools - Language - For x" offers "More..". Isn't that what you meant by "Other"?
Comment 2 Michael Bauer 2015-10-25 17:38:53 UTC
No. I know that function (though I find it rather useless as it just points at the extensions site rather than helping users pull directly from some repo).

The point is that since I have already installed the English (UK) and Gaelic UI and spellcheckers, I should consequently be offered (at least) English (UK) and Gaelic by LO for setting the language of a document and proofing. But I'm being offered Gaelic and Welsh, which is just not right. I have not installed anything Welsh. And why is it hiding/not offering English (UK) even though it's installed?

In MS Office there is similarly erratic behaviour when it comes to setting the document language. Even if I turn off this feature, it will consistently set the document language to languages which I did not set as default. MS Office for some arcane reason seems to communicate with the keyboard setting and over-ride my "Do not detect document language automatically" setting. I discovered this because I bastardized the de-LUX keyboard to build myself a keyboard for typing phonetic symbols (so I can do my normal typing in en-GB or de-DE but then just hit a shortcut to de-LUX to get system wide access to phonetic symbols without having to pick them off a character map). With the annoying side effect that MS Word in particular keeps resetting any document to de-LUX. Even if I've manually selected a paragraph and set it to en-GB or gd-GB.

So I think LO is doing some "unauthorised" pinging around for locales and overrides user settings.
Comment 3 Buovjaga 2015-10-25 18:22:08 UTC
Aha.. for me it doesn't open extensions site, but either character or paragraph formatting dialogs or the language options dialog.
Comment 4 Michael Bauer 2015-10-25 20:28:34 UTC
Well, yes and no. If I click onto the language display area and select More, it takes me to the dialogue where I can change the font associated with a language (which won't fix my problem), font effects, margins and so on. Not relevant to the problem.

Language > Set language for paragraph > More also takes me to the same menu.

And I did muddle something up in my brain. I mixed it up with the Tools > Language > Get more dictionaries online menu, which takes you to the extensions site, which isn't great.

Either way, my problem remains. Though it's in a different mood today. Welsh has disappeared and German-Luxemburg has appeared. If I start typing in gd-GB (my default document language), en-US is suddenly offered as a language I can choose. But there seems to be no way I can force it to en-GB even though that's the only UI language and proofing dictionary I have installed. I'm beginning to consider tearing my hair ;)
Comment 5 Buovjaga 2015-10-25 20:39:36 UTC
(In reply to Michael Bauer from comment #4)
> Well, yes and no. If I click onto the language display area and select
> More, it takes me to the dialogue where I can change the font associated
> with a language (which won't fix my problem), font effects, margins and so
> on. Not relevant to the problem.

Hmm.. for me, the "Language" dropdown changes the actual language of the text and you can see the change in the status bar.
Comment 6 Michael Bauer 2015-10-25 20:51:47 UTC
Yes I'm sure it does but I can only change it to a language LO is SHOWING me in that menu and right now, that's (depending on the mood of LO), en-US, gd-GB, de-LUX or cy-GB. But *NOT* en-GB the language I actually want and which I have installed. I cannot select a language it's not showing me. That's precisely my problem :)
Comment 7 Buovjaga 2015-10-25 21:04:39 UTC
Created attachment 119946 [details]
Screenshot showing the language dropdown

Here is what I see. You only have a couple in that list?
Comment 8 Michael Bauer 2015-10-25 21:15:01 UTC
But I don't need to change the association of specific fonts with specific languages. All of the languages in question use the Latin alphabet so I should be able to change between the languages (in my case en-GB and gd-GB) without having to change default font associations. It's not like I need to tell it to use Arial Unicode so I can write in Burmese or something.

And it still does not explain why it offers me German (Liechtenstein) and Gaelic but not English (UK). And/Or en-US which I have neither as a keyboard nor a proofing language.
Comment 9 Michael Bauer 2015-10-25 21:16:01 UTC
Created attachment 119947 [details]
screenshot showing unexpected language offering
Comment 10 Buovjaga 2015-10-26 06:04:04 UTC
(In reply to Michael Bauer from comment #8)
> But I don't need to change the association of specific fonts with specific
> languages. All of the languages in question use the Latin alphabet so I
> should be able to change between the languages (in my case en-GB and gd-GB)
> without having to change default font associations. It's not like I need to
> tell it to use Arial Unicode so I can write in Burmese or something.
> 
> And it still does not explain why it offers me German (Liechtenstein) and
> Gaelic but not English (UK). And/Or en-US which I have neither as a keyboard
> nor a proofing language.

But my screenshot is not about changing the association of specific fonts with specific languages. It does the same as the selection menu in your screenshot.
Comment 11 Michael Bauer 2015-10-26 10:30:54 UTC
Errr right. I mean, thank you for pointing that out but I think I will continue keeping this bug open because that must be the most un-intuitive menu in LO. I have done multi-lingual wp for over two decades and this is really not practical.

The menu I screenshotted should show the languages I installed for UI/proofing, always. But perhaps most importantly, that Fonts tab needs a complete overhaul because if you label something Fonts and put a font dropdown next to font size and a languages dropdown, that suggests this is some sort of font-to-size-to-language mapping for defaults. There is literally nothing on that tab that suggests to a user who isn't a LO developer that this is the place they can manually set the language for a document, section or paragraph.

And it needs a tickbox for "Do not automatically detect my document language". Because it clearly gets it wrong badly, otherwise why suggest en-US over my installed preference of en-GB?

That probably means it's more of a UI bug though than linguistic, that one I'm not sure of.
Comment 12 Buovjaga 2015-10-26 10:37:54 UTC
I agree. Let's ask the design team, if they would like to tackle making all this more intuitive.
Comment 13 Heiko Tietze 2015-10-26 13:01:33 UTC
Created attachment 119967 [details]
Spellcheck example

It works for me as expected. Right-click an unknown word offers me the option to set the language for document/paragraph (is properly related to the style)/selection (direct formatting), with German as my default and English as the installed option. When I go to Tools > Options > Language Settings > Languages and change the default to French there is eventually a third option in this spellcheck menu (although French is not installed what is indicated by the blue checkmark icon). Guess installing another dictionary does the same. Perhaps https://forum.openoffice.org/en/forum/viewtopic.php?f=74&t=16512 also helps you.

What I agree with is the weird place for language in the font section. And I dislike the context menu, which offers to change the spellchecking language only for unknown words. That makes it necessary to use Tools > Language for selections or when a dictionary is not installed (the French example).

PS: Nice to see you here as well, Michael. :-)
Comment 14 Michael Bauer 2015-10-26 13:25:53 UTC
Hi Heiko :) which reminds me I need to bring the other translation up to speed.

Right-click context menu on a word is also not behaving as it should for me. If I do that (I typed bluefin), it offers me Gaelic (my default) and Icelandic. No idea where THAT is coming from, but the more we look at this, the more it looks like there is some autodetect feature which gets itself terribly confused and starts overriding the sensible defaults (installed UI languages of LO, installed proofing extensions).
Comment 15 frostwyrm333 2016-02-21 14:26:18 UTC
Still happens in 5.1 version.
Extension languages are not offered by default.
With certain highlighted words, it will offer you a correct language with a wrong name like Catalan or Gaelic, but often you are unable to change the language of the text at all.
Comment 16 Adolfo Jayme Barrientos 2016-02-22 10:25:58 UTC
(In reply to Michael Bauer from comment #14)
> Right-click context menu on a word is also not behaving as it should for me.
> If I do that, (I typed bluefin) it offers me Gaelic (my default) and
> Icelandic. No idea where THAT is coming from […]

If it is of any help, it comes from libexttextcat. :-)
Comment 17 Robinson Tryon (qubit) 2016-08-25 05:49:48 UTC Comment hidden (obsolete)
Comment 18 Heiko Tietze 2020-02-25 10:00:26 UTC
(In reply to frostwyrm333 from comment #15)
> Still happens in 5.1 version.
> Extension languages are not offered by default.

Anything that UX can do here? Removing for now.
Comment 19 Michael Bauer 2020-02-25 21:19:04 UTC
Just to confirm this is still a problem in 6.3.4.2 (x64) (attaching screenshot), today I've been offered en-US, Rumantsch and Irish (the last two I have neither as a proofing language in LO or Office or as a keyboard layout, goodness knows where LO is pulling them from) - but still not en-GB.

This is one of the reasons why I'm still not using LO as my main office suite, I just can't work with software that takes such a haphazard approach to setting the language of text and prefers to second guess the user on a word by word basis (yes, type another word, get offered a new language).
Comment 20 Michael Bauer 2020-02-25 21:19:41 UTC
Created attachment 158190 [details]
Screenshot of LO still offerent anything but what I expect
Comment 21 Michael Bauer 2020-09-25 20:45:53 UTC
I'm now being offered Gaelic, Faroese and German (only Gaelic and German are languages I actually have proofing installed for) and again, I have no idea where Faroese is coming from. The language I actually wanted to select for the whole document, en-GB, isn't being offered.

Right click brings up the Paragraph menu but there no longer seems to be an option to set the language even via that convoluted route.

If I right click on a mis-spelled word, I can set the language for the paragraph - though it now offers me Gaelic, German and Interlingua.

I also don't get why I can't set the language 
- when spellchecking is turned off. A user may want to set the language but not use spellchecking. Or I may want to set a language for which there is no proofing extension but I don't want the document to be set to the wrong language.
- for the entire document, it's either shoddy UI language if "Paragraph" means the whole document or tedious to have to do this paragraph by paragraph.

There's probably some well-meant automatic thing going on but it's badly broken for anyone not using LO in en-US.

Given the high frequency with which this issue must occur for many users, I'm dismayed at the lack of attention this is getting. It's been 7 months...
Comment 22 Michael Bauer 2021-04-18 17:24:30 UTC
Still the same, paste a piece of English text, I'm being offered German, Spanish (not even installed) and Gaelic.
Comment 23 Heiko Tietze 2021-04-19 06:32:38 UTC
Eike, can you please shed some light on this issue? 

UI/UX-wise I think bug 103036 is the way to go.
Comment 24 Michael Bauer 2021-12-03 11:55:16 UTC
Six years on, the only way I can reliably change the document or section language in LO is by opening the document in MS Office, changing the document language and then going back to LO.

Talk about making a KEY usability bug a priority ...
Comment 25 Eike Rathke 2021-12-06 15:28:02 UTC
(coming across this due to a new notification).

(In reply to Heiko Tietze from comment #23)
> Eike, can you please shed some light on this issue? 
Not really. I don't know the implementation of that status bar dialog nor what languages it finds worth to be picked up.

I can only say that selecting text and clicking More... in the dialog and assigning a language from the Font tab *does* work and changes the language of the selection. Similar for Paragraph. It also works to change "Default Languages for Documents" in Tools -> Options -> Language Settings -> Languages, including "For the current document only". I don't know why MS-Office would be needed for that..
Comment 26 Michael Bauer 2021-12-06 15:54:50 UTC
Can someone please explain to me how associating a font with a language is supposed to work unless you're a monolingual who only ever types in the one language? So let's say I associate Times New Roman with Scottish Gaelic and write "Tha cat air a' phlaide". Dandy. Now in the next paragraph I type "The cat on the blanket is an old Gaelic song" followed by a long essay on this topic in English in Times New Roman. Pray, how am I supposed to get that spell-checked in *English* when TNR is now tied to Gaelic and the only languages that menu at the bottom is showing are random languages you can get LO in but not languages you've got spell checkers installed in?

Even if we stick to monolingual documents, that doesn't work, because someone might write an essay in German but then in another document, using the same font, a job application in Spanish. Then what?

At the moment, the only way - without having to resort to this incredibly clumsy font association thing - I can change the proofing language of a word/paragraph/selection is by opening it in MS Office, changing the locales as required and then going back to LO. Bizarrely, once I've done that, LO is ok with whatever selection I've made. It just refuses to allow me to set them within LO.

The locale - in terms of the proofing language - for a word/paragraph/selection should be modifiable for the end user without having to resort to click contortions and in an intelligent way i.e. the top proofing locales suggested should be those for which dictionaries are installed. Not some random selection.
Comment 27 Buovjaga 2021-12-06 16:58:26 UTC
(In reply to Michael Bauer from comment #26)
> Can someone please explain to me how associating a font with a language is
> supposed to work please unless you're a monolingual who only ever types in
> the one language? So let's say I associate Times New Roman with Scottish
> Gaelic and write "Tha cat air a' phlaide". Dandy. Now in the next paragraph
> I type "The cat on the blanket is an old Gaelic song" followed by a long
> essay on this topic in English in Times New Roman. Pray, how am I supposed
> to get that spell-checked in *English* when TNR is now tied to Gaelic and
> the only languages that menu at the bottom is showing are random languages
> you can get LO in but not languages you've got spell checkers installed in?

Not tied to the font, but to the direct character formatting or character style or paragraph style.
Comment 28 Michael Bauer 2021-12-06 17:04:40 UTC
That is still bonkers. The locale tag (or whatever it is) of text shouldn't hinge on fonts, formatting or style. When was that ever a good idea? ;)
Comment 29 Heiko Tietze 2021-12-07 07:16:08 UTC
Some background on the language tag are here [1]. The point is that complex layouts like Korean require special glyphs and RTL fonts have a different reading direction, for example. 

That's not the way to distinguish between Gaelic and English, of course, where you have to define explicitly what language to use. You can do this for the whole paragraph or just characters. Create dedicated styles for the languages you need.

No idea what causes the "erratic behavior" on auto detection. The reported issue is also not easy to reproduce. And more generally, I see no way to reliably detect the correct language.

[1] https://docs.microsoft.com/en-us/typography/opentype/spec/languagetags
Comment 30 Michael Bauer 2021-12-07 10:10:12 UTC
(In reply to Heiko Tietze from comment #29)
> Some background on the language tag are here [1]. The point is that complex
> layouts like Korean require special glyphs and RTL fonts have a different
> reading direction, for example.

<shrugs> I write a lot of Cantonese, so I'm aware of the complexities of getting the right font locale variants under Unicode. And yes, Microsoft isn't great at obeying the default locale font setting.
BUT the point is that while having a working default locale font setting is great, that is NOT the right place to also govern the locale setting for a particular section of text.

> That's not the way to distinguish between Gaelic and English, of course,
> where you have to define explicitly what language to use. You can do this
> for the whole paragraph or just characters. Create dedicated styles for the
> languages you need.

The point is, I CANNOT do this for the whole paragraph or just characters. Not without going through huge contortions. Something that is SUCH a common event for users working in more than one language, this should require two clicks at the most. Not having to work your way through the default locale font settings or having to create dedicated styles. That's just NOT user friendly for such a basic common issue. 


> No idea what causes the "erratic behavior" on auto detection. The reported
> issue is also not easy to reproduce. And more generally, I see no way to
> reliably detect the correct language.
> 
> [1] https://docs.microsoft.com/en-us/typography/opentype/spec/languagetags

I'm not buying that. I'm no programmer as you well know, and I understand auto-detection can get things wrong (Microsoft does, often enough, even when this feature is turned off), but I cannot imagine it's impossible to weigh detection towards either a) installed dictionary and/or LO UI languages or b) user choice via settings. Not suggesting Tibetan when your LO UI language is French and you have French, Spanish and English dictionaries installed but not Tibetan cannot be impossible to do.
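
The weighting idea in the paragraph above can be illustrated with a minimal sketch. This is not LibreOffice code (which is C++), and every name in it is hypothetical: `guesses` stands in for whatever ordered list of language tags a text guesser might return, and the installed dictionaries and UI locale are passed in by hand. It only shows that moving user-relevant languages to the front is a small, stable reordering.

```python
def rerank_guesses(guesses, installed_dictionaries, ui_locale):
    """Return guesses with user-relevant languages first.

    guesses: language tags from a text guesser, best guess first
    (hypothetical input; the real guesser is not modelled here).
    """
    # Languages the user has shown an interest in: proofing tools plus UI.
    preferred = set(installed_dictionaries) | {ui_locale}
    # sorted() is stable, so the guesser's own order survives inside each group.
    # The key is False (sorts first) for preferred tags, True for the rest.
    return sorted(guesses, key=lambda tag: tag not in preferred)


# Tibetan ("bo") is guessed first, but French and Spanish are installed.
print(rerank_guesses(["bo", "es", "fr", "is"], ["fr", "es", "en-GB"], "fr"))
```

A stricter variant could drop the non-preferred guesses entirely instead of demoting them; which of the two is better is a UX decision this sketch does not settle.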
Comment 31 Heiko Tietze 2021-12-07 11:09:41 UTC
If the request is to make auto detection work reliably we have to wait for experts/volunteers.

If you think that manually picking the language has room for improvements we can discuss this. I don't understand why you cannot change the paragraph/character style - okay, writing a dictionary where you have to switch from one language to the other every second word... And if you have a valid use case, is there a solution that won't bother average users?
Comment 32 Michael Bauer 2021-12-07 11:44:19 UTC
(In reply to Heiko Tietze from comment #31)
> If the request is to make auto detection work reliably we have to wait for
> experts/volunteers.

Perhaps we could invite some? 
 
> If you think that manually picking the language has room for improvements we
> can discuss this. I don't understand why you cannot change the
> paragraph/character style - okay, writing a dictionary where you have to
> switch from one language to the other every second word... And if you have a
> valid use case, is there a solution that won't bother average users?

I have no research data on this but my experience with being sent other people's documents is that 9 out of 10 people don't use styles because they are either unfamiliar with the concept or find them too cumbersome for short documents where you don't want to spend time defining a style but may wish to change the font or indeed write in a different language. Styles are really useful for something large, like writing a book or dictionary (and I do use them for those) but if I'm writing a letter and just want to use a different para spacing or font, there's no point in faffing about with styles, you just change it ad-hoc as you'll most likely never use that configuration again.

So styles as a default path to setting the proofing locale are a) not realistic when considering the default behaviour of everyday users and b) not quick enough.

MS Office has a *really* simple and intuitive way which we could emulate:
1 open a doc and type something, Word will try and guess the language, same as LO
2 if it guesses wrong, in Word you just single left click at the bottom of the window where the locale is indicated (with or without selecting text). That brings up a list of locales (the top 4 are those with dictionaries installed, the rest is an alphabetical list of ALL locales Word supports from Afrikaans to Yoruba). Click on the one you want, click ok, done.

2 is where LO fails. I think it's trying to do the wrong thing, because if you click on the locale, it does try to bring up a list of locales but the way in which it chooses the 2-3 locales is broken. However it chooses those is at least partly broken.

For an easy fix (to my mind), the locale suggestions should be limited to installed proofing dictionaries and (if different) the LO UI locale (as some locales have a localized UI but no proofing).

A slightly more complex fix would be to then fix the "More" option that appears in that window to bring up NOT the font settings but a list of locales/languages like Word does where you can simply do an ad-hoc selection of the locale/language for a document or selection without having to resort to styles or font associations.
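
The two fixes proposed above (suggestions limited to installed proofing dictionaries plus the UI locale, and a "More" list of every locale) can be sketched together. This is a minimal illustration, assuming hypothetical inputs: `all_locales`, `installed_dictionaries` and `ui_locale` are plain lists and strings supplied by the caller, not anything read from LibreOffice. The remaining locales are sorted by tag here only for simplicity; a real picker would presumably sort by display name.

```python
def build_language_picker(all_locales, installed_dictionaries, ui_locale):
    """Return (top, rest): suggested entries first, then every other locale."""
    top = []
    # Installed proofing dictionaries come first, then the UI locale.
    for tag in [*installed_dictionaries, ui_locale]:
        if tag not in top:  # keep the first occurrence, drop duplicates
            top.append(tag)
    # Everything else, alphabetically by tag (an assumption of this sketch).
    rest = sorted(tag for tag in all_locales if tag not in top)
    return top, rest


top, rest = build_language_picker(
    all_locales=["yo", "af", "en-GB", "gd-GB", "de-DE", "cy-GB"],
    installed_dictionaries=["gd-GB", "en-GB"],
    ui_locale="gd-GB",
)
print(top)   # suggested entries
print(rest)  # the "More" list
```

With this shape, a locale such as Yoruba is always reachable in the second list even when it is never suggested in the first.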

Here's an easy challenge: open a blank document in Word and one in LO. Type a random selection of characters. Now try to set the language of that selection to Yoruba and time yourself. I bet it takes you a LOT longer to do that in LO compared to Word. (With a document open in Word with a word pre-typed, it takes me 7.86 seconds to set that word to Yoruba. In LO, after 15 seconds, I'm still wondering if I should click More or Set Paragraph Language > More and even longer wondering why both of those options bring up the Font menu.)
Comment 33 Michael Bauer 2021-12-07 11:45:47 UTC
Recte:
* I think it's trying to do the RIGHT thing
Comment 34 Eike Rathke 2021-12-07 18:31:24 UTC
(In reply to Heiko Tietze from comment #29)
> Some background on the language tag are here [1].
> [1] https://docs.microsoft.com/en-us/typography/opentype/spec/languagetags
Those 3-4 letter "Language System Tags" are only a thing with OpenType fonts and completely unrelated to BCP 47 / RFC 5646 language tags, except that they sometimes share a letter or two.
Best erase it from memory in this context.

If you want to learn about language tags as we use them:
https://erack.de/bookmarks/D.html#Language_Tags

However, that's not really related to the problem here, except that we use language tags for language and locale attribution.
Comment 35 Eike Rathke 2021-12-07 19:24:33 UTC
Much of the confusion why the Language list even sits beside the font options probably results from the fact that most people do not have CJK and CTL enabled, otherwise they would get two or three font sections with each a language correlated. Of course language is not a font attribute, despite that the dialog may look like it would suggest that. The reason it is like it is again is ..haha.. MS-Word.. also there is the Western/CJK/CTL distinction. Now what it actually is good for is that if one types text of CJK or CTL mixed with "Western" (or rather non-CJK and non-CTL), then the text's language determines which font to use.

Additionally the font preview may change according to the language selected if it uses a different script. However, with the "Western" languages this currently happens only for Greek (and Irish;) and with CJK for Chinese vs Japanese. That would probably need more sensible short preview texts, Lorem Ipsum in other scripts doesn't make sense.. I didn't dig deeper.

If there wasn't this correlation between language/script and font to be used then the Language list(s) could be independent, but would lose the font preview. If one uses only "Western" then just view it as both, the three font attributes to be set for the selection, and the language to be set for the selection, without dependency.
Comment 36 Michael Bauer 2021-12-07 22:30:56 UTC
So what does that mean in terms of fixing the impractical approach to changing the language of a specific bit of text or document in LO?
Comment 37 Buovjaga 2021-12-08 08:05:40 UTC
(In reply to Michael Bauer from comment #36)
> So what does that mean in terms of fixing the impractical approach to
> changing the language of a specific bit of text or document in LO?

Why is it impractical to select characters and double-click a character style from the sidebar or be in a paragraph and double-click a paragraph style or pick it from the formatting toolbar?
Comment 38 Michael Bauer 2021-12-08 10:18:51 UTC
(In reply to Buovjaga from comment #37)

> Why is it impractical to select characters and double-click a character
> style from the sidebar or be in a paragraph and double-click a paragraph
> style or pick it from the formatting toolbar?

To repeat myself:
a) few people use styles, especially for shorter documents and even for longer ones, they are rarely in evidence. I proofread a LOT of texts for submission to medical journals, maybe 5% use styles, the rest have ad-hoc formatting applied, even to stuff like headings. They take both time and expertise to use whereas the text size menu and/or bold/italic direct formatting are visibly there, easy and quick to use. No wonder most people just hit pt 14 and bold underline instead of a heading style. It may not be the way word processing software is intended to be used, but it's the reality of how people use it. We can claim the moral high ground and ignore that, or we can try and make it work for as many users as possible (as far as the proofing language issue goes, I'm NOT suggesting we dump styles). 

b) it's counter-intuitive. Especially with the document/text language displayed at the bottom of the window, nobody who isn't a LO dev will go looking in the *styles* for a way to change the language their text is proofed in. It's like telling your mom who's baking you a cake that the flour is in the garage freezer. It may be the place where you keep your flour and in some ways perhaps not a bad place for it but it's a) a long way from the kitchen and b) not the place anyone goes looking for flour, not when you have a kitchen larder...
Comment 39 Heiko Tietze 2021-12-08 12:03:53 UTC
Setting the language via statusbar has been criticized many times, see bug 143366, bug 116158 - bug 107288 (all NAB/WF) and bug 114178. 

Whatever we do, it will be wrong in some scenarios. Either it shows the document's language and fails on the paragraph level or vice versa. The average user might want to set a language per document but even a bit advanced users apply it per paragraph. And the actual information is on the current character. So the only clear solution here is to prohibit changing the language at the statusbar.
Comment 40 Michael Bauer 2021-12-08 12:15:50 UTC
So you're seriously suggesting that offering me Tibetan but not the proofing languages I have installed (English/Gaelic/German) is intended and desirable behaviour? No wonder it's so hard to take FOSS seriously at times...
Comment 41 Michael Bauer 2021-12-08 12:19:27 UTC
And those bugs you linked don't criticize the ability to set the language from the status bar, they criticize the INability to do so i.e. the fact that instead of being able to set it from the bar, it brings up the font menu.

We're clearly going round in circles, with normal users pointing out this is broken and the devs claiming it isn't and that it's intended behaviour. Is there some sort of an escalation procedure within LO where we can take an issue to a wider audience to get input from people who aren't 10 devs who understand how this is theoretically supposed to work?
Comment 42 Eike Rathke 2021-12-08 12:33:23 UTC
This bug has gone a long way from "the status bar menu displays odd languages to select from" to criticizing all things around language and font attribution. Can we please get back to the original problem? No one is going to read 42 comments anyway. Thank you.
Comment 43 Michael Bauer 2021-12-08 12:46:21 UTC
Gladly but it seems everyone is pointing the finger somewhere else. All I (and I would imagine any sane user) want is for the menu at the bottom not to show random languages but those relating to the user's installed proofing languages and UI locale.
Comment 44 Mike Kaganski 2021-12-08 12:50:06 UTC
See bug 139185 comment 4 for a code pointer (and explanation).
Comment 45 ⁨خالد حسني⁩ 2023-06-23 19:37:24 UTC
I tried to debug this language status menu, and here are my findings so far:

The code that populates the language menu is LanguageSelectionMenuController::fillPopupMenu() in:
https://git.libreoffice.org/core/+/refs/heads/master/framework/source/uielement/langselectionmenucontroller.cxx#160

Which in turn calls FillLangItems() which is responsible for selecting what language items to show:
https://git.libreoffice.org/core/+/refs/heads/master/framework/source/fwi/helper/mischelper.cxx#57

The code seems to add:
1. current language of the text selection
2. system language
3. ui language
4. guess language based on the text selection (seems to use libexttextcat)
5. keyboard language (I think this works in Windows only)
6. all languages used in the document

and in each of these (except for 1) the language is only added if it makes sense for the script.

These all seem reasonable to me. Now the question is what other criteria we want to use for adding languages?
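
To make the six sources listed above easier to reason about, here is a minimal sketch of how such a candidate list could be assembled. It is an illustration only, written in Python and not taken from the C++ source: the function name `collect_menu_languages`, its parameters, the de-duplication and the `fits_script` callback are all assumptions made for this example, based solely on the description in this comment.

```python
def collect_menu_languages(current, system, ui, guessed, keyboard,
                           document_languages, fits_script):
    """Gather menu candidates in the order described in the comment.

    fits_script: callable(tag) -> bool, a stand-in for the check that a
    language makes sense for the script of the selected text.
    """
    candidates = [current, system, ui, guessed, keyboard, *document_languages]
    items = []
    for index, tag in enumerate(candidates):
        if not tag or tag in items:
            continue  # skip unavailable sources and duplicates
        # The current language (index 0) is exempt from the script check.
        if index == 0 or fits_script(tag):
            items.append(tag)
    return items


# A Gaelic document on a British system where the guesser says Icelandic.
print(collect_menu_languages(
    current="gd-GB", system="en-GB", ui="gd-GB", guessed="is",
    keyboard=None, document_languages=["gd-GB", "de-DE"],
    fits_script=lambda tag: True,
))
```

In a list built this way the guessed language enters on equal footing with the others, and installed dictionaries are not among the sources at all, which would be consistent with the behaviour reported earlier in this bug.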
Comment 46 ⁨خالد حسني⁩ 2023-06-23 19:50:07 UTC
Created attachment 188072 [details]
MS Word languge window

What MS Word is doing does not seem to be so sophisticated either: on a mixed Chinese/English document it shows a dialog with all languages, and it selects 6 languages, 3 of which are useless as they are for different scripts than the text and won’t be applied when selected (Arabic, Chinese, and Persian). It seems to base the selection on the locales of the system or some other thing I’m not sure about (for example it shows Swiss even though it does not seem to have spell checking for it).

Maybe one improvement we can borrow from MS Word is to use a window instead of a menu, so it can always show all languages instead of only the selected ones, so regardless of how we guess the most sensible ones the user can always find a suitable language without having to click more.
Comment 47 ⁨خالد حسني⁩ 2023-06-23 19:54:45 UTC
I now see that Mike already provided the code pointers in Comment 44. Probably this needs some UX input as well.
Comment 48 Heiko Tietze 2023-06-26 10:33:47 UTC
UX was involved earlier, with not much to add.

The issue is why Welsh is offered for Gaelic. It should be clear why languages are offered, ideally user-definable per checkbox with some sane preset.
Comment 49 ⁨خالد حسني⁩ 2023-06-26 10:57:04 UTC
(In reply to Heiko Tietze from comment #48)
> UX was involved earlier, with not much to add.
> 
> The issue why Welsh is offered for Gaelic. It should be clear why languages
> are offered, ideally user-definable per checkbox with some sane preset.

I’m inquiring specifically about:

> May be one improvement we can borrow from MS Word it to use a window instead
> of a menu, so it can always show all languages instead of only the selected
> ones, so regardless of how we guess the most sensible ones the user can always
> find a suitable languages without having to click more.

But if the current behavior is deemed acceptable/working as designed, then there is no point in keeping this issue open.
Comment 50 Michael Bauer 2023-06-26 11:12:07 UTC
The current behaviour is not acceptable. End users require a simple way of being able to control the languages offered for proofing. At the moment, it's a) a lottery which languages are offered b) there is no way of fixing this other than theoretically through some really complicated thing via the font settings.

Whatever LO's best guess mechanism is for producing the current offering, fine, but when the user clicks on the bottom bar which brings up the current best guesses and they're wrong, there should be a new menu that users can access, let's call it Select Proofing Languages, where the user can select (a simple tick box list of the locales LO can offer) and restrict the proofing languages they want to be offered, i.e. if I go into that menu and select Gaelic, English and German, then LO should stop offering me Tibetan and Cherokee just because it thinks I'm writing in Cherokee for some odd reason.
Comment 51 Mike Kaganski 2023-06-26 11:16:38 UTC
(In reply to ⁨خالد حسني⁩ from comment #49)

The problem here is awful results from libexttextcat, especially on small text length. Despite "working as designed", the resulting menu is unacceptably wrong. And possibly because of those wrong results, some results that could make sense from your list in comment #45 do not get into the resulting menu (just a guess, didn't check this code) - because e.g. all languages from the document are not always shown, even if they match the script, and would be much better than what libexttextcat suggests.
Comment 52 ⁨خالد حسني⁩ 2023-06-26 12:57:46 UTC
(In reply to Mike Kaganski from comment #51)
> (In reply to ⁨خالد حسني⁩ from comment #49)
> 
> The problem here is awful results from libexttextcat. Especially on small
> text length. Despite "working as designed", the resulting menu is
> unacceptably wrong. And possibly because of those wrong results, some
> results that could make sense from your list in comment #45 do not get into
> the resulting menu (just a guess, didn't check this code) - because e.g. all
> languages from the document are not always shown, even if match the script,
> and would be much better than what libexttextcat suggests.

If libexttextcat guessing is not usually useful, we can drop that part. But showing languages that don’t match the selected script is useless, since they won’t be applied if user clicks on them (you can’t set CTL text English or Western text Arabic, LO just won’t let you do that).
Comment 53 Stéphane Guillou (stragu) 2024-05-28 06:22:26 UTC
If we are to keep this feature, my take would be to do two things:

1. only show matches that have a dictionary installed (with obvious drawbacks, but likely more useful matches as a result)
2. improve the UI to let the user know the suggestion(s) is (are) "guessed from text" or equivalent. If there's a way to do that in the menu, great (by e.g. prepending a short string). Otherwise, move the feature to a dialog.
Comment 54 Stéphane Guillou (stragu) 2024-05-28 06:30:24 UTC
*** Bug 161033 has been marked as a duplicate of this bug. ***
Comment 55 daniel.schaaaf 2024-05-28 13:17:09 UTC
I second Stéphane's suggestions.

The current behaviour is weird at best. The word "line" suggests Friulian as the language, "car" results in Romanian, "test" in Estonian, ...

Does libexttextcat score/rate its guesses? I'd expect things to go wrong with single words. So, either discard language guesses with a low score, or guesses for paragraphs with less than e.g. five words.
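
A rough sketch of that gating idea, in Python for brevity. Whether libexttextcat exposes a usable score is exactly the open question above, so the (tag, score) pairs, the accept_guess name, and both thresholds are hypothetical.

```python
from typing import List, Optional, Tuple


def accept_guess(
    text: str,
    guesses: List[Tuple[str, float]],
    min_words: int = 5,
    min_score: float = 0.8,
) -> Optional[str]:
    """Return the best guessed language tag, or None if it should be discarded.

    `guesses` is a list of (language tag, score) pairs from some detector;
    the 0..1 score scale is an assumption made for this sketch.
    """
    if len(text.split()) < min_words:
        return None  # too short to guess reliably, e.g. a single word
    if not guesses:
        return None
    tag, score = max(guesses, key=lambda g: g[1])
    return tag if score >= min_score else None
```

Under these assumptions a single word such as "line" never produces a suggestion, whatever the detector reports for it.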




Although, even with complete sentences we are not guaranteed to get correct suggestions. The following text is recognised as "Norwegian, Nynorsk", although it is written in Bokmål: "Dette er ikke nynorsk. Du gjettet feil."

Translated into Nynorsk, it would be: "Dette er ikkje nynorsk. Du gjetta feil."

Notice how "ikke" and "gjettet" are exclusively Bokmål, while "ikkje" and "gjetta" are Nynorsk.




It would be nice if LO would indicate that a language is a suggestion (from libexttextcat). The language lists (in "Tools > Language > For Selection/Paragraph/All Text") could be expanded to something like this:

=================================
Language(s) in current document:
English (UK)
☑ French (France)
More ...
---------------------------------
Suggested for current paragraph:
Norwegian (Bokmål)
---------------------------------
None (Do not check spelling)
---------------------------------
Reset to Default Language
=================================
Comment 56 Michael Bauer 2024-05-28 13:38:14 UTC
I would tweak Stéphane's first suggestion to

1. only show matches that have a dictionary installed (with obvious drawbacks, but likely more useful matches as a result)
[AND matches of the LO UI language AND the user OS locale; these should be the most intuitive options]
So a user who is on en-US Windows, running LO in Tibetan and with spellchecking installed for en-US and Japanese would be offered Tibetan (LO locale), en-US (OS locale) and Japanese (installed dictionary).

But not Welsh or Klingon ;)

Personally I'd mothball libexttextcat but let's save that debate for another 10 year bug...
Comment 57 Heiko Tietze 2024-05-29 08:26:12 UTC
Jonathan, do you have an opinion on this?
Comment 58 Jonathan Clark 2024-05-29 12:08:19 UTC
(In reply to Heiko Tietze from comment #57)
> Jonathan, do you have an opinion on this?

My vote is for the following, in this order:

1.) Languages that already exist in the current document. If a language has already been used in a document, I think it's likely that it will be used again. Those languages should be positioned prominently.

2.) Languages that the user has *explicitly* specified as languages they understand and intend to use with LibreOffice. The user is the best judge of this, so we should give them the opportunity to tell us. I don't think it will ever be possible to guess this right every time for every user.

3.) Languages derived heuristically from LO configuration: default languages for documents, user interface language.

I think installed spellcheck dictionaries is a weak signal. My LO Snap install was bundled with dictionaries for many languages I don't know, and I'd rather not see them in this list.

4.) Languages derived heuristically from system configuration. This is a reasonable starting point, but is only a rough guess. There are many reasons why a user's system configuration might not reflect all of their languages.

For an extreme example, consider Linux: English-primary users can't set a second language on Linux. If you try, it will break localization in most gettext programs. An English-locale Linux user with a US international keyboard might need to regularly work in dozens of languages, but we'd have no way of knowing which ones.


Regarding libexttextcat: Based on the above discussion, am I correct that this is being used on individual words? Given cognates and loanwords, I don't expect classifying individual words can ever be reliable. The docs say they expect "hundreds of bytes", which seems more reasonable.

Instead of using this to generate confusing recommendations, perhaps this could be used somehow for recommending the best match from the high-signal candidates?
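
The ordering proposed above could look roughly like this. It is a sketch only; the function and parameter names are made up for illustration, and languages are plain BCP 47 strings.

```python
from typing import Iterable, List, Optional


def rank_languages(
    document: Iterable[str],
    user_chosen: Iterable[str],
    lo_config: Iterable[str],
    system_config: Iterable[str],
    limit: Optional[int] = None,
) -> List[str]:
    """Merge the four sources in priority order, dropping duplicates."""
    ordered: List[str] = []
    # 1.) document languages, 2.) explicit user choice,
    # 3.) LO configuration, 4.) system configuration
    for source in (document, user_chosen, lo_config, system_config):
        for tag in source:
            if tag not in ordered:
                ordered.append(tag)
    return ordered if limit is None else ordered[:limit]
```

A detector result could then be used only to pick the most likely entry from this list rather than to add new entries to it.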
Comment 59 Michael Bauer 2024-05-29 12:21:35 UTC
>1.) Languages that already exist in the current document.

That would only work if by that you mean languages that a user has set manually, because libexttextcat is - however it attempts to do it - broken but fixing libexttextcat needs a different bug.

>2.) Languages that the user has *explicitly* specified

I would welcome such an option but didn't think such a dialogue option would get much support.
Comment 60 Heiko Tietze 2024-05-30 08:50:36 UTC
(In reply to Michael Bauer from comment #59)
> >2.) Languages that the user has *explicitly* specified
> 
> I would welcome such an option but didn't think such a dialogue option would
> get much support.
This ultimately means to get rid of libexttextcat and to let the user decide manually. Could imagine to have checkboxes with the Default Languages dropdowns instead of the green check marks today. And if checked, you get this language in the selection. The sorting order is another topic.

If we want to keep the check mark thingy, guess it becomes visible if the language has a corresponding dictionary installed via extension, we may do so. But I'd just drop that confusing mix of configurations.
Comment 61 Michael Bauer 2024-05-30 09:08:58 UTC
I don't think libexttextcat is beyond repair but I have always been told off for trying to tackle more than one (however connected) issue in one bug, so I'm trying to stick to fixing just the bit that offers up the wrong languages.

But since you ask, I think libexttextcat could easily be made loads better by restricting its options, i.e. at the moment, it tries for ALL the languages it can potentially tag. And gets it spectacularly wrong. But if there was a way for the user to say "I normally only handle docs in languages A, B and F" then if libexttextcat was restricted to differentiating only those n languages, it should work much better. I'm sure it would struggle to an extent with closely related stuff like Bokmal and Nynorsk but would overall perform much better nonetheless.
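
As a minimal sketch of that restriction (Python, hypothetical names): given per-language scores from any detector, only the languages the user has listed are considered. Filtering after the fact is a simplification; restricting the detector's own candidate models would be the more thorough variant.

```python
from typing import Dict, Optional, Set


def best_allowed_guess(scores: Dict[str, float], allowed: Set[str]) -> Optional[str]:
    """Pick the highest-scoring language among those the user works with.

    `scores` maps language tags to detector scores (scale is an assumption);
    `allowed` is the user's own list, e.g. {"gd", "en", "de"}.
    """
    candidates = {tag: s for tag, s in scores.items() if tag in allowed}
    if not candidates:
        return None  # nothing the user works with was considered plausible
    return max(candidates, key=candidates.get)
```

With made-up scores where Tibetan narrowly beats Gaelic, a user who listed only Gaelic, English and German would be offered Gaelic.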
Comment 62 Heiko Tietze 2024-05-30 09:23:33 UTC
Caolan, what do you think?
Comment 63 Caolán McNamara 2024-05-30 12:24:17 UTC
I think the original idea behind the libexttextcat usage was someone pastes a paragraph of text into something and it could be used to set the likely language for it. I don't think it ever was expected to work well with a few words. Its presence in the drop down of languages is probably a bit of a red herring and just gets it the blame.

Populating the list of suggested languages via some totally other mechanism of languages used in the document and/or/+ n last languages used ever. And offering "guess language" as a totally unrelated command, or submenu from the other list, might help.
Comment 64 daniel.schaaaf 2024-05-30 13:14:50 UTC
It appears that the language list "For Selection" works in a different way than the lists "For Paragraph" and "For All Text". (see https://bugs.documentfoundation.org/show_bug.cgi?id=161344)

The list "For Selection" can be populated with up to seven recently used languages, and the currently used language is indicated by a check mark.

The lists "For Paragraph" and "For All Text" seem to only show languages that are actually used in the current document (except for when bug 161344 is triggered, and with the addition of the language "detected" by libexttextcat). Both lists have spacing for a check mark, but they don't use that feature.

It looks like the idea of "recent languages" with a check mark for the language at the text cursor position was implemented only half way. I actually like the list of recent languages, and the check mark is very helpful. But I could live with the approach to just show "languages currently in use". Only the mixture of both approaches is confusing.



I'd like to add that the language lists "For Selection" and "For Paragraph" are absent in other LibreOffice applications like Calc and Impress. The language list "For All Text" doesn't work either in these applications. I only see my LibreOffice default language in that list, even though the currently used language is correctly shown in the status bar at the bottom.
Comment 65 Michael Bauer 2024-05-30 13:37:53 UTC
What's the point of "recent languages" if you cannot actually set the language to something sensible? It's a double bind. So I could set a section or the document to Gaelic if LO thinks it's a language I recently used. But unfortunately LO thinks I work with Welsh, Tibetan and goodness knows what else. 

Whatever these clever features do or are supposed to do, can we PLEASE get something implemented that allows a user to EASILY set the language for a document or selection manually that OVERRIDES whatever LO *thinks* should be the case?

We can play with anything else at leisure afterwards, but this issue has been around for eons, the bug itself is 10 years old...
Comment 66 Heiko Tietze 2024-05-30 14:50:59 UTC Comment hidden (noise)
Comment 67 daniel.schaaaf 2024-05-30 21:37:11 UTC
The point of "recent languages" is that those would be languages the users chose themselves (for the current document). But, I get that showing only "languages currently in use" has its appeal too. Although ... since we are all likely to use screens with higher resolutions than 800x600 px ... why not have both, languages currently in use, and recently used languages?


=====================================
Language(s) used in current document/selection/paragraph:
English (UK)
☑ French (France)
More ...
-------------------------------------
Recently used languages:
Some Language
Some other language
Yet another language
-------------------------------------
Suggested for current paragraph:
Norwegian (Bokmål)
-------------------------------------
None (Do not check spelling)
-------------------------------------
Reset to Default Language
=====================================


I love that the language for the current selection is indicated by a check mark and in the status bar. This should be extended to the language lists "For Paragraph" and "For All Text".

Regardless of what people prefer, recent languages with a check mark for the current language, or only languages currently in use, we need consistency.
Comment 68 Michael Bauer 2024-05-30 21:43:15 UTC
>=====================================
>Language(s) used in current document/selection/paragraph:
>English (UK)
>☑ French (France)
>More ...
>-------------------------------------
>Recently used languages:
>Some Language
>Some other language
>Yet another language
>-------------------------------------
>Suggested for current paragraph:
>Norwegian (Bokmål)
>-------------------------------------
>None (Do not check spelling)
>-------------------------------------
>Reset to Default Language
>=====================================

Can someone explain to me how in this scenario I force LO to 
a) use a language that I cannot currently set (because it's not offering it to me)
b) use a language I haven't recently used because LO never offers it
c) doesn't suggest because it thinks Gaelic is Tibetan?
Comment 69 daniel.schaaaf 2024-05-30 22:25:28 UTC
(In reply to Michael Bauer from comment #68)

> Can someone explain to me how in this scenario I force LO to 
> a) use a language that I cannot currently set (because it's not offering it
> to me)
> b) use a language I haven't recently used because LO never offers it
> c) doesn't suggest because it thinks Gaelic is Tibetan?

"More..." should get you what you crave.
The lists "For Paragraph" and "For All Text" currently don't show recently used languages. The list "For Selection" does offer recently used languages ... sometimes (see bug 161344).

Although, "More..." should open a dedicated language selection dialogue. Right now, "For Selection/Paragraph" opens the "Character" dialogue, while "For All Text" opens the LO/Writer "Options" dialogue.
If "More..." would open a dedicated language selection list directly, we could skip/scrap "recently used languages" too. Currently, we have to click "More...", search for "Language", click on the language selection to expand the language list, and then scroll to the language of choice in a list filled with languages that aren't even available/installed ...
Comment 70 Heiko Tietze 2024-06-14 09:04:08 UTC
We discussed the topic in the design meeting.

Language guessing is needed by many users and should be improved. However, the library used was never meant to detect the language based on one or a few words, nor is it likely to achieve that. Alternatives might be available though, e.g. https://github.com/pemistahl/lingua-py.

The suggestion is to make the function optional by 
a) adding "[ ] Detect languages" (off by default) to Tools > Options > Languages and Locale > Enhanced Language Support, 
b) showing document languages in the dropdown and requiring the user to pick one from the character dialog (as today and like MSO does),
c) sorting the results according to comment 55, and
d) not adding recent languages (across sessions).

In the long run it would be good to improve the detection.

For a code pointer, see comment 44.
Comment 71 pixie 2024-07-23 01:06:33 UTC
> Could imagine to have checkboxes with the Default Languages dropdowns instead of the green check marks today.

* 141 languages utilize the Cyrillic writing system.
* 44 languages utilize the Arabic writing system. 
* 14 languages utilize the Hebrew writing system. 

For any of those writing systems, one is looking at way too many check-boxes/drop down choices.

There are 293 writing systems spread over 4,000 languages, or if you wish to include languages that are not yet written, 7,400. The same language can have different spelling, grammar, and syntax conventions in different countries.

A hypothetical worst case scenario of 1.5*10^9 combinations of language, country, and writing system.

Any option chosen needs to:
* Offer no more than seven options at each choice;
* Be easily expandable to that number of combinations;
* Not overwhelm the naive user;
* Not frustrate the sophisticated user;

Naive user: How do I write this document in British English?
Sophisticated user: Why do I have to jump through so many screens to switch between English written in Kata Kana, and Afrikaans written in the Arabic writing system?
Comment 72 Michael Bauer 2024-07-23 10:32:23 UTC
(In reply to Heiko Tietze from comment #70)

> The suggestion is to make the function optional by 
> a) adding "[ ] Detect languages" (off by default) to Tools > Options >
> Languages and Locale > Enhanced Language Support, 
> b) show document languages in the dropdown and require user to pick one from
> the character dialog (as today and like MSO does), and
> c) sort the results according comment 55.
> d) do not add recent languages (across sessions),

Agnostic re a), making it easier to find sounds like a good thing.

Re b): I repeat, the Character Dialog is a shit place for selecting document languages, it requires a lot of clicking about and is totally counter-intuitive for non LO UI devs. No other word processing software I've ever worked with ties the locale selection in with font/character selection.

Re: c)/d) fine, but I still don't see how that fixes the problem for a user for whom a) (language detection) fails and/or those who just don't want the software to guess the languages used because it gets it wrong too often (not just LO) and who can be bothered checking if a 48 page document has bits where Spanish has been mis-identified as Hungarian?

As in, back to the use case which actually brought on this 8 year old thread: I want to tell LO what the document/paragraph/selection language is in a) less than 3 clicks and b) less than 3 seconds and c) without a degree in software development when LO does not make sensible default suggestions.
Comment 73 Michael Bauer 2024-07-23 10:35:51 UTC
(In reply to pixie from comment #71)

> A hypothetical worst case scenario of 1.5*10^9 combinations of language,
> country, and writing system.

While I don't think all of those scenarios are even vaguely likely or required, this kind of brings me back to one of my original suggestions that there should be a dialogue that allows the user to select locales they want to be offered when writing documents. 

But I stopped harping on about it when consensus seemed to shift in a different direction and to be honest, I'm getting a little lost in terms of how the current proposed solution fixes the original problem but 8 years on, I'd be happy for ANY movement on this bug hitting test/release and then re-file if that doesn't actually fix the problem.
Comment 74 Mike Kaganski 2024-09-08 21:55:44 UTC
*** Bug 162875 has been marked as a duplicate of this bug. ***