Bug 91766 - Automatic language detection for spell checking
Summary: Automatic language detection for spell checking
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Linguistic
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Duplicates: 132294
Depends on:
Blocks: Spell-Checking Language-Detection
 
Reported: 2015-05-31 04:23 UTC by Aleve Sicofante
Modified: 2024-07-21 10:07 UTC

See Also:
Crash report or crash signature:


Attachments
Word can't determine the language of different paragraphs. (17.89 KB, application/vnd.oasis.opendocument.text)
2015-05-31 16:47 UTC, Aleve Sicofante
We have had this feature since forever. (247.31 KB, image/png)
2015-05-31 23:15 UTC, Adolfo Jayme Barrientos
Sample multilingual document (25.04 KB, application/vnd.oasis.opendocument.text)
2021-01-15 17:09 UTC, Adalbert Hanßen

Description Aleve Sicofante 2015-05-31 04:23:02 UTC
Automatic language detection for spell checking is an essential tool in office environments that do international business. MS Office has been doing it for some 20 years, and Google Translate has been auto-detecting languages forever as well. The technology is well known and reliable. I'd like to encourage the design team to make this a priority for upcoming versions.

It should work in all components of LibreOffice, but of course Writer is the major beneficiary.

Maybe this should be an enhancement request for Hunspell?
Comment 1 Adolfo Jayme Barrientos 2015-05-31 09:56:35 UTC
Ever looked at the status bar?

http://www.freedesktop.org/wiki/Software/libexttextcat/
Comment 2 Aleve Sicofante 2015-05-31 16:47:11 UTC
Created attachment 116199 [details]
Word can't determine the language of different paragraphs.
Comment 3 Aleve Sicofante 2015-05-31 16:51:27 UTC
Sorry, I meant Writer can't determine the language of different paragraphs.
Comment 4 Aleve Sicofante 2015-05-31 16:52:13 UTC
(In reply to Aleve Sicofante from comment #2)
> Created attachment 116199 [details]
> Word can't determine the language of different paragraphs.

The attachment is an ODT document. Sorry for any confusion.
Comment 5 Adolfo Jayme Barrientos 2015-05-31 23:08:20 UTC Comment hidden (obsolete)
Comment 6 Adolfo Jayme Barrientos 2015-05-31 23:15:33 UTC
Created attachment 116210 [details]
We have had this feature since forever.

> Writer can't determine the language of different paragraphs.

That assertion is simply incorrect.
Comment 7 Aleve Sicofante 2015-06-01 09:19:23 UTC
In the attached document, the first paragraph is in Spanish. Spell checking works properly and nothing gets red-underlined.

The second paragraph is written in English. Writer doesn't seem to know that, and keeps trying to correct the paragraph as if it were written in Spanish, hence the red underlining of the whole paragraph.

How is my assertion incorrect?
Comment 8 Adolfo Jayme Barrientos 2015-06-06 18:14:49 UTC
Do not be confused.

What you want is to create a new feature in which Writer automatically changes the spell-checking language for each paragraph, which would be costly in long documents.

But to state that Writer “can’t determine the language of different paragraphs” is a lie, as I’ve demonstrated in the screenshot I’ve attached.
Comment 9 Aleve Sicofante 2015-06-07 00:53:58 UTC
"What you want is to create a new feature in which Writer automatically changes the spell-checking language for each paragraph, which would be costly in long documents"

Whether it's costly or not is open to debate (it takes only a handful of words for Google Translate to detect a language, sometimes as few as two...), but the feature has been in MS Office (including Word and Outlook) for almost two decades now, if not longer, and it's VERY useful for international businesses.

I don't know exactly what your problem is, and I don't understand your attitude either. Are you always so angry?

Yes, I propose exactly what you finally understood. I think it was clear from the beginning, but maybe I wasn't clear enough. What has no excuse, though, is your completely unnecessary aggressive tone.
Comment 10 chomisyann 2016-05-06 07:48:43 UTC
Yes, please, please add this feature.
I work every day in English and French, and sometimes in Spanish.
That's the main reason I am still using a copy of Word on my PC.
In Word you just need to copy and paste any text, and it corrects it whatever the language of the text.
It works so nicely.

I think this feature is more important than a new database filter or some other geeky feature. It would really make life easier for 90% of users (or at least the 25% who work in several languages).


+1
Comment 11 Tyco72 2018-04-09 08:23:18 UTC
I have always wondered why this basic feature is still not implemented in LO, and it is not the only one.
That the developers focus mainly on geeky features instead of on all the little bugs and improvements useful to 90-99% of users is the main limitation of open-source software. But they should consider that 90% of the donations to LO come from those 90-99% of ordinary users.
Comment 12 Xisco Faulí 2019-11-29 13:27:14 UTC
Changing priority back to 'medium' since the number of duplicates is lower than 5
Comment 13 Heiko Tietze 2020-09-14 13:11:47 UTC
*** Bug 132294 has been marked as a duplicate of this bug. ***
Comment 14 Adalbert Hanßen 2021-01-15 17:06:50 UTC
(In reply to Xisco Faulí from comment #12)
> Changing priority back to 'medium' since the number of duplicates is lower
> than 5

I was just about to make a new proposal, but while entering it I came across these duplicates. It is probably better to add my comment here rather than create a new duplicate. Some of my ideas are already in the discussion above, but there are new ideas that address the "costly" argument. So here we go:

If you want to use the spell checker in a multilingual document, you must assign the correct language to each part of the document. Without this step, large parts of the document are checked against the spelling rules of the wrong language, flagged as wrong, and highlighted with red wavy lines.

Editing multilingual texts would become easier if you could tell the spell checker to check the spelling in all languages for which the corresponding language pack is installed. I suggest an additional "Automatic" option for this, which could be set at Tools>Language>... and Format>Characters>Language.

For a text passage to which this choice applies, LO Writer should check the text, for example from the beginning of the sentence (i.e., after the last period, colon, question mark, exclamation mark, or quotation mark), against all languages whose language pack is installed, and it should automatically assign the language in which the check finds the fewest errors (i.e., the fewest characters that would be underlined with red wavy lines in that language).
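The "fewest errors" rule can be sketched as follows. This is a minimal illustration under stated assumptions, not LibreOffice code: the word sets in `DICTS` are hypothetical stand-ins for installed spelling dictionaries, and `split_sentences`, `error_chars` and `guess_language` are illustrative names. Each sentence is scored against every language, and the language with the fewest characters that would be underlined wins.

```python
import re

# Sentence boundaries as listed in the proposal: . : ? ! and quotation marks.
SENTENCE_BREAK = re.compile(r'[.:?!"]+')
# A "word" here is a run of letters (Unicode-aware, no digits or underscores).
WORD = re.compile(r"[^\W\d_]+")

# Hypothetical stand-in for installed spelling dictionaries (language -> known words).
# A real implementation would ask LibreOffice's spell checker instead.
DICTS = {
    "es": {"el", "gato", "come", "pescado", "hoy"},
    "en": {"the", "cat", "eats", "fish", "today"},
}

def split_sentences(text):
    """Split text at sentence-ending marks and drop empty pieces."""
    return [s.strip() for s in SENTENCE_BREAK.split(text) if s.strip()]

def error_chars(sentence, known_words):
    """Characters that would get a red wavy underline: total length of unknown words."""
    return sum(len(w) for w in WORD.findall(sentence) if w.lower() not in known_words)

def guess_language(sentence, dictionaries):
    """Pick the language with the fewest underlined characters (first one wins ties)."""
    return min(dictionaries, key=lambda lang: error_chars(sentence, dictionaries[lang]))

for sentence in split_sentences("El gato come pescado. The cat eats fish today!"):
    print(sentence, "->", guess_language(sentence, DICTS))
# El gato come pescado -> es
# The cat eats fish today -> en
```

Note that every installed language is checked for every sentence, which is exactly the cost concern raised in comment 8; comment 18 suggests ways to narrow the set of candidate languages.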

If introducing this would conflict with the ODT file format definition (if the format does not provide for automatic language selection), one could instead set the language on the fly during editing, once a sentence is completed (i.e., once one of the punctuation marks mentioned above is typed).
Side question: is language actually a property of a character, i.e. an attribute like font size, bold/italic/underline, color, etc.?

Additional suggestion: the spelling correction menu could offer the other installed languages, and also "no language check", as additional choices.
Comment 15 Adalbert Hanßen 2021-01-15 17:09:07 UTC
Created attachment 168921 [details]
Sample multilingual document
Comment 16 hellbourne79 2021-03-15 09:52:16 UTC
I am a translator for a news agency. Russian to English mostly. But most of the texts are very short and I save a lot of files.

I'd love LibreOffice to be able to automatically recognize that a piece of text is in a different language. Maybe this could be done when the spell checker encounters an excessive number of errors in several consecutive words? Such a situation could trigger a switch to another spell-checking language.
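The trigger idea above could look roughly like this. It is a minimal sketch under assumptions: a run of `threshold` consecutive unknown words (3 here, an arbitrary choice) triggers a switch to the installed language that recognizes more than half of that run, and the whole run is then relabelled. `DICTS` and `assign_languages` are hypothetical names; a real implementation would query the spell-checking service rather than plain word sets.

```python
import re

WORD = re.compile(r"[^\W\d_]+")  # letters only, Unicode-aware

# Hypothetical word lists standing in for installed spelling dictionaries.
DICTS = {
    "ru": {"кошка", "ест", "рыбу"},
    "en": {"the", "cat", "eats", "fish", "today"},
}

def assign_languages(text, dictionaries, start_lang, threshold=3):
    """Tag each word with a language, switching after `threshold` consecutive
    words that are unknown in the current language."""
    current = start_lang
    result = []  # [word, language] pairs, mutable so a run can be relabelled
    run = []     # indices (into result) of the current run of unknown words
    for word in WORD.findall(text):
        result.append([word, current])
        if word.lower() in dictionaries[current]:
            run = []  # a correctly spelled word ends the run
            continue
        run.append(len(result) - 1)
        if len(run) < threshold:
            continue
        run_words = [result[i][0].lower() for i in run]
        # Candidate: the language that recognizes the most words of the run.
        best = max(dictionaries,
                   key=lambda lang: sum(w in dictionaries[lang] for w in run_words))
        hits = sum(w in dictionaries[best] for w in run_words)
        if best != current and hits > len(run) // 2:
            for i in run:
                result[i][1] = best  # retroactively relabel the whole run
            current = best
            run = []
    return [tuple(pair) for pair in result]

print(assign_languages("Кошка ест рыбу. The cat eats fish today.", DICTS, "ru"))
```

The threshold trades responsiveness for robustness: a low value reacts quickly to a language change, but it may also fire on a burst of genuine typos or proper names.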

I see this feature has been requested for years. I will see what, if anything, can be done with macros in LibreOffice.
Comment 17 Evgeniy Dolgikh 2021-04-15 06:59:36 UTC
A few months ago I tried to move the users in my organization from MS Office to LO. We mostly work with documents that have paragraphs in different languages. My users refused to use LO because none of almost 50 people wanted to manually mark parts of the text as "this is English, and this is Russian" when MS products do this automatically.
Comment 18 Adalbert Hanßen 2021-05-31 13:38:21 UTC
Computation time for spell checking in other languages can easily be reduced by confining it to languages present in the actual document. 

Once a particular language has been assigned to a part of the document, it should be taken into consideration for the next word that would otherwise be flagged with a wavy underline by the spell check in the current language.

If such a word would be right in another language, which has been selected in the same document, it should automatically be flagged to be in the other language.

To further speed up spell checking:

1. Spell checking should first consider the language at the beginning of a paragraph.

2. If the previous word was spelled correctly in another language, the current word should then be checked in that other language. If it is misspelled there, the language that makes up the majority of the paragraph should be tried first.

3. If the current word is spelled correctly according to neither rule 1 nor rule 2, the other languages present in the current document should be checked, with the languages that make up more of the document considered first.
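The rules above can be read as a fixed order in which candidate languages are tried for each word. The sketch below is one possible, simplified reading (rule 2's conditional is flattened into "previous word's language, then paragraph majority"); the function names and the per-language word counts are hypothetical, and word sets again stand in for real dictionaries. Only languages already present in the document are considered, as suggested at the start of this comment.

```python
from collections import Counter

def candidate_languages(paragraph_start, previous_word, paragraph_majority, document_counts):
    """Order in which to try languages for the current word (duplicates removed):
    1. language at the start of the paragraph,
    2. language of the previous word, then the paragraph's majority language,
    3. other languages in the document, most frequent first."""
    ordered = [paragraph_start, previous_word, paragraph_majority]
    ordered += [lang for lang, _ in document_counts.most_common()]
    result = []
    for lang in ordered:
        if lang is not None and lang not in result:
            result.append(lang)
    return result

def detect_word_language(word, candidates, dictionaries):
    """Return the first candidate language in which the word is spelled correctly."""
    for lang in candidates:
        if word.lower() in dictionaries.get(lang, set()):
            return lang
    return None  # unknown in every language used in the document: flag as misspelled

# Hypothetical per-language word counts and word lists.
counts = Counter({"de": 120, "en": 40, "fr": 5})
dicts = {"de": {"haus"}, "en": {"house"}, "fr": {"maison"}}
order = candidate_languages("de", "en", "de", counts)
print(order)                                         # ['de', 'en', 'fr']
print(detect_word_language("Maison", order, dicts))  # fr
```

Because the search stops at the first language that accepts the word, the common case (a word in the paragraph's own language) costs a single dictionary lookup.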
Comment 19 Mike Kaganski 2021-12-14 06:32:21 UTC
There is a feature in LibreOffice that uses the system input language to set the language of typed text. It is currently implemented on Windows and on Qt5 (bug 108151). Where it is implemented, it is a reliable and unambiguous feature that makes use of the input switch performed by the user. It is widely used by users whose keyboard layouts require switching (e.g., bilinguals using Cyrillic and Latin alphabets), who have keyboard shortcuts like Shift+Alt in muscle memory. But those who typically use the same layout for different languages (e.g., Romance languages) often don't even know that such a feature exists, and for them some magic is expected that detects which specific language their one-character "a" belongs to.

We already have a "magic" "detecting" possible language of a text, that works when you work with Tools->Language menu, and also when you have a spellcheck error. The list of offered languages is created based on statistical fingerprint heuristics, and we already have multiple bugs showing how unreliable and random that detection is (see e.g. bug 139185 comment 4).
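For context, the "statistical fingerprint" approach can be sketched as follows. To our understanding, libexttextcat derives from libtextcat, which implements the n-gram ranking method of Cavnar and Trenkle; the code below is a simplified, hypothetical rendering of that idea (the parameter values `max_n=3` and `top=300` are illustrative), not the code of LibreOffice or libexttextcat. Real language profiles are built from large training texts. The example also hints at why results are unreliable for short snippets: a one-letter word such as "a" yields only five n-grams, far too little evidence to tell languages apart.

```python
from collections import Counter

def fingerprint(text, max_n=3, top=300):
    """Ranked list of the `top` most frequent character n-grams (n = 1..max_n).
    Words are padded with "_" so that word starts and ends form their own n-grams."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    return [gram for gram, _ in counts.most_common(top)]

def out_of_place(sample, profile):
    """Sum of rank differences between sample and profile; n-grams missing
    from the profile get the maximum penalty."""
    rank = {gram: i for i, gram in enumerate(profile)}
    penalty = len(profile)
    return sum(abs(i - rank[g]) if g in rank else penalty
               for i, g in enumerate(sample))

def classify(text, profiles):
    """Language whose profile is closest to the text's fingerprint."""
    sample = fingerprint(text)
    return min(profiles, key=lambda lang: out_of_place(sample, profiles[lang]))

# A one-letter input produces only a handful of n-grams to compare:
print(fingerprint("a"))  # ['_', 'a', '_a', 'a_', '_a_']
```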

I would think that implementing this proposal is only reliably possible by employing mind-reading elf farm serving requests from LibreOffice in real time.