Bug 154954 - User dictionary: auto-learn new words
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: medium enhancement
Assignee: Not Assigned
Keywords: needsUXEval
Depends on:
Blocks: Spell-Checking Dictionaries
Reported: 2023-04-21 10:58 UTC by tpypta
Modified: 2023-05-04 12:39 UTC (History)
Description tpypta 2023-04-21 10:58:54 UTC
LibreOffice could greatly benefit from an auto-learning user dictionary that learns new words once they appear in the document more than (let's say) five times. I.e., if an unknown word for a given language is spelled the same way five times, it should automatically land in the user dictionary.

1. Far fewer red-marked words in longer documents such as studies and dissertations, which makes LibreOffice more comfortable to use in these cases.
2. Built-in dictionaries are not so great for smaller languages, nor for field-specific scientific terminology.

It would also be awesome to be able to sync this dictionary more easily than by defining paths, which doesn't work for me (no file is created in the specified directory).

Steps to Reproduce:
1. Open a long document containing scientific terminology, for example in the field of Humanities
2. Enable spell checking

Actual Results:
See all the red underlines, which could be avoided if the user dictionary were automatically populated from the document itself.

Expected Results:
Far fewer red-underlined / false-flagged words.

Reproducible: Always

User Profile Reset: No

Additional Info:
Version: (X86_64) / LibreOffice Community
Build ID: 53bb9681a964705cf672590721dbc85eb4d0c3a2
CPU threads: 8; OS: Windows 10.0 Build 19045; UI render: Skia/Raster; VCL: win
Locale: bg-BG (bg_BG); UI: en-US
Calc: CL threaded
Comment 1 Stéphane Guillou (stragu) 2023-04-21 13:08:19 UTC
Sounds like a sensible new feature, with a setting defining how many times something is written before being added. I like your rationale, tpypta.

I.e.: "Once this misspelled word has been written X times for this specific language, it is automatically added to this language's dictionary."
We supposedly already "count" occurrences for the autocomplete feature, so it might not need too much work. But I'm unsure how the behaviour differs between opening a file with many words already detected as misspelled and collecting the words per session as they are written.
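
The rule quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not LibreOffice code: `auto_learn`, `known_words` and `threshold` are hypothetical names, and a plain set of lowercase words stands in for the real spell checker of one language.

```python
import re
from collections import Counter

# Runs of letters, optionally joined by an apostrophe or hyphen.
WORD_RE = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def auto_learn(text, known_words, threshold=5):
    """Return unknown words written at least `threshold` times.

    `known_words` is a hypothetical stand-in for the spell checker:
    a set of lowercase words considered correct for one language.
    Counting is by exact spelling, so two different misspellings of
    the same word are counted separately.
    """
    counts = Counter(
        word
        for word in WORD_RE.findall(text)
        if word.lower() not in known_words
    )
    return {word for word, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    known = {"the", "of"}
    text = "heteroglossia of the " * 5 + "heteroglosia chronotope"
    # Only the spelling repeated five times becomes a candidate.
    print(auto_learn(text, known))
```

Run once over an opened document, this covers the "file with many words already misspelled" case; the per-session case would instead keep the counter alive and update it as words are typed.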

UX team?

Couldn't find a duplicate.
Comment 2 m_a_riosv 2023-04-21 14:01:41 UTC
-1
How do we know the word is right? It could automagically create bad words without notice.
Comment 3 Stéphane Guillou (stragu) 2023-04-21 14:06:46 UTC
(In reply to m.a.riosv from comment #2)
> -1
> How do we know the word is right? It could automagically create bad words
> without notice.

Off by default, for sure.
But I can see the appeal for domain-specific writers who are tired of manually adding dozens of words, for example in scientific articles. The likelihood of writing the same word wrong in the same way five times is fairly small.
Comment 4 Julien Nabet 2023-04-22 08:56:41 UTC
-1 from me too: you just have to right-click and select the "add to dictionary" option.

As for the number of times needed to be sure a word is correct, I completely disagree. Sometimes you're sure you have the right spelling when you don't. Just take a look at Andrea Gelmini's typo fixes and you'll see some people regularly making the same mistake. (As a French guy, I include myself among them when typing in English, and I do my best when typing in French but I'm far from perfect there too.)

Now to respond to this potential demand, if there's a domain with dozens of specific words, perhaps a mechanism to import specific dictionaries would be more appropriate.
Comment 5 Heiko Tietze 2023-04-24 07:42:19 UTC
Interesting proposal but rather suited for a macro, IMO. As Miguel and Julien commented, the workflow is to go through the misspelled words one by one anyway, and after you accept the first occurrence you won't get bothered again. It might be an interesting "fun fact" how often a supposed typo happens. What I mean is adding this number, for example, to the "Not in Dictionary" label if exactly the same unknown word occurs more than once. It would read "Not in dictionary (but used 5 times in the document)".
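
A minimal sketch of that label logic, assuming the unknown words of the document are already available as a list (the function name `not_in_dictionary_label` and its parameters are hypothetical, not an existing LibreOffice API):

```python
from collections import Counter


def not_in_dictionary_label(word, unknown_words):
    """Build the dialog label for `word`.

    `unknown_words` is assumed to hold every unknown word occurrence
    found in the document, in exact spelling.
    """
    count = Counter(unknown_words)[word]
    if count > 1:
        return f"Not in dictionary (but used {count} times in the document)"
    return "Not in dictionary"


if __name__ == "__main__":
    words = ["chronotope"] * 5 + ["heteroglosia"]
    print(not_in_dictionary_label("chronotope", words))
    print(not_in_dictionary_label("heteroglosia", words))
```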
Comment 6 Heiko Tietze 2023-05-04 12:31:03 UTC
The topic was on the agenda at the design meeting but did not receive further input.

It's not clear whether the (auto) correction is correct, and adding to the user dictionary is always just a click away. The benefits of such an option do not outweigh the implementation effort and the danger of misuse. The idea would be better realized via a macro.
Comment 7 Stéphane Guillou (stragu) 2023-05-04 12:39:13 UTC
Agreed, thanks everyone for sharing your opinion.

Would be great to see more domain-specific "dictionaries" added to our Extensions website, like the existing Bible dictionaries and the integrated Technical dictionary.