Bug 154499 - Accept consecutive words with spaces aka phrases in spellchecker
Summary: Accept consecutive words with spaces aka phrases in spellchecker
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version: Inherited From OOo (earliest affected)
Hardware: x86-64 (AMD64) All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard: target:7.6.0
Keywords: needsDevAdvice
Depends on:
Blocks: Spell-Checking
 
Reported: 2023-03-31 06:29 UTC by Nehru
Modified: 2023-12-09 10:47 UTC
CC List: 8 users

See Also:
Crash report or crash signature:


Description Nehru 2023-03-31 06:29:37 UTC
Description:
When we write two or more words from a local or other language (for example, "amma appa") in LibreOffice, we cannot ignore them all at once.

Steps to Reproduce:
1. Go to LibreOffice Writer.
2. Write two or more words from a local or other language (for example, "amma appa").
3. Select the two or more words and try to ignore them at once; they are not ignored together.

Actual Results:
Two or more selected words are not ignored at once.

Expected Results:
Two or more selected words are ignored at once.


Reproducible: Always


User Profile Reset: No

Additional Info:
additional feature
Comment 1 Stéphane Guillou (stragu) 2023-04-14 07:38:01 UTC
UX team, what do you think of adding the ability to ignore several words that are not in the dictionary at once? E.g. select range, right-click and "ignore all" if one is detected in the range.
Comment 2 Heiko Tietze 2023-04-14 07:59:55 UTC
"amma appa" sounds artificial to me. Could imagine something like "Armin Le Grande", "Milhouse van Houten" (assuming lower case text) or "pasta del diablo" - the white space makes it two words. For "penne all'arrabbiata" it is just one word.

The dictionary does not accept spaces; I tried to add the full name directly via Options... > (List of ignored words) > Edit, and I don't know if it's possible at all. Besides, the UI/UX for this would be awkward, thinking of a user selection in the "Not in Dictionary" text edit (it would replace the red highlighted typo).

Marco, what do you think?
Comment 3 Marco A.G.Pinto 2023-04-14 08:09:35 UTC
(In reply to Heiko Tietze from comment #2)
> "amma appa" sounds artificial to me. Could imagine something like "Armin Le
> Grande", "Milhouse van Houten" (assuming lower case text) or "pasta del
> diablo" - the white space makes it two words. For "penne all'arrabbiata" it
> is just one word.
> 
> The dictionary does not accept spaces; tried to directly add the full name
> per Options... > (List of ignored words) > Edit, and I don't know if it's
> possible at all. Besides, the UI/UX for this would be awkward, thinking of a
> user-selection in the "Not in Dictionary" textedit (would replace the red
> highlighted typo).
> 
> Marco, what do you think?

Heya,

What I think is that Hunspell dictionaries don't accept words with spaces, so I can't add multiple words per entry.

However, this is a long-time wish: enable LibreOffice to add words with spaces to the personal dictionary.

Maybe by selecting all the words, right-clicking, and choosing "add to personal dictionary".

This would also allow LanguageTool to add multiwords into LibreOffice, since there is a setting in it to add the internal dictionary of LanguageTool into LibreOffice.

This is an old struggle of mine.

There are tons of words with spaces in the LanguageTool spelling files, but they only work with the browser add-on.
Comment 4 Heiko Tietze 2023-04-14 08:23:36 UTC
So hunspell cannot deal with this (for now), but internally we might. UI/interaction proposal in c2. Changing the summary to reflect the idea of consecutive words.

Loosely related: Bug 80358 - The spell-checker is mixed up when correcting in between two consecutive wrong words
Comment 5 V Stuart Foote 2023-04-14 14:18:06 UTC
Poked at this a bit, thinking that use of NBSP (U+00A0) would suffice, so I preloaded (by text editing) some common English compounds containing spaces into the user profile's 'standard.dic':

inter alia
stare decisis
per se
terra firma
ad hoc
per capita

Even with the NBSP entries present, the spellcheck does not parse them as compounds from the dictionary, and each word fails the spellcheck.

It seems our spell check insists on testing only at ICU word boundaries, but couldn't it test across them?

@László, I imagine this is nothing new for Hunspell implementations; the OOo-era See Also suggests it's been considered.
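
A minimal sketch of why the NBSP entries are never matched, assuming the checker looks up one token at a time. The regex tokeniser below is only a stand-in for ICU word-boundary analysis (it does not replicate ICU), and the names `tokenize`, `misspelled` and `USER_DICTIONARY` are hypothetical:

```python
import re

# Hypothetical user dictionary holding NBSP-joined entries (U+00A0).
USER_DICTIONARY = {"per\u00a0se", "ad\u00a0hoc"}


def tokenize(text):
    """Toy word splitter: NBSP is not a word character, so it separates words."""
    return re.findall(r"\w+", text)


def misspelled(text):
    """Look up each token on its own, the way a word-level checker would."""
    return [word for word in tokenize(text) if word not in USER_DICTIONARY]
```

Because the lookup key is always a single token, the entry containing the NBSP can never be reached, whatever separator the text uses.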
Comment 6 Heiko Tietze 2023-04-17 07:24:38 UTC Comment hidden (off-topic)
Comment 7 Commit Notification 2023-05-08 17:00:40 UTC
László Németh committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/5619fc438273cd15e78539e78b8af751bca24b1a

tdf#154499 sw spell checking: add 2-word phrase checking

It will be available in 7.6.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 8 László Németh 2023-05-08 17:03:54 UTC
Commit description:

tdf#154499 sw spell checking: add 2-word phrase checking

Hunspell dictionaries can contain phrases, i.e. space
separated word sequences, which were used only to reject
compounds and to give better suggestions. Now recognize
2-word phrases in the text, so there is no need to break
the phrase into single words, e.g. "et" and "cetera", which
resulted in acceptance of typos (e.g. "et" without "cetera")
and in bad suggestions (e.g. for "et" and "cetera"
independently of the context).

More examples:

== old .dic file ==
...
et
cetera
von
Neumann
veni
vidi
vici
...

List the 2-word phrases, and break phrases of 3 or more
words into 2-word phrases:

== new .dic file ==
...
et cetera
von Neumann
veni vidi
vidi vici
...

Note: words of the phrase are separated by a space, but
recognized also with punctuation, e.g. in the previous
example, "Veni, vidi, vici."

Note: during typing, the second word of the phrase
will be accepted only when the paragraph is ended, i.e.
on pressing Enter.
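
The behaviour described in the commit message can be illustrated with a toy checker. This is a sketch under stated assumptions, not the Writer or Hunspell implementation: the function names, the `WORDS` and `PHRASES` sets, and the decision to lowercase everything are all hypothetical simplifications.

```python
import re

# Hypothetical dictionary content: single words and 2-word phrases.
WORDS = {"and"}
PHRASES = {"et cetera", "veni vidi", "vidi vici"}


def tokenize(text):
    """Split text into alphabetic words, dropping punctuation."""
    return re.findall(r"[^\W\d_]+", text)


def find_misspelled(text, words, phrases):
    """Return tokens that are neither known words nor part of a 2-word phrase."""
    tokens = tokenize(text)
    lowered = [token.lower() for token in tokens]  # simplification: ignore case
    accepted = [token in words for token in lowered]
    # A word is also accepted when it forms a dictionary phrase with a neighbour.
    for i in range(len(lowered) - 1):
        if f"{lowered[i]} {lowered[i + 1]}" in phrases:
            accepted[i] = accepted[i + 1] = True
    return [token for token, ok in zip(tokens, accepted) if not ok]
```

With these sets, "et" on its own is reported, while "et cetera" and "Veni, vidi, vici." pass, because punctuation is dropped before the pairs are formed.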
Comment 9 László Németh 2023-05-08 17:15:58 UTC
@Nehru et al.: thanks for the idea and comments! Lightproof and LanguageTool allow checking phrases, but it's worth creating a simple and more portable solution, using only Hunspell and the personal dictionary.

I plan to remove the limitation of the personal dictionary, too.
Comment 10 Heiko Tietze 2023-05-09 06:54:06 UTC
(In reply to László Németh from comment #8)
> Now recognize 2-word phrases in the text...
Cool stuff! But how to add "veni vidi vici" to the user dictionary?
Comment 11 László Németh 2023-05-09 10:19:47 UTC
(In reply to Heiko Tietze from comment #10)
> (In reply to László Németh from comment #8)
> > Now recognize 2-word phrases in the text...
> Cool stuff! But how to add "veni vidi vici" to the user dictionary?

@Heiko: Thanks! I prefer to use the correct form with punctuation:

"veni, vidi, vici"

 or

"Veni, vidi, vici."

and the hidden inner form can be

"veni vidi"
"vidi vici"

now, and anything else later.

This way is not only more user-friendly (i.e. using orthographically correct dictionary items), but Lightproof and LanguageTool can use the correct form to check the requested punctuation, too.
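
As a sketch of how an orthographically correct entry could be mapped to the hidden inner form mentioned above, assuming words are separated by spaces and punctuation and that apostrophes stay inside a word. The function name and the regular expression are hypothetical and not taken from the LibreOffice source:

```python
import re

# A word may contain an inner apostrophe, e.g. "all'arrabbiata".
WORD = re.compile(r"\w+(?:['’]\w+)*")


def to_two_word_segments(entry):
    """Break a dictionary entry into overlapping 2-word segments."""
    words = WORD.findall(entry)
    if len(words) < 2:
        return words  # a single word (or nothing) is stored unchanged
    return [f"{first} {second}" for first, second in zip(words, words[1:])]
```

For "veni, vidi, vici" this yields "veni vidi" and "vidi vici", matching the inner form listed in the comment; case is left untouched so that entries such as "von Neumann" keep their capitalisation.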
Comment 12 Heiko Tietze 2023-05-09 10:47:38 UTC
(In reply to László Németh from comment #11)
> I prefer to use the correct form with punctuation---
Correct punctuation aside, how do I add a new phrase to the dictionary?
Comment 13 László Németh 2023-05-09 10:55:10 UTC
(In reply to Heiko Tietze from comment #12)
> (In reply to László Németh from comment #11)
> > I prefer to use the correct form with punctuation---
> Correct punctuation aside, how do I add a new phrase to the dictionary?

@Heiko: as a new line at the end of the dic file:

---------
et cetera

For the custom dictionary, I plan to remove the limitation related to the spaces, and break the space (and punctuation) separated items to 2-word segments.
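
Adding a phrase as a new line at the end of a Hunspell-style .dic file, as described above, could be scripted roughly as follows. This is a sketch that assumes a UTF-8 encoded file whose first line holds the approximate entry count; the helper name `append_phrase` and any file path passed to it are hypothetical, and LibreOffice's own user dictionaries use a different header.

```python
from pathlib import Path


def append_phrase(dic_path, phrase):
    """Append a phrase to a .dic file unless it is already listed.

    Returns True if the phrase was added, False if it was already present.
    """
    path = Path(dic_path)
    lines = path.read_text(encoding="utf-8").splitlines()
    if phrase in lines[1:]:
        return False
    lines.append(phrase)
    # Assumption: the first line is the entry count; bump it if it is numeric.
    if lines[0].strip().isdigit():
        lines[0] = str(int(lines[0]) + 1)
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return True
```

The dictionary has to be reloaded by the application before the new entry takes effect.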
Comment 14 Heiko Tietze 2023-05-09 11:40:26 UTC
(In reply to László Németh from comment #13)
> > Correct punctuation aside, how do I add a new phrase to the dictionary?
> as a new line at the end of the dic file:

Happy you made it possible for dictionary creators, but ordinary users press F7 and Accept. I assume you didn't change anything there, and would propose allowing the selection to be extended in the spell check dialog so that the whole selection can be added.
Comment 15 László Németh 2023-05-09 12:14:41 UTC
(In reply to Heiko Tietze from comment #14)
> (In reply to László Németh from comment #13)
> > > Correct punctuation aside, how do I add a new phrase to the dictionary?
> > as a new line at the end of the dic file:
> 
> Happy you made it possible for dictionary creators, but ordinary users press
> F7 and Accept. I assume you didn't change anything there, and would propose
> allowing the selection to be extended in the spell check dialog so that the
> whole selection can be added.

It seems to me a good idea to use the (already working) selection in the Spelling dialog window later, but there is an almost working option to add phrases: clicking Options..., selecting the custom dictionary, and clicking Edit... The only problem is the limitation in the "Add new word" input box, i.e. it doesn't accept spaces yet, but this limitation is likely easy to remove.

Also, preprocessing the full document could help to recognize document-specific phrases (family names etc.) automatically, and suggest their acceptance, selection etc. The only bottleneck is the development.

The idea of the recent development is to allow extending the .dic file with phrases that are relatively frequent, but that the dictionary developers didn't want to add to the dictionary because of the known problems with breaking them into single words. I hope that going beyond word-level spell checking will be a small but very attractive feature for them, and for every user of Writer. :)
Comment 16 László Németh 2023-05-09 20:14:57 UTC
Proposed fix for the custom dictionary:

https://gerrit.libreoffice.org/c/core/+/151596

Note: this allows spaces in the replacement string of the negative custom dictionary, which seems to be useful for Hunspell dictionaries with compound word recognition.
Comment 17 Commit Notification 2023-05-10 07:55:02 UTC
László Németh committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/b1568a4cd8b439de19aab2bfe5f8f8465e4dc6af

tdf#154499 spell checking: allow phrases in custom dictionary

It will be available in 7.6.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 18 Lars Jødal 2023-08-21 05:05:07 UTC
This is a great idea, but I cannot get it to work. I am using the about-to-be-released 7.6.0.3 version, which should include this feature, right?

1. I have added "inter alia" to the (Danish) dictionary. The dictionary already contains "inter" as a word, but not "alia" as a separate word.
2. I install the dictionary in LO 7.6.0.3.
3. Testing: writing "inter alia", the word "alia" is marked as a spelling error.

However, there IS a change from LO 7.5 (7.5.4.2).

a. The name "Aie Sorn" is in the dictionary including the space. Neither "Aie" nor "Sorn" is a word in the dictionary.
b. If I write "Aie", it is marked as a spelling error (both 7.5 and 7.6.0.3).
c. If I write "Sorn", it is marked as a spelling error (both 7.5 and 7.6.0.3).
d. In 7.5: If I write "Aie Sorn", both words are marked as spelling errors.
e. In 7.6.0.3: If I write "Aie Sorn", the word "Aie" is no longer marked as a spelling error, only "Sorn".

So, the new feature seems to partly work, but only partly.

Version: 7.6.0.3 (X86_64) / LibreOffice Community
Build ID: 69edd8b8ebc41d00b4de3915dc82f8f0fc3b6265
CPU threads: 8; OS: Windows 10.0 Build 19045; UI render: Skia/Raster; VCL: win
Locale: da-DK (da_DK); UI: da-DK
Calc: CL threaded
Comment 19 Heiko Tietze 2023-08-21 06:43:15 UTC
To be continued... at bug 156833.
Comment 20 Peter Jansen 2023-09-17 11:49:49 UTC
(In reply to László Németh from comment #11)
> (In reply to Heiko Tietze from comment #10)
> > (In reply to László Németh from comment #8)
> > > Now recognize 2-word phrases in the text...
> > Cool stuff! But how to add "veni vidi vici" to the user dictionary?
> 
> @Heiko: Thanks! I prefer to use the correct form with punctuation:
> 
> "veni, vidi, vici"
> 
>  or
> 
> "Veni, vidi, vici."
> 
> and the hidden inner form can be
> 
> "veni vidi"
> "vidi vici"
> 
> now, and anything else later.
> 
> This way is not only more user-friendly (i.e. using orthographically correct
> dictionary items), but Lightproof and LanguageTool can use the correct form
> to check the requested punctuation, too.

May I suggest that "veni, vidi, vici" had best be formatted as being Latin? Only the translation is an English phrase.

I have so far always resorted to doing the same with "per se" etc., which really ought to be accepted as English. Even my rudiments of Latin are so dreadful that I have installed spell-checking for Latin, too.