Bug 93611 - Spell checker does not treat many characters normally not part of words in the language being verified as separators
Status: RESOLVED WORKSFORME
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Linguistic
Version: 4.4.2.2 release (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Spell-Checking
Reported: 2015-08-24 03:52 UTC by andréb
Modified: 2024-10-11 15:41 UTC

See Also:
Crash report or crash signature:
Description andréb 2015-08-24 03:52:03 UTC
The title describes the problem clearly.

Sometimes, to keep words together across a line wrap or to show them as associated in a long sentence, it is useful to join them with a period.
The spell checker sees such words as a long word with one or more "." in the middle.
As well, "=." and ".." and "..." are seen as words.
(Although "this=word" is correctly seen as 2 words.)

The spell checker should clearly view "." and "=" and any other non-alphabetic character (except "-" and "'") as not part of a word.
The exceptions could vary according to the language.
In my case, I use the French language on Linux.

In no case (at least in French or English) is "." or "=" part of a valid word.
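
The requested behavior can be sketched roughly as follows. This is only an illustration in Python, not LibreOffice's actual word-breaking code: the names `split_words` and `misspelled` and the sample dictionary are hypothetical, and the sketch assumes that "-" and "'" are the only punctuation allowed inside a word, as proposed above. Digits and "_" are treated as separators too.

```python
import re

# One or more letters (any script), optionally joined by "-" or "'"
# when those sit between letters. Everything else (".", "=", digits,
# "_", ...) is treated as a separator.
WORD_RE = re.compile(r"[^\W\d_]+(?:[-'][^\W\d_]+)*")


def split_words(text):
    """Return the spell-checkable words found in text."""
    return WORD_RE.findall(text)


def misspelled(text, dictionary):
    """Return the words of text that are not in dictionary (case-insensitive)."""
    return [w for w in split_words(text) if w.lower() not in dictionary]


# Hypothetical two-word dictionary, for illustration only.
print(split_words("un.deux this=word =. .. ..."))
print(misspelled("onne.two", {"one", "two"}))
```

With this rule, "=." and "..." never reach the dictionary lookup, and only the misspelled component of a joined word would be reported.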
Comment 1 tommy27 2015-08-24 11:34:29 UTC
Would you please upload a simple test file with some concrete examples?

By the way, you are encouraged to upgrade from 4.4.2.2 to 4.4.5.2.
Comment 2 andréb 2015-09-01 02:55:16 UTC
When I created a test file, I found that most of the symptoms disappeared, although they persist in already existing files.

Note that I use the French language locale, so that may be related.
One problem that persists is that
 word1.word2
or
 word1.word2.word3
is underlined in its entirety if any component (i.e. word1, word2, or word3) is not in the dictionary. It is not underlined if all components are in the dictionary, even though the joined word (e.g. word1.word2) is not.
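
To make the symptom concrete, here is a small Python model of the behavior as described in this comment. It is a guess at the observable behavior only, not the real implementation: `flagged_tokens` and the sample dictionary are hypothetical names introduced for this sketch.

```python
def flagged_tokens(text, dictionary):
    """Model of the reported symptom.

    Each whitespace-delimited token is looked up component by component
    (split on "."), but the whole token is flagged when any component
    is unknown.
    """
    flagged = []
    for token in text.split():
        parts = [p for p in token.split(".") if p]
        if any(p.lower() not in dictionary for p in parts):
            flagged.append(token)  # the entire joined token is underlined
    return flagged


# Hypothetical dictionary, for illustration only.
print(flagged_tokens("one.two onne.two", {"one", "two"}))
```

In this model "one.two" passes because both components are known, while "onne.two" is flagged as a single unit, which matches what is described above.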

I'll try to simplify an existing file to demonstrate the problem.
Comment 3 Robinson Tryon (qubit) 2016-03-03 14:23:39 UTC
Hi andre,

(In reply to andréb from comment #2)
> I'll try to simplify an existing file to demonstrate the problem.

Do you continue to experience problems when testing a recent build of LibreOffice? (e.g. v5.0.x or later)

Status -> NEEDINFO
Comment 4 Xisco Faulí 2016-10-10 11:24:10 UTC Comment hidden (obsolete)
Comment 5 QA Administrators 2016-11-08 12:47:54 UTC Comment hidden (obsolete)
Comment 6 Urmas 2016-11-10 15:30:27 UTC
Still in 5.2.0.
Comment 7 QA Administrators 2017-11-12 11:01:07 UTC Comment hidden (obsolete)
Comment 8 tommy27 2017-11-12 11:49:44 UTC
still present in 5.3.6 and 5.4.2
Comment 9 Sukender 2017-11-14 13:36:41 UTC
I think there is a more general issue than the one in the initial comment. Indeed, some languages do use a dot as a "normal" character (such as Lojban, ISO code "jbo"). For instance, ".i" is a word.

Moreover, some words may contain '.', as described in a Hunspell issue: https://github.com/hunspell/hunspell/issues/231

So the issues may be:
1. Add a language-dependent behavior for specific characters such as .,'- (etc.)
2. Allow those specific characters to be part of a "complete" word.

My 2 cents.
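
The two points above could be sketched along these lines. This is a rough Python illustration under stated assumptions, not how LibreOffice or Hunspell actually handle it: the `WORD_PUNCTUATION` table, its contents, and the `tokenize` function are all hypothetical.

```python
# Hypothetical per-language table of punctuation that may be part of a word.
WORD_PUNCTUATION = {
    "en": "-'",
    "fr": "-'",
    "jbo": ".'",  # assumption: in Lojban "." belongs to words such as ".i"
}


def tokenize(text, lang):
    """Split text into words using the word characters of the given language."""
    extra = WORD_PUNCTUATION.get(lang, "")
    words, current = [], []
    for ch in text + " ":  # the trailing space flushes the last word
        if ch.isalpha() or ch in extra:
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    # Drop tokens made of punctuation only, such as "..." in Lojban mode.
    return [w for w in words if any(c.isalpha() for c in w)]


print(tokenize(".i mi klama", "jbo"))
print(tokenize("un.deux", "fr"))
```

A table like this keeps the separator decision in data rather than in code, so each language can make its own exceptions. The sketch does not trim stray leading or trailing punctuation such as a dangling "-", which a real implementation would need to handle.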
Comment 10 andréb 2017-11-15 06:33:51 UTC
(In reply to Sukender from comment #9)
> I think there is a more general issue, compared to the initial comment.
> Indeed, some languages do use a dot as a "normal" character (as Lojban, ISO
> code "jbo"). For instance, ".i" is a word.
> 
> Moreover, some words may contain '.', such as described in a Hunspell issue
> : https://github.com/hunspell/hunspell/issues/231
> 
> So the issues may be:
> 1. Add a language-dependent behavior for specific characters such as .,'-
> (etc.)
> 2. Allow those specific characters to be part of a "complete" word.
> 
> My 2 cents.

I agree that the issue is more general than the title, as is implicit in the description of the problem.
So I changed the title accordingly.
Feel free to improve the title.
Comment 11 QA Administrators 2018-11-16 03:42:13 UTC Comment hidden (obsolete)
Comment 12 andréb 2020-10-08 04:09:39 UTC
LibreOffice 6.4.6.2 (build 6.4.6.2-1.mga7)
still has the bug.

E.g. one.two is still considered a correct word in English,
while onne or twoo is not.

Similarly, un.deux in French.
Comment 13 QA Administrators 2022-10-09 03:50:21 UTC Comment hidden (obsolete)
Comment 14 QA Administrators 2024-10-09 03:15:52 UTC Comment hidden (obsolete)
Comment 15 andréb 2024-10-11 15:41:26 UTC
The problem is no longer present in LibreOffice version 24.2.5.2 (64-bit)
 in the fr-CA locale.

I tested:
un deux un.deux un_deux un&deux undeux

Only the last is flagged as a spelling error, so the behavior is correct in all cases.

Equivalent in English:
one two one.two one_two one&two onetwo

Feel free to reopen if not correct in another locale.