Bug 93611 - Spell checker does not treat many characters normally not part of words in the language being verified as separators
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Linguistic
Version: 4.4.2.2 release (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Spell-Checking
Reported: 2015-08-24 03:52 UTC by andréb
Modified: 2022-10-09 03:50 UTC

See Also:
Crash report or crash signature:


Description andréb 2015-08-24 03:52:03 UTC
The title describes the problem clearly.

Sometimes, to keep words together for line wrapping or to show them as associated in a long sentence, it is useful to join them with a period.
The spell checker sees such words as a long word with one or more "." in the middle.
As well, "=." and ".." and "..." are seen as words.
(Although "this=word" is correctly seen as 2 words.)

The spell checker should clearly view "." and "=" and any other non-alphabetic character (except "-" and "'") as not part of a word.
The exceptions could vary according to the language.
In my case, I use the French language on Linux.

In no case (at least in French or English) is "." or "=" part of a valid word.
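
The splitting rule requested above can be sketched as follows. This is only an illustration of the rule as stated in this description, not LibreOffice's actual word-breaking code; the names INTRA_WORD, WORD_RE and split_words are hypothetical.

```python
import re

# Assumption taken from the description: besides letters, only "-" and "'"
# may appear inside a word (for French and English).
INTRA_WORD = "-'"

# A word is a run of letters, optionally joined by single intra-word
# characters. [^\W\d_] matches any Unicode letter.
WORD_RE = re.compile(
    r"[^\W\d_]+(?:[" + re.escape(INTRA_WORD) + r"][^\W\d_]+)*"
)

def split_words(text):
    """Return the words in text, treating every other character as a separator."""
    return WORD_RE.findall(text)

print(split_words("one.two"))      # "." separates the two words
print(split_words("this=word"))    # "=" separates the two words
print(split_words("=. .. ..."))    # punctuation alone yields no words
print(split_words("aujourd'hui well-known"))  # "'" and "-" stay inside words
```

With this rule, each component would be checked against the dictionary on its own, and strings made only of punctuation would never reach the spell checker.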
Comment 1 tommy27 2015-08-24 11:34:29 UTC
Would you please upload a simple test file with some concrete examples?

By the way, you are encouraged to upgrade from 4.4.2.2 to 4.4.5.2.
Comment 2 andréb 2015-09-01 02:55:16 UTC
When I created a test file, I found that most of the symptoms disappeared, although they persist in already existing files.

Note that I use the French language locale, so that may be related.
One problem that continues is that
 word1.word2
or
 word1.word2.word3
will be underlined entirely if any component (i.e. word1, word2 or word3) is not in the dictionary, but not if all components are in the dictionary and the joined word (e.g. word1.word2) is not.

I'll try to simplify an existing file to demonstrate the problem.
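
To make the symptom above concrete, here is a small model of it next to one reading of the requested behaviour. It is a sketch built only from the description in this comment, not from LibreOffice source; both function names and the tiny dictionary are hypothetical.

```python
def is_flagged_observed(token, dictionary):
    """Model of the reported symptom: the whole dot-joined token is
    underlined as soon as any one component is unknown."""
    parts = [p for p in token.split(".") if p]
    return any(p.lower() not in dictionary for p in parts)

def flagged_requested(token, dictionary):
    """One reading of the request: "." is a separator, so only the
    unknown components themselves are reported."""
    return [p for p in token.split(".") if p and p.lower() not in dictionary]

# Hypothetical two-word dictionary, for illustration only.
known = {"one", "two"}

print(is_flagged_observed("one.two", known))    # all components known
print(is_flagged_observed("onne.two", known))   # one component unknown
print(flagged_requested("onne.two", known))     # only the unknown component
```

The model covers only the dot-joined case described in this comment; it says nothing about tokens such as "=." or "..." from the original description.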
Comment 3 Robinson Tryon (qubit) 2016-03-03 14:23:39 UTC
Hi andre,

(In reply to andréb from comment #2)
> I'll try to simplify an existing file to demonstrate the problem.

Do you continue to experience problems when testing a recent build of LibreOffice? (e.g. v5.0.x or later)

Status -> NEEDINFO
Comment 4 Xisco Faulí 2016-10-10 11:24:10 UTC Comment hidden (obsolete)
Comment 5 QA Administrators 2016-11-08 12:47:54 UTC Comment hidden (obsolete)
Comment 6 Urmas 2016-11-10 15:30:27 UTC
Still in 5.2.0.
Comment 7 QA Administrators 2017-11-12 11:01:07 UTC Comment hidden (obsolete)
Comment 8 tommy27 2017-11-12 11:49:44 UTC
Still present in 5.3.6 and 5.4.2.
Comment 9 Sukender 2017-11-14 13:36:41 UTC
I think there is a more general issue, compared to the initial comment. Indeed, some languages do use a dot as a "normal" character (such as Lojban, ISO code "jbo"). For instance, ".i" is a word.

Moreover, some words may contain '.', as described in a Hunspell issue: https://github.com/hunspell/hunspell/issues/231

So the issues may be:
1. Add a language-dependent behavior for specific characters such as .,'- (etc.)
2. Allow those specific characters to be part of a "complete" word.

My 2 cents.
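
The two points above could be sketched as a per-language table of extra word characters. This is an illustration under stated assumptions, not how LibreOffice or Hunspell is implemented: the table EXTRA_WORD_CHARS, its contents, and the function tokenize are all hypothetical, and the "jbo" entry reflects only the ".i" example given in this comment.

```python
from itertools import groupby

# Hypothetical table: characters that count as part of a word, per language,
# in addition to letters.
EXTRA_WORD_CHARS = {
    "en": set("-'"),
    "fr": set("-'"),
    "jbo": set("."),  # assumption based on the ".i" example
}

def tokenize(text, lang):
    """Split text into words using the language's extra word characters."""
    extra = EXTRA_WORD_CHARS.get(lang, set())

    def is_word_char(ch):
        return ch.isalpha() or ch in extra

    # Group consecutive characters by whether they are word characters.
    runs = ("".join(group) for is_word, group in groupby(text, key=is_word_char) if is_word)
    # Drop runs made only of extra characters, such as a lone "-" or "...".
    return [run for run in runs if any(ch.isalpha() for ch in run)]

print(tokenize(".i mi klama", "jbo"))  # the dot stays attached to ".i"
print(tokenize("un.deux", "fr"))       # the dot separates the words
print(tokenize("a - b", "en"))         # a lone hyphen is not a word
```

The sketch does not restrict where an extra character may appear inside a word, so a trailing dot would stay attached to a Lojban word; a real implementation would need finer rules per language.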
Comment 10 andréb 2017-11-15 06:33:51 UTC
(In reply to Sukender from comment #9)
> I think there is a more general issue, compared to the initial comment.
> Indeed, some languages do use a dot as a "normal" character (as Lojban, ISO
> code "jbo"). For instance, ".i" is a word.
> 
> Moreover, some words may contain '.', such as described in a Hunspell issue
> : https://github.com/hunspell/hunspell/issues/231
> 
> So the issues may be:
> 1. Add a language-dependent behavior for specific characters such as .,'-
> (etc.)
> 2. Allow those specific characters to be part of a "complete" word.
> 
> My 2 cents.

I agree that the issue is more general than the title, as is implicit in the description of the problem.
So I changed the title accordingly.
Feel free to improve the title.
Comment 11 QA Administrators 2018-11-16 03:42:13 UTC Comment hidden (obsolete)
Comment 12 andréb 2020-10-08 04:09:39 UTC
LibreOffice 6.4.6.2 (build 6.4.6.2-1.mga7)
still has the bug.

E.g. one.two is still considered a correct word in English,
while onne or twoo is not.

Similarly, un.deux in French.
Comment 13 QA Administrators 2022-10-09 03:50:21 UTC
Dear andréb,

To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year.

There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present.

If you have time, please do the following:

Test to see if the bug is still present with the latest version of LibreOffice from https://www.libreoffice.org/download/

If the bug is present, please leave a comment that includes the information from Help - About LibreOffice.
 
If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave a comment that includes the information from Help - About LibreOffice.

Please DO NOT

Update the version field
Reply via email (please reply directly on the bug tracker)
Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not appropriate in this case)


If you want to do more to help you can test to see if your issue is a REGRESSION. To do so:
1. Download and install the oldest version of LibreOffice (usually 3.3 unless your bug pertains to a feature added after 3.3) from https://downloadarchive.documentfoundation.org/libreoffice/old/

2. Test your bug
3. Leave a comment with your results.
4a. If the bug was present with 3.3 - set version to 'inherited from OOo';
4b. If the bug was not present in 3.3 - add 'regression' to keyword


Feel free to come ask questions or to say hello in our QA chat: https://web.libera.chat/?settings=#libreoffice-qa

Thank you for helping us make LibreOffice even better for everyone!

Warm Regards,
QA Team

MassPing-UntouchedBug