Bug 70339 - Word boundary definition problem
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Linguistic
Version (earliest affected): 4.1.1.2 release
Hardware: Other All
Importance: medium normal
Assignee: Not Assigned
Blocks: Spell-Checking Dictionaries
 
Reported: 2013-10-10 11:06 UTC by Michael Bauer
Modified: 2019-08-26 20:37 UTC (History)

Attachments
Screenshot of problem (41.47 KB, image/jpg)
2013-10-10 11:06 UTC, Michael Bauer
new opera screenshot (5.68 KB, image/jpeg)
2015-03-04 13:00 UTC, Michael Bauer
screenshot of spell check, it replaces the s of 's with 's, resulting in ''s (48.82 KB, image/jpeg)
2016-01-12 22:42 UTC, FutureProject

Description Michael Bauer 2013-10-10 11:06:13 UTC
Created attachment 87382
Screenshot of problem

My problem is specifically with Scottish Gaelic but judging by the responses from the Hunspell team, it's a general issue.

I have attached a screenshot of how certain items are handled by three different applications: LibreOffice 4.1.1.2, Firefox 24 and Opera 17 (now Chrome-based). Each of the three wrongly underlines, in its own way, certain items which occur with a high frequency in Gaelic:
's th' bh' Bh' Th' 'S B' b' d' 'gam 'ga h-Alba n-aran t-aran 

The .dic file contains (at least theoretically) all the necessary items for these to be identified as correct forms:
's th' bh' b' d' 'gam 'ga
plus rules/tags which allow the prefixing of h- n- t- to certain items
h-Alba n-aran t-aran

But each application then goes and identifies an apparently random selection of these and wrongly underlines them.

We'd assumed that the following settings should prevent this type of thing:

WORDCHARS -'’

# replace correct accented double vowels with unaccented ones for acceptance
ICONV 1
ICONV ’ '
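
For illustration, the effect of the ICONV entry above can be sketched in Python. This is a minimal sketch, not Hunspell's code: `apply_iconv`, `is_known` and the toy `DICTIONARY` are hypothetical names, and the word set is just the list of .dic items quoted above.

```python
# Hypothetical sketch of what an ICONV table does: rewrite the input word
# before it is looked up. Not taken from Hunspell's source.
ICONV_TABLE = {"\u2019": "'"}  # corresponds to: ICONV ’ '

# Toy stand-in for the .dic entries listed in this report.
DICTIONARY = {"'s", "th'", "bh'", "b'", "d'", "'gam", "'ga"}


def apply_iconv(word: str, table: dict[str, str] = ICONV_TABLE) -> str:
    """Replace every input pattern in the word with its output pattern."""
    for pattern, replacement in table.items():
        word = word.replace(pattern, replacement)
    return word


def is_known(word: str) -> bool:
    """Look the converted word up in the toy dictionary."""
    return apply_iconv(word) in DICTIONARY


print(is_known("\u2019s"))  # → True (typographic apostrophe is normalized)
print(is_known("s"))        # → False (the apostrophe never reached the lookup)
```

The second call shows why the conversion alone cannot help: it only applies to whatever string the calling application passes in, so a word that arrives without its apostrophe is looked up without it.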

But we were told that "WORDCHARS of Hunspell is not a system-wide setting" and that "WORDCHARS is only for the command line Hunspell executable (in fact, Hunspell library doesn't recognize WORDCHARS, but the Hunspell executable loads the beginning of the affix file for a few extra settings). LibreOffice, Firefox etc. use their own tokenization mechanism. Conversion between character encodings, word breaking of input texts are not part of the Hunspell library."
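
To make the quoted point concrete, here is a minimal Python sketch of how the tokenizer, rather than the dictionary, decides what the spell checker gets to see. It is an assumption-laden illustration: `tokenize` and its `word_chars` parameter are hypothetical and do not reproduce the actual word breaking in LibreOffice, Firefox or Opera.

```python
import re


def tokenize(text: str, word_chars: str = "") -> list[str]:
    """Split text into words made of letters plus any extra word characters.

    Hypothetical helper: word_chars plays the role WORDCHARS was expected
    to play, but inside the application's own tokenizer.
    """
    letter = r"[^\W\d_]"  # any Unicode letter
    if word_chars:
        extra = re.escape(word_chars)
        pattern = rf"(?:{letter}|[{extra}])+"
    else:
        pattern = rf"{letter}+"
    return re.findall(pattern, text)


sample = "'s th' Bh' 'gam h-Alba t-aran"

# Letters only: apostrophes and hyphens act as word boundaries.
print(tokenize(sample))
# → ['s', 'th', 'Bh', 'gam', 'h', 'Alba', 't', 'aran']

# Hyphen and both apostrophes treated as word characters.
print(tokenize(sample, "-'\u2019"))
# → ["'s", "th'", "Bh'", "'gam", 'h-Alba', 't-aran']
```

With the first tokenization the dictionary is asked about fragments such as `s` or `h`, so correct .dic entries like `'s` are never consulted. A real tokenizer would also have to tell apostrophes apart from quotation marks, which this sketch does not attempt.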

So it would seem something in LO needs fixing but I'm not sure what or where?
Comment 1 ign_christian 2014-07-05 09:29:23 UTC
Hi Michael, just a ping since this was reported months ago. Does it still happen in LO 4.2.5.2 or 4.3.0.2?

If resolved please change status to RESOLVED WORKSFORME, unless UNCONFIRMED
Comment 2 Michael Bauer 2014-07-07 21:01:33 UTC
Hi Christian,

Just upgraded to 4.2.5.2 and yes, the problem is still the same and yes, I would still love a solution for this issue. Thanks for following up.
Comment 3 tommy27 2015-02-28 18:04:38 UTC
@Michael
any improvement with 4.4.1.2?
Comment 4 Michael Bauer 2015-02-28 23:15:32 UTC
Just installed 4.4.1.2, exactly the same problem still :(
Comment 5 Buovjaga 2015-03-04 12:45:43 UTC
Is aspell any different from Hunspell in this?
Comment 6 Michael Bauer 2015-03-04 13:00:19 UTC
Created attachment 113873
new opera screenshot

According to the Aspell wiki page, Opera uses Aspell so yes, it is similarly affected but the outcome is slightly different. I have just tested the same words in the latest version of Opera and the 'wrong' underlines are in somewhat different places (see screenshot) but still there.
Comment 7 FutureProject 2016-01-12 22:40:22 UTC
Windows 10 Pro, Version 1511 (OS Build 10586.36)
Version: 5.0.4.2
Build ID: 2b9802c1994aa0b7dc6079e128979269cf95bc78
Locale: de-DE (de_DE)
Using LO extension "Scottish Gaelic Spellchecker 3.0 (May 27, 2015)"

First, I can confirm the underlining of aforementioned words in LO.

Second, the claim that LO does not treat -'´ as part of a word seems correct, as demonstrated in the newly attached screenshot. In 's, only the letter is underlined in red, not the '. The suggestions nevertheless list 's as a possible replacement; choosing it results in ''s, which is underlined again the next time LO does a check. The ' of 's is not being recognized as part of the word. QED
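
The doubled apostrophe can be reproduced with a small Python sketch, assuming the replacement is applied only to the span the tokenizer flagged. `replace_span` is a hypothetical helper, not LO code.

```python
def replace_span(text: str, start: int, end: int, suggestion: str) -> str:
    """Replace text[start:end] with the chosen suggestion."""
    return text[:start] + suggestion + text[end:]


word = "'s"

# Only the letter "s" (index 1 to 2) was flagged, because the apostrophe
# was treated as a boundary rather than as part of the word.
print(replace_span(word, 1, 2, "'s"))  # → ''s

# If the flagged span had covered the apostrophe too, nothing would change.
print(replace_span(word, 0, 2, "'s"))  # → 's
```

The suggestion itself is correct; the mismatch is between the span that was checked and the span that gets replaced.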

Because of this, I'm setting the status to NEW.
Comment 8 FutureProject 2016-01-12 22:42:33 UTC
Created attachment 121891
screenshot of spell check, it replaces the s of 's with 's, resulting in ''s
Comment 9 QA Administrators 2017-03-06 14:25:18 UTC Comment hidden (obsolete)
Comment 10 Michael Bauer 2017-03-06 14:29:23 UTC
Windows 10, LO 5.3.0.3, the problem persists exactly the same way.
Comment 11 QA Administrators 2018-03-07 03:41:31 UTC Comment hidden (obsolete)
Comment 12 Michael Bauer 2018-03-07 09:21:47 UTC
Problem persists.
Comment 13 QA Administrators 2019-03-08 03:40:54 UTC Comment hidden (obsolete)
Comment 14 Michael Bauer 2019-03-08 10:32:03 UTC
Problem persists