Bug 70339 - Word boundary definition problem
Summary: Word boundary definition problem
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Linguistic
Version (earliest affected): 4.1.1.2 release
Hardware: Other, OS: All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Spell-Checking Dictionaries
Reported: 2013-10-10 11:06 UTC by Michael Bauer
Modified: 2023-09-02 03:05 UTC
CC List: 3 users

See Also:
Crash report or crash signature:


Attachments
Screenshot of problem (41.47 KB, image/jpg)
2013-10-10 11:06 UTC, Michael Bauer
new opera screenshot (5.68 KB, image/jpeg)
2015-03-04 13:00 UTC, Michael Bauer
screenshot of spell check, it replaces the s of 's with 's, resulting in ''s (48.82 KB, image/jpeg)
2016-01-12 22:42 UTC, FutureProject

Description Michael Bauer 2013-10-10 11:06:13 UTC
Created attachment 87382
Screenshot of problem

My problem is specifically with Scottish Gaelic, but judging by the responses from the Hunspell team, it's a general issue.

I have attached a screenshot showing how three different applications (LibreOffice 4.1.1.2, Firefox 24, Opera 17, now Chrome-based) handle certain items that occur with high frequency in Gaelic. Each of the three wrongly underlines a different selection of them:
's th' bh' Bh' Th' 'S B' b' d' 'gam 'ga h-Alba n-aran t-aran 

The .dic file contains (at least theoretically) all the necessary items for these to be identified as correct forms:
's th' bh' b' d' 'gam 'ga
plus rules/tags that allow the prefixes h-, n- and t- on certain items:
h-Alba n-aran t-aran
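As a rough illustration of what those dictionary entries and prefix rules are meant to achieve, here is a simplified Python model. It is not Hunspell's actual affix algorithm: the entries, the flag name `P` and the helper `is_correct` are all hypothetical. The point it shows is that h-Alba can only be accepted if the checker receives the whole form; a fragment such as h or s on its own fails.

```python
# Hypothetical stand-in for .dic entries: stem -> set of affix flags.
# The flag name "P" is invented for illustration only.
DICTIONARY: dict[str, set[str]] = {
    "'s": set(),
    "th'": set(),
    "Alba": {"P"},
    "aran": {"P"},
}

# Hypothetical rule: stems carrying flag "P" may take h-, n- or t-.
HYPHEN_PREFIXES = ("h-", "n-", "t-")


def is_correct(word: str) -> bool:
    """Accept a listed word, or an allowed prefix plus a flagged stem."""
    if word in DICTIONARY:
        return True
    for prefix in HYPHEN_PREFIXES:
        if word.startswith(prefix):
            stem = word[len(prefix):]
            if "P" in DICTIONARY.get(stem, set()):
                return True
    return False


if __name__ == "__main__":
    for token in ["h-Alba", "n-aran", "t-aran", "h", "'s", "s"]:
        print(token, is_correct(token))
```

Whole forms pass and fragments fail, so any tokenizer that splits at the hyphen or drops the apostrophe will produce false "misspellings" even with a complete dictionary.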

But each application then picks out an apparently random selection of these and wrongly underlines them.

We'd assumed that the following settings should prevent this type of thing:

WORDCHARS -'’

# convert the typographic apostrophe (’) to the straight apostrophe (') before lookup
ICONV 1
ICONV ’ '

But we were told that "WORDCHARS of Hunspell is not a system-wide setting" and that "WORDCHARS is only for the command-line Hunspell executable (in fact, the Hunspell library doesn't recognize WORDCHARS, but the Hunspell executable loads the beginning of the affix file for a few extra settings). LibreOffice, Firefox etc. use their own tokenization mechanism. Conversion between character encodings and word breaking of input texts are not part of the Hunspell library."
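To make the division of labour concrete: the application's tokenizer decides which characters belong to a word before Hunspell ever sees it. Below is a minimal Python sketch, assuming a simple "maximal run of word characters" tokenizer; the `extra_wordchars` parameter only loosely mimics WORDCHARS and `normalize` only loosely mimics the ICONV line above. Neither is LibreOffice's real break-iterator code.

```python
def tokenize(text: str, extra_wordchars: str = "") -> list[str]:
    """Split text into maximal runs of letters plus extra_wordchars.

    extra_wordchars is a hypothetical parameter loosely mimicking
    Hunspell's WORDCHARS; it is not an actual LibreOffice setting.
    """
    tokens: list[str] = []
    current: list[str] = []
    for ch in text:
        if ch.isalpha() or ch in extra_wordchars:
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


# Mimics "ICONV ’ '": map the typographic apostrophe to the straight one.
ICONV_TABLE = {"\u2019": "'"}


def normalize(word: str) -> str:
    """Apply the input-conversion table to a single token."""
    for src, dst in ICONV_TABLE.items():
        word = word.replace(src, dst)
    return word


if __name__ == "__main__":
    sample = "'s th\u2019 h-Alba"
    # Letters only: apostrophes and hyphens split the Gaelic forms.
    print(tokenize(sample))  # ['s', 'th', 'h', 'Alba']
    # With -'’ as word characters, then apostrophe normalization.
    print([normalize(t) for t in tokenize(sample, "-'\u2019")])  # ["'s", "th'", 'h-Alba']
```

One design caveat: simply treating ' as a word character also glues quotation marks onto words ('Alba' would become a single token), so a real tokenizer needs context rules rather than a flat character list.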

So it seems something in LO needs fixing, but I'm not sure what or where.
Comment 1 ign_christian 2014-07-05 09:29:23 UTC
Hi Michael, just a ping since this was reported months ago. Does it still happen in LO 4.2.5.2 or 4.3.0.2?

If it is resolved, please change the status to RESOLVED WORKSFORME, unless it is UNCONFIRMED.
Comment 2 Michael Bauer 2014-07-07 21:01:33 UTC
Hi Christian,

Just upgraded to 4.2.5.2 and yes, the problem is still the same and yes, I would still love a solution for this issue. Thanks for following up.
Comment 3 tommy27 2015-02-28 18:04:38 UTC
@Michael
Any improvement with 4.4.1.2?
Comment 4 Michael Bauer 2015-02-28 23:15:32 UTC
Just installed 4.4.1.2, exactly the same problem still :(
Comment 5 Buovjaga 2015-03-04 12:45:43 UTC
Is Aspell any different from Hunspell in this respect?
Comment 6 Michael Bauer 2015-03-04 13:00:19 UTC
Created attachment 113873
new opera screenshot

According to the Aspell wiki page, Opera uses Aspell, so yes, it is similarly affected, but the outcome is slightly different. I have just tested the same words in the latest version of Opera, and the 'wrong' underlines are in somewhat different places (see screenshot) but are still there.
Comment 7 FutureProject 2016-01-12 22:40:22 UTC
Windows 10 Pro, Version 1511 (OS Build 10586.36)
Version: 5.0.4.2
Build ID: 2b9802c1994aa0b7dc6079e128979269cf95bc78
Locale: de-DE (de_DE)
Using LO extension "Scottish Gaelic Spellchecker 3.0 (May 27, 2015)"

First, I can confirm the underlining of the aforementioned words in LO.

Second, the claim that -'’ are not treated as word characters seems correct, as the newly attached screenshot demonstrates. Only the letter s of 's is underlined in red, not the apostrophe. Yet the suggestions list 's as a possible replacement; choosing it replaces only the s, resulting in ''s, which is underlined again the next time LO runs a check. The apostrophe of 's is not recognized as part of the word. QED

Because of this, I'm setting the status to NEW.
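The ''s result follows directly from the boundary problem. A minimal Python sketch, assuming (hypothetically) that the checker replaces exactly the character span its tokenizer flagged; `token_spans` and `apply_suggestion` are illustrative names, not LibreOffice functions, and the sample text is arbitrary.

```python
def token_spans(text: str, extra_wordchars: str = "") -> list[tuple[int, int]]:
    """Return (start, end) offsets of word tokens as a checker might see them."""
    spans: list[tuple[int, int]] = []
    start = None
    for i, ch in enumerate(text):
        if ch.isalpha() or ch in extra_wordchars:
            if start is None:
                start = i
        elif start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(text)))
    return spans


def apply_suggestion(text: str, span: tuple[int, int], suggestion: str) -> str:
    """Replace only the flagged span with the chosen suggestion."""
    start, end = span
    return text[:start] + suggestion + text[end:]


if __name__ == "__main__":
    text = "one 's two"
    # Letters-only boundaries: the flagged span is just "s".
    narrow = token_spans(text)[1]
    print(apply_suggestion(text, narrow, "'s"))  # one ''s two
    # Apostrophe counted as a word character: the span covers "'s".
    wide = token_spans(text, "'")[1]
    print(apply_suggestion(text, wide, "'s"))  # one 's two
```

The dictionary already offers the right word; the duplicated apostrophe comes purely from the span the tokenizer hands back.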
Comment 8 FutureProject 2016-01-12 22:42:33 UTC
Created attachment 121891
screenshot of spell check, it replaces the s of 's with 's, resulting in ''s
Comment 9 QA Administrators 2017-03-06 14:25:18 UTC Comment hidden (obsolete)
Comment 10 Michael Bauer 2017-03-06 14:29:23 UTC
Windows 10, LO 5.3.0.3, the problem persists exactly the same way.
Comment 11 QA Administrators 2018-03-07 03:41:31 UTC Comment hidden (obsolete)
Comment 12 Michael Bauer 2018-03-07 09:21:47 UTC
Problem persists.
Comment 13 QA Administrators 2019-03-08 03:40:54 UTC Comment hidden (obsolete)
Comment 14 Michael Bauer 2019-03-08 10:32:03 UTC
Problem persists
Comment 15 QA Administrators 2021-08-26 03:47:28 UTC Comment hidden (obsolete)
Comment 16 Michael Bauer 2021-08-31 17:46:51 UTC
Just installed 7.2.0, and the issue has improved somewhat. Of the original list, only d' is still underlined. So better, but no cigar just yet. It would be good to know whether this was a deliberate fix or a chance fluke that may disappear again.
Comment 17 QA Administrators 2023-09-01 03:14:41 UTC Comment hidden (obsolete)
Comment 18 Michael Bauer 2023-09-01 15:58:34 UTC
It persists, but it's not as bad as it was. Previously it affected

's th' bh' Bh' Th' 'S B' b' d' 'gam 'ga h-Alba n-aran t-aran 

but now the only item with a misplaced red squiggle is
d'
Comment 19 QA Administrators 2023-09-02 03:05:54 UTC
Dear Michael Bauer,

To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year.

There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present.

If you have time, please do the following:

Test to see if the bug is still present with the latest version of LibreOffice from https://www.libreoffice.org/download/

If the bug is present, please leave a comment that includes the information from Help - About LibreOffice.
 
If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave a comment that includes the information from Help - About LibreOffice.

Please DO NOT

Update the version field
Reply via email (please reply directly on the bug tracker)
Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not appropriate in this case)


If you want to do more to help you can test to see if your issue is a REGRESSION. To do so:
1. Download and install the oldest version of LibreOffice (usually 3.3, unless your bug pertains to a feature added after 3.3) from https://downloadarchive.documentfoundation.org/libreoffice/old/

2. Test your bug
3. Leave a comment with your results.
4a. If the bug was present with 3.3 - set version to 'inherited from OOo';
4b. If the bug was not present in 3.3 - add 'regression' to keyword


Feel free to come ask questions or to say hello in our QA chat: https://web.libera.chat/?settings=#libreoffice-qa

Thank you for helping us make LibreOffice even better for everyone!

Warm Regards,
QA Team

MassPing-UntouchedBug