Bug 159164 - Spellchecking - South African English
Summary: Spellchecking - South African English
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Linguistic
Version (earliest affected): 7.6.0.0 alpha0+
Hardware: All All
Importance: medium normal
Assignee: Marco A.G.Pinto
URL:
Whiteboard: target:24.8.0 target:24.2.1
Keywords: notBibisectable, regression
Duplicates: 159518
Depends on:
Blocks: Dictionaries
Reported: 2024-01-13 08:10 UTC by James
Modified: 2024-02-02 07:40 UTC (History)
CC List: 5 users

See Also:
Crash report or crash signature:


Attachments
Example document of words flagged that are in fact Correctly spelt. (9.00 KB, application/msword)
2024-01-13 08:17 UTC, James
Fix for issue in en-ZA (5.82 MB, application/vnd.openofficeorg.extension)
2024-01-28 15:10 UTC, Marco A.G.Pinto

Description James 2024-01-13 08:10:32 UTC
Description:
The spellchecker for English (South Africa), which uses the same spelling as UK English, highlights words as misspelt even though they are spelt correctly.

Steps to Reproduce:
1. Type the following words, for example: providing, managing, financials

Actual Results:
All three of these words are flagged as misspelt in a document. The suggestions for the 'correct' spelling are NOT correct. I could email an actual document for you to see. Note that here, in this text box, the words are not flagged, but in a document they are.

Expected Results:
The spellchecker should be corrected so that it does NOT flag words that are spelt correctly.


Reproducible: Always


User Profile Reset: Yes

Additional Info:
If you provide an email address, I could send you an actual document in which correctly spelt words are flagged.
Comment 1 James 2024-01-13 08:17:10 UTC
Created attachment 191909 [details]
Example document of words flagged that are in fact Correctly spelt.

The spellchecker for South African and UK English flags certain words that are correctly spelt.
Comment 2 Buovjaga 2024-01-25 16:24:09 UTC
I can reproduce this with this dictionary: https://github.com/marcoagpinto/aoo-mozilla-en-dict and I confirm that the words are in the word list. I'm not sure whether this is an issue with the dictionary or with LibreOffice code, so I won't change the status.

Version: 24.2.0.1.0+ (X86_64) / LibreOffice Community
Build ID: 3e6fa4da057be191aac0973e5131d271de0d5e61
CPU threads: 8; OS: Linux 6.7; UI render: default; VCL: gtk3
Locale: fi-FI (fi_FI.UTF-8); UI: en-US
Calc: threaded

Reproduced with bibisect repositories back to 7.2 (as far back as I bothered to check).
Comment 3 Marco A.G.Pinto 2024-01-26 03:44:51 UTC
Heya,

I can confirm that the words do appear as typos in this .doc file.

I don't know what is causing it, since they are in the en_ZA dictionary.

I used LibreOffice 7.6.4.1.

Notice: I am not sure if it is LanguageTool related, since with the LanguageTool add-on enabled they do not appear as typos, and if I disable it, they appear as typos.
Comment 4 Stéphane Guillou (stragu) 2024-01-28 03:13:27 UTC
Without installing an extra dictionary, I can reproduce in release builds 7.6.4.1 but not with 7.5.9.2 (in which only "Backgroud" is underlined, as expected).
I don't have LanguageTool installed.
Unrelated to file format, same happens with DOCX and ODT.

I think Buovjaga could reproduce all the way back to 7.2 in comment 2 because the installed dictionary is the latest version and already has the bug.

It might have come from Marco's 24f938c016b62aa8f05e23e2be4f51e7ead51e65 (which is for [03a891f1b00d9b1a14c6a72fb33cb682b549053a] in the dictionaries repository).
The commit updates the GB+ZA dictionaries, see: https://git.libreoffice.org/dictionaries/+/03a891f1b00d9b1a14c6a72fb33cb682b549053a%5E%21/

It can't be checked with the bibisect repository as it does not include dictionaries. So I tested the extension in two versions in LO 7.5:
- dict-en-20201201.oxt: not reproduced
- dict-en-20230101_lo.oxt: reproduced

It indeed started with this release: https://github.com/marcoagpinto/aoo-mozilla-en-dict/releases/tag/2023-02-01za

Marco, can you please have a look at what broke the en-ZA spellcheck in that version of the dictionary?
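
The comparison of the two extension versions can be narrowed down further by diffing the affix file shipped in each one. The following is only a minimal Python sketch: it assumes an .oxt is a ZIP archive, that the affix file is UTF-8 encoded, and that it sits at the member path passed in. The path `en_ZA.aff`, the helper names `diff_aff` and `make_oxt`, and the sample contents are all hypothetical and not taken from the real extension.

```python
import difflib
import io
import zipfile


def diff_aff(old_oxt: bytes, new_oxt: bytes, member: str) -> list[str]:
    """Return a unified diff of one archive member between two .oxt files."""
    def read_lines(blob: bytes) -> list[str]:
        # An .oxt extension is a ZIP archive, so zipfile can open it.
        with zipfile.ZipFile(io.BytesIO(blob)) as archive:
            return archive.read(member).decode("utf-8").splitlines()

    return list(
        difflib.unified_diff(
            read_lines(old_oxt),
            read_lines(new_oxt),
            fromfile="old/" + member,
            tofile="new/" + member,
            lineterm="",
        )
    )


def make_oxt(member: str, text: str) -> bytes:
    """Build a tiny in-memory archive so the sketch runs without real files."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(member, text)
    return buffer.getvalue()


# Made-up contents; the real affix files are far longer.
old = make_oxt("en_ZA.aff", "SET UTF-8\nTRY abc\n")
new = make_oxt("en_ZA.aff", "SET UTF-8\nICONV 1\nICONV ’ '\nTRY abc\n")

# Keep only the added lines, skipping the "+++" file header.
added = [
    line
    for line in diff_aff(old, new, "en_ZA.aff")
    if line.startswith("+") and not line.startswith("+++")
]
print(added)
```

With real files, the two byte strings would come from reading dict-en-20201201.oxt and dict-en-20230101_lo.oxt from disk, and the member path would have to be looked up in the archive listing first.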
Comment 5 Marco A.G.Pinto 2024-01-28 07:26:12 UTC
I will have a look at it this afternoon or night, trying to see what changed.
Comment 6 Marco A.G.Pinto 2024-01-28 07:49:48 UTC
Meanwhile, here are the changes I have made to ZA (the .aff lists them):

# Affix file for British English MySpell dictionary
# Also suitable as basis for Commonwealth and European English.
# Built from scratch for MySpell. Released under LGPL.
#
# David Bartlett, Andrew Brown.
# R 1.18, 11/04/05
# 2010-03-09 (nemeth AT OOo)
#  - UTF-8 encoded dictionary:
#       - fix em-dash problem of OOo 3.2 by BREAK
#       - suggesting words with typographical apostrophes
#       - recognizing words with Unicode f ligatures
#  - add phonetic suggestion (Copyright (C) 2000 Björn Jacke, see the end of the file)
#
# 2021-02-15 (Marco A.G.Pinto)
# Fixed: concious + conciousness.
#
# 2022-07-12 (Marco A.G.Pinto)
# Fixed: ! flag at start of words;
#           Removed duplicate words;
#           Sorted alphabetically the dictionary.
#
# 2023-01-01 (Marco A.G.Pinto)
# Added the: ICONV ’ '
# Sorted alphabetically the tags of the .aff
#
# 2023-01-17 (Marco A.G.Pinto)
# Added: Czechia + Czechia's
#
# 2023-01-26 (Marco A.G.Pinto)
# Fixed/improved: flag 3
#

If you can find out what went wrong, it would be great.
Comment 7 Marco A.G.Pinto 2024-01-28 15:00:17 UTC
Ahhhhhhhhh…

I found out what the issue was. I added:

ICONV ’ '

because I didn't see that it was already there at the end of the .aff.

This is what was causing the issue.

On 1-FEB, I will release a fixed .oxt.

Thanks for discovering this issue.
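
A repeated directive like this is easy to miss in a long .aff. The following is a minimal Python sketch of a check that could catch it before a release. It is an illustration, not part of the actual fix: the helper name `find_duplicate_directives` is hypothetical, the sample text is made up rather than copied from en_ZA.aff, and the check only reports lines that occur more than once without judging whether Hunspell tolerates them.

```python
from collections import Counter


def find_duplicate_directives(aff_text, directives=("ICONV", "OCONV")):
    """Return {normalised line: count} for directive lines seen more than once."""
    counts = Counter()
    for raw in aff_text.splitlines():
        line = " ".join(raw.split())  # collapse runs of whitespace
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.split(" ", 1)[0] in directives:
            counts[line] += 1
    return {line: n for line, n in counts.items() if n > 1}


# Illustrative sample only: the same ICONV table appears near the top
# and again at the end of the file.
sample = (
    "SET UTF-8\n"
    "ICONV 1\n"
    "ICONV ’ '\n"
    "# ... rest of the affix rules ...\n"
    "SFX A Y 1\n"
    "ICONV 1\n"
    "ICONV ’ '\n"
)

duplicates = find_duplicate_directives(sample)
print(duplicates)
```

Other directive names can be passed through the `directives` parameter if the same kind of check is wanted for them.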
Comment 8 Marco A.G.Pinto 2024-01-28 15:10:54 UTC
Created attachment 192204 [details]
Fix for issue in en-ZA

This fixes the issue.

Please test it before 1-FEB, because on 1-FEB I will release the patch.

I tested it myself and it seems okay.
Comment 9 Marco A.G.Pinto 2024-01-28 15:13:41 UTC
Heya,

I have fixed the ZA issue.

Please test the .oxt attachment I have included here before 1-FEB (release date for the .oxt).

For me, all works fine again, but if you encounter an issue, please let me know.
Comment 10 Stéphane Guillou (stragu) 2024-01-29 02:06:55 UTC
Thank you Marco, I tested the updated extension with:

Version: 24.2.0.2 (X86_64) / LibreOffice Community
Build ID: b1fd3a6f0759c6f806568e15c957f97194bbec8f
CPU threads: 8; OS: Linux 5.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Calc: threaded

and it does indeed fix it. Much appreciated!
Comment 11 Marco A.G.Pinto 2024-02-01 08:58:32 UTC
It is fixed:
https://extensions.libreoffice.org/extensions/english-dictionaries



- ZA
  * Fix: Removed the: ICONV ’ ' because it was already at the end of the .aff;
    Fix: apostrophe handling, by adding: WORDCHARS 0123456789’ to the .aff;
    Improved flag J adding 424 words.
Comment 12 Stéphane Guillou (stragu) 2024-02-01 09:13:09 UTC
Thank you, Marco! Are you also committing the update to the dictionaries repository?
Comment 13 Marco A.G.Pinto 2024-02-01 09:19:15 UTC
(In reply to Stéphane Guillou (stragu) from comment #12)
> Thank you, Marco! Are you also committing the update to the dictionaries
> repository?

I only commit to Gerrit twice a year: May and November.

Whatever is committed in those two months is what ships with the next major releases of LibreOffice.
Comment 14 Buovjaga 2024-02-01 10:30:28 UTC
(In reply to Marco A.G.Pinto from comment #13)
> I only commit to Gerrit twice a year: May and November.
> 
> 
> These two months are what will come with the next major releases of
> LibreOffice.

Would be nice to have right away, though, as this is quite severe for South African users.
Comment 15 Marco A.G.Pinto 2024-02-01 10:41:16 UTC
(In reply to Buovjaga from comment #14)
> Would be nice to have right away, though, as this is quite severe for South
> African users.

Sure, I will commit it tomorrow.

No worries.
Comment 16 Marco A.G.Pinto 2024-02-01 14:28:05 UTC
It is here: https://gerrit.libreoffice.org/c/dictionaries/+/162881
Comment 17 Commit Notification 2024-02-01 16:17:58 UTC
Marco A.G.Pinto committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/dictionaries/commit/208a9fd80b2a182fe20f224cd615119c6323ae2e

tdf#159164 Update the English dictionaries: GB+ZA+AU+CA+US
Comment 18 Stéphane Guillou (stragu) 2024-02-02 06:43:11 UTC
Thanks, Marco!
Ilmari, is this cherrypickable to 7.6? Not sure about the process for dictionaries.
Comment 19 Stéphane Guillou (stragu) 2024-02-02 06:44:43 UTC
*** Bug 159518 has been marked as a duplicate of this bug. ***
Comment 20 Commit Notification 2024-02-02 07:39:28 UTC
Marco A.G.Pinto committed a patch related to this issue.
It has been pushed to "libreoffice-24-2":

https://git.libreoffice.org/dictionaries/commit/d7bb4585419ce4496498ae5291947a3bf06ea1ec

tdf#159164 Update the English dictionaries: GB+ZA+AU+CA+US
Comment 21 Buovjaga 2024-02-02 07:40:14 UTC
(In reply to Stéphane Guillou (stragu) from comment #18)
> Thanks, Marco!
> Ilmari, is this cherrypickable to 7.6? Not sure about the process for
> dictionaries.

I got a merge conflict when trying to cherry pick to 7.6, so I will not pursue it.