Bug 147099 - Some dictionaries don't include apostrophes in WORDCHARS
Status: RESOLVED NOTOURBUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Linguistic
Version: 5.2.0.4 release (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
Reported: 2022-01-31 21:13 UTC by Perry Fraser
Modified: 2023-01-02 09:49 UTC

Description Perry Fraser 2022-01-31 21:13:24 UTC
It seems that, at least among the en_* dictionaries, every one except en_GB omits the apostrophe from the WORDCHARS line of its affix file (see the .aff files in https://cgit.freedesktop.org/libreoffice/dictionaries/tree/en).

This does not directly impact LibreOffice, but can instead be seen when, for example, using hunspell with one of the affected dictionaries. For instance:

  ~/git/dictionaries master
  ❯ hunspell -d en/en_US
  Hunspell 1.7.0
  isn't
  & isn 9 0: sin, ins, ism, is, in, inn, ion, isl, is n
  *
  
  ~/git/dictionaries master
  ❯ hunspell -d en/en_GB
  Hunspell 1.7.0
  isn't
  *

It looks like this was first introduced in https://gerrit.libreoffice.org/c/dictionaries/+/25348/.

This is most likely resolvable by just adding a ' to the end of each WORDCHARS entry of these dictionaries.
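
A minimal sketch of that change, assuming each affected .aff file has a single-line `WORDCHARS <characters>` entry. The helper name `add_apostrophe_to_wordchars` and the sample affix text are made up for illustration and are not copied from any real dictionary.

```python
def add_apostrophe_to_wordchars(aff_text):
    """Return aff_text with ' appended to any WORDCHARS line lacking it."""
    out = []
    for line in aff_text.splitlines():
        parts = line.split(None, 1)
        if parts and parts[0] == "WORDCHARS":
            chars = parts[1].rstrip() if len(parts) > 1 else ""
            if "'" not in chars:
                line = "WORDCHARS " + chars + "'"
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical affix fragment, not taken from an actual dictionary.
sample = "SET UTF-8\nWORDCHARS 0123456789\n"
print(add_apostrophe_to_wordchars(sample), end="")
```

The function leaves a line alone if it already contains an apostrophe, so running it twice gives the same result as running it once.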
Comment 1 Aron Budea 2022-12-12 19:25:41 UTC
The US English dictionary (word list) is maintained by Kevin Atkinson; please file any issues you have with it here:
https://github.com/kevina/wordlist/issues

As far as I can read from the readme, the AU word list is also maintained by Kevin Atkinson.

The ZA dictionary is AFAIK unmaintained, but Marco, maintainer of the English Dictionaries collection, might be kind enough to make this adjustment. Apart from the British dictionary, which he takes care of himself, the collection includes the latest versions of the above dictionaries from their respective maintainers.

The repo of the English Dictionaries is here: https://github.com/marcoagpinto/aoo-mozilla-en-dict
Comment 2 Marco A.G.Pinto 2022-12-13 12:36:47 UTC
Yes, the ZA (South African) dictionary is missing that keyword.

I will add it for the next release.

# 2021-02-15 (Marco A.G.Pinto)
# Fixed: concious + conciousness.
#
# 2022-07-12 (Marco A.G.Pinto)
# Fixed: ! flag at start of words;
#           Removed duplicate words;
#           Sorted alphabetically the dictionary.

I have already made fixes to it before, since it is no longer maintained.

My idea was to add the ZA proper names to GB, since most ZA users may be using GB.

I made a list of proper names and posted it in a ticket, but got no reply.
Comment 3 Marco A.G.Pinto 2023-01-02 09:49:25 UTC
I have fixed the ZA dictionary:
South African (2023-01-01):
  - Added: ICONV ’ '
  - Sorted the tags of the .aff alphabetically.
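
For readers unfamiliar with ICONV: it defines an input conversion, so an entry like the one above maps the typographic apostrophe (’) to the plain ASCII apostrophe (') before a word is looked up. The sketch below is only an illustration of that idea. It assumes the usual affix layout of a count line (`ICONV 1`) followed by one `ICONV <from> <to>` line per mapping, and it uses plain string replacement rather than hunspell's own matching logic. The names `parse_iconv` and `apply_iconv` are hypothetical.

```python
def parse_iconv(aff_text):
    """Collect (source, target) pairs from ICONV lines.

    The count line ("ICONV 1") has only two fields and is skipped.
    """
    pairs = []
    for line in aff_text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0] == "ICONV":
            pairs.append((fields[1], fields[2]))
    return pairs


def apply_iconv(word, pairs):
    """Apply each conversion to the word (simple replacement, an assumption)."""
    for src, dst in pairs:
        word = word.replace(src, dst)
    return word


# Assumed affix fragment: one mapping from U+2019 to the ASCII apostrophe.
aff = "ICONV 1\nICONV \u2019 '\n"
pairs = parse_iconv(aff)
print(apply_iconv("isn\u2019t", pairs))  # isn't
```

With such a mapping, text typed with curly apostrophes is checked against dictionary entries that are stored with the ASCII apostrophe.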

The .oxt is available from:
https://extensions.libreoffice.org/extensions/english-dictionaries