Created attachment 125252 [details]
Diff of en_ZA.dic
These are some differences in the previous LibreOffice dictionary compared to English Dictionaries (2016.05.01) that should be discussed and resolved.
1. hyph_en_GB/hyph_en_US seem to be newer in LO than in English Dictionaries; I suggest using the newer ones:
version 2011-10-07 compared to version 2010-03-16
files: hyph_en_GB.dic, hyph_en_US.dic, README_hyph_en_GB.txt, README_hyph_en_US.txt
2. en_GB.aff in LO had the following comments, which English Dictionaries was missing. I'm not sure what these rules are (I can see that there is no NOSUGGEST or COMPOUNDRULE); please verify:
# 2008-12-18 - NOSUGGEST, NUMBER/COMPOUNDRULE patches (nemeth AT OOo)
# 2010-03-09 (nemeth AT OOo)
# - UTF-8 encoded dictionary:
# - fix em-dash problem of OOo 3.2 by BREAK
# - suggesting words with typographical apostrophes
# - recognizing words with Unicode f ligatures
# - add phonetic suggestion (Copyright (C) 2000 Björn Jacke, see the end of the file)
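A quick way to verify which of these features an .aff file actually declares is to count the relevant directive lines while skipping comments. This is only a minimal sketch: the function name and the sample text are made up for illustration, and it assumes each directive starts its own line, as in the usual Hunspell .aff layout.

```python
def find_directives(aff_text, names=("NOSUGGEST", "COMPOUNDRULE", "BREAK", "PHONE")):
    """Count non-comment lines in an .aff file that start with each directive name."""
    counts = {name: 0 for name in names}
    for line in aff_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments, so a changelog comment that merely
        # mentions NOSUGGEST is not mistaken for the directive itself.
        if not line or line.startswith("#"):
            continue
        keyword = line.split()[0]
        if keyword in counts:
            counts[keyword] += 1
    return counts


# Hypothetical sample, not the real en_GB.aff.
sample = """\
# 2008-12-18 - NOSUGGEST, NUMBER/COMPOUNDRULE patches
SET UTF-8
NOSUGGEST !
COMPOUNDRULE 2
COMPOUNDRULE n*1t
COMPOUNDRULE n*mp
"""

print(find_directives(sample))
```

Running it on both the LO and the English Dictionaries en_GB.aff would show at a glance which directives exist only in one of them.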
3. there were some changes in the AU, GB and ZA dictionaries in bug 61660, please verify, and make those changes at least in GB (not sure about what to do with AU and ZA)
See bug 61660 and this commit: https://cgit.freedesktop.org/libreoffice/dictionaries/commit/?id=7e4239060266bf238b5e6692ed10d548c37572d5
4. en_ZA had a significant number of differences in both directions. I'm attaching the result of a diff; all the entries marked with "-" are missing from the newer dictionary.
I'm not the one to evaluate whether they should be included or not, but I noticed it, and wanted to point it out.
Okay, 4. is not an issue: once the diff is sorted from the 2nd character, it turns out the words aren't missing from the newer dictionary, just placed somewhere else (the file seems not to be sorted as a whole, but to consist of several sorted parts).
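Since the file is not sorted as a whole, a line-based diff reports moved entries as removed. An order-insensitive comparison avoids that. The following is a rough sketch with hypothetical helper names and toy data; it assumes the first line of a .dic file is the numeric word count and that every other non-empty line is one entry.

```python
def dic_entries(text):
    """Return the set of entries in a .dic file, skipping the leading word-count line."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # The first line of a Hunspell .dic file is normally an approximate entry count.
    if lines and lines[0].isdigit():
        lines = lines[1:]
    return set(lines)


def missing_entries(old_text, new_text):
    """Entries present in the old dictionary but absent from the new one, ignoring order."""
    return sorted(dic_entries(old_text) - dic_entries(new_text))


# Toy data: same entries, different order.
old = "3\nzebra\napple/S\nbraai\n"
new = "3\napple/S\nbraai\nzebra\n"

print(missing_entries(old, new))  # []
```

An entry whose flags changed (for example `apple/S` versus `apple/SM`) would still be reported, because whole lines are compared.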
Created attachment 125417 [details]
Current GB .AFF
I was just editing the GB README and, at line 34, we have:
This is a locally hosted copy of the English dictionaries with fixed dash handling and new ligature and phonetic suggestion support extension:
Original version of the en_GB dictionary:
OpenOffice.org patch and morphological extension.
The morphological extension based on Wordlist POS and AGID data
created by Kevin Atkinson and released on http://wordlist.sourceforge.net.
OOo Issue 48060 - add numbers with affixes by COMPOUNDRULE (1st, 111th, 1990s etc.)
OOo Issue 29112, 55498 - add NOSUGGEST flags to taboo words
New REP items (better suggestions for accented words and a few mistakes)
OOo Issue 63541 - remove *dessicated
2008-12-18 nemeth AT OOo
With a closer look, one could add the text from your comment at the "2008-12-18", but I need someone to create the compound rule for me because I don't know how to do it (find attached the current GB .AFF), and also the NOSUGGEST (Németh, please tell us whether NOSUGGEST works automatically when an "!" is added to the words in the .DIC, or whether it needs something else to work).
Remember that this .AFF isn't automatically generated from a wordlist like the US/CA ones, so I don't think I can simply copy the compound rules, because I believe they use codes that are already used by the prefixes/suffixes.
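One way to check that concern before copying any rules is to list the flags the .AFF already declares in its PFX/SFX headers and compare them with the flags the compound rules would need. This is only a sketch: the function names, the sample .AFF text and the candidate flag letters are illustrative, and it assumes the default single-character flag type (no `FLAG long` or `FLAG num`).

```python
def declared_affix_flags(aff_text):
    """Collect the flags declared by PFX/SFX lines in an .aff file."""
    flags = set()
    for line in aff_text.splitlines():
        parts = line.split()
        # Both header lines ("SFX n Y 1") and rule lines ("SFX n 0 ness .")
        # carry the flag in the second field.
        if len(parts) >= 2 and parts[0] in ("PFX", "SFX"):
            flags.add(parts[1])
    return flags


def flag_collisions(aff_text, candidate_flags):
    """Return the candidate flags that are already taken by a prefix or suffix class."""
    return sorted(set(candidate_flags) & declared_affix_flags(aff_text))


# Hypothetical fragment, not the real GB .AFF.
sample_aff = """\
PFX A Y 1
PFX A 0 re .
SFX n Y 1
SFX n 0 ness .
"""

print(flag_collisions(sample_aff, ["n", "m", "1", "t", "p", "c"]))  # ['n']
```

Any flag reported here would have to be renamed consistently in both the COMPOUNDRULE lines and the number entries of the .DIC before the rules are copied over.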
PS->Could someone attach the most recent US+GB hyphenation files in a compressed archive here, so that I can update them in the OXT? When I used Git to create a folder on my desktop, the date of the downloaded files became the current date. Maybe the date is not important, though... :-)
I have just added Németh to the Cc.
[14:31] <marcoagpinto> I don't know how to add compounding or whatever it is called to the .AFF (1st, 2nd, 3rd, blah blah)
Created attachment 125431 [details]
Hyphenation patterns for US+GB, v2011.10.07
Thank you for the update, Marco. Here's the zip with the current hyphenation patterns. FYI, line endings are in Unix format.
I have just uploaded/updated the OXT to V2016-07-01:
Please note that I will go to the north of Portugal on vacation on Tuesday and will have limited Internet access for one week.
Here are the changes in the OXT:
Updated the hyphenation patterns to 2011-10-07 (from LibreOffice):
- US + GB
Updated the Dictionaries:
- British (Marco A.G.Pinto)*
* British has 1107 new words (2016-06-01) + 738 new words (2016-07-01).
It now uses NOSUGGEST keyword for offensive words.
It now uses COMPOUNDING (Áron Budea)
Cherry-picked again for the final 5.2.0 release candidate as 258bf15aac7975e1202558b6d922be8a9a072b37