Created attachment 202073 [details]
new flags WIP

Heya, everyone,

This ticket is related to:
https://bugs.documentfoundation.org/show_bug.cgi?id=167649

It appears that my commits (releases) “ruined” the original .AFF files.

I was on the Libera channel trying to get more information about it, and my impression is that the original .AFF files carried morphological information about words (link provided by Cloph):
https://gerrit.libreoffice.org/c/dictionaries/+/25348

It is impossible for me to go back to the old .AFF files, since over these 14 or 15 years of development I have fixed tons of bugs in flags and added new features and new flags.

Could I suggest that the “AM” flags be made independent of the .AFF files? What I mean is that, instead of each .AFF file being loaded with “AM”s, they could live in a single file in the dictionaries folder and apply to all five variants of English.

On 1-JAN-2026 I will “take over” U.S. + Canada + Australia, and I have been heavily patching flags to deal with U.S. verbs and the like.

One year later, on 1-JAN-2027, I have planned the 5th generation of English dictionaries.

I am attaching the 2026 WIP files here for you to see.

Please help find a solution; the best way would be a separate file for all five variants.

Thanks!

Your friend,
>Marco A.G.Pinto
---------------
Created attachment 202074 [details] default text for .aff 2026+
Created attachment 202075 [details] README file for 2026
Created attachment 202076 [details] GitHub text 2026+ (ODT)
Created attachment 202077 [details] GitHub text 2026+ (txt)
Forgot László Németh.
Andras: thought you might be interested in this one since it concerns dictionaries.
Guys,

I was looking at the 2016 files, and they have different numbers of AMs depending on the English variant.

May I merge the AMs of the 5 variants, remove duplicates and sort them alphabetically?

For example:
AM 1834
AM ts:0
AM st:abatis ts:Ns
AM ts:0 al:abode
AM ts:0 st:abide
AM st:ax ts:Ns
AM st:addendum ts:Ns

I would have them all without duplicates and sorted alphabetically, and then I would just replace the count on the first line (AM 1834 in this example) with the total number of AM lines.

If this resolves the whole issue, on 1-JAN-2026 I will commit to Gerrit, having already taken over U.S. + Canada + Australia.

U.S. is difficult to take over, which is why I am dedicating one full year to it.
(In reply to Marco A.G.Pinto from comment #7)

Hi Marco,

AM/AF (Alias Morphology/Alias Flag vector) are only for replacing flag vectors and morphological descriptions with an index in the dic file to compress the dictionary; see the hunspell(5) man page, and makealias:

$ makealias -h
makealias: make alias compressed dic and aff files
Usage: makealias [--minimize-diff old_file_without_file_extension] file.dic file.aff

> AM 1834
> AM ts:0 #1
> AM st:abatis ts:Ns #2

In the example above, "1" in the dic file means "ts:0", "2" means "st:abatis ts:Ns", etc. It's not possible to reorder the AM lines without changing the indices in the .dic file, unless we are willing to lose the information about which word in the .dic file has the stem "abatis". Fortunately, we don't need AM/AF at all.

The working strategies to get back the lost functionality:

1) Use my original script attached to the OpenOffice.org issue, which extends the dictionaries with morphological descriptions: real stems ("st:") and the other affixed forms ("am:" ~allomorphs). Use the result directly, or its smaller version compressed with makealias.

or

2) Add the new words to the original .dic file, which uses alias indices. The new words cannot contain flags, so you must create an "unmunched" version of the new words, listing all of their affixed forms. To create this word list, you can use Kevin Hendricks' original "unmunch", or my script "wordforms" (part of the Hunspell tools).

hunspell/src/tools$ ./wordforms
Usage: wordforms [-s | -p] dictionary.aff dictionary.dic word
-s: print only suffixed forms
-p: print only prefixed forms
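To illustrate the index mapping described above, here is a minimal Python sketch. It assumes that an aliased .dic entry carries the AM index as its last whitespace-separated field. The sample .aff text reuses AM lines quoted in this thread; the .dic line `abatis<TAB>2` and the helper names `parse_am_table` and `resolve` are hypothetical and not part of Hunspell.

```python
# Sketch only: shows why AM lines cannot be sorted without rewriting the .dic.
# Assumption: the first AM line holds the count, the rest are alias entries.
AFF_SAMPLE = """\
AM 3
AM ts:0
AM st:abatis ts:Ns
AM ts:0 al:abode
"""


def parse_am_table(aff_text):
    """Return AM entries in file order; entry i (1-based) is alias number i."""
    entries = []
    seen_count = False
    for line in aff_text.splitlines():
        fields = line.split(None, 1)
        if len(fields) != 2 or fields[0] != "AM":
            continue
        if not seen_count:
            seen_count = True  # skip the "AM <count>" header line
            continue
        entries.append(fields[1].strip())
    return entries


def resolve(dic_line, am_table):
    """Expand a numeric morphology alias in a .dic line (hypothetical layout)."""
    fields = dic_line.split()
    entry = fields[0]
    morph = " ".join(fields[1:])
    if morph.isdigit():
        morph = am_table[int(morph) - 1]  # alias numbers start at 1
    return entry, morph


table = parse_am_table(AFF_SAMPLE)
reordered = sorted(table)  # what alphabetical sorting would do

print(resolve("abatis\t2", table))      # uses the original order
print(resolve("abatis\t2", reordered))  # same index, different description
```

With the original order, index 2 expands to "st:abatis ts:Ns"; after sorting, the same index points at a different line, so the stem information for that entry is lost unless every index in the .dic file is remapped as well.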
Created attachment 202202 [details] Nemeth's script from ooo The script I downloaded from ooo.
Created attachment 202203 [details] All fixes I have done suggested by ChatGPT 4.1
(In reply to László Németh from comment #8)

Németh or any other developers,

I have applied all the fixes I could from those GPT suggested, and I have also installed the two Hunspell-related packages on my Ubuntu 24.04 VM.

I still get errors, even after reducing the .DIC to just two or three entries for testing.
parsing line: # Z --> S
parsed in 13 prefixes and 53 suffixes
.awk: line 1: improper use of next
cat: /home/marco-pinto/Desktop/nemeth/pos/part-of-speech.txt: No such file or directory
.cat: /home/marco-pinto/Desktop/nemeth/agid/infl.txt: No such file or directory
.awk: line 1: regular expression compile failed (syntax error ^* or ^+)
^*
.cat: /tmp/z.aff: No such file or directory
awk: line 1: improper use of next
.......
Verifying. Different words (if not 0, check /tmp/diff.log): 0
Alias compression...
52
0/201,204 0th/205,203 1/201,202 1st/205 1th/203,300 2/201,204 2nd/205 2th/203,300
3/201,204 3rd/205 3th/203,300 4/201,204 4th/205,203 5/201,204 5th/205,203
6/201,204 6th/205,203 7/201,204 7th/205,203 8/201,204 8th/205,203 9/201,204 9th/205,203
10s/205,203 20s/205,203 30s/205,203 40s/205,203 50s/205,203 60s/205,203 70s/205,203 80s/205,203 90s/205,203
100s/205,203 200s/205,203 300s/205,203 400s/205,203 500s/205,203 600s/205,203 700s/205,203 800s/205,203 900s/205,203
1000s/205,203 2000s/205,203
'10s '20s '30s '40s '50s '60s '70s '80s '90s
.marco-pinto@marco-pinto-VirtualBox:~/Desktop/nemeth$
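The log above shows `cat` failing on two input files. As a hedged sketch, a small pre-flight check like the following could confirm that they exist before the script is run. The two relative paths are taken from the error messages; treating them as relative to the script's working directory (~/Desktop/nemeth in the log) is an assumption, and the helper name `missing_inputs` is hypothetical. /tmp/z.aff is left out because it looks like an intermediate file that the script itself is expected to create.

```python
from pathlib import Path

# Paths copied from the "No such file or directory" messages in the log,
# expressed relative to the directory the script is run from (assumption).
REQUIRED_INPUTS = ["pos/part-of-speech.txt", "agid/infl.txt"]


def missing_inputs(base_dir, required=REQUIRED_INPUTS):
    """Return the required input files that are absent under base_dir."""
    base = Path(base_dir)
    return [rel for rel in required if not (base / rel).is_file()]


if __name__ == "__main__":
    # Check the current directory and report anything that is missing.
    for rel in missing_inputs("."):
        print(f"missing input: {rel}")
```

Running it from the directory that holds the script prints one line per missing file and nothing when both inputs are in place.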