Hunspell can't find words containing the letter g̃:

    hunspell -d gug
    Hunspell 1.3.3
    aga
    & aga 1 0: anga

It should find ág̃a. We have always had issues with g̃, even when we created the Guarani keyboard. This seems similar to bug#39275.
LibreOffice Writer, Calc and Impress have the same issue: g̃ shows up as g~.
Note that Guarani's nasal g has no precomposed Unicode character; it is written as "g" followed by U+0303 (COMBINING TILDE). Could that be the problem here? By the way, Hunspell's bug tracker is https://github.com/hunspell/hunspell/issues
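One way to check this is to inspect the code points with Python's standard `unicodedata` module. This is a minimal sketch, not part of the original report; `describe` is a hypothetical helper name. It shows that NFC normalization composes "a" + U+0301 into U+00E1, but leaves "g" + U+0303 as two code points, because Unicode defines no precomposed g with tilde.

```python
import unicodedata


def describe(word: str) -> list[str]:
    """Return 'U+XXXX NAME' for each code point in `word`."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in word]


# Written with escapes so the result does not depend on how the source
# file itself happens to be normalized.
word_nfd = "a\u0301g\u0303a"  # á spelled as a + U+0301, g̃ as g + U+0303
word_nfc = unicodedata.normalize("NFC", word_nfd)

# NFC turns a + U+0301 into U+00E1, but g + U+0303 stays two code points.
for entry in describe(word_nfc):
    print(entry)
```

So even in fully composed (NFC) text, any tool that treats one code point as one letter will see g̃ as two separate characters.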
Command-line Hunspell tokenizes words differently from the LibreOffice break iterator. Hunspell in LibreOffice handles such combined Unicode characters well; you only need to use UTF-8 encoded .aff and .dic files:

------ gug.aff ------
SET UTF-8
.....
# for suggestions with correct combined diacritics:
MAP 2
MAP aá
MAP g(g̃)

------ gug.dic ------
100000
ág̃a

(If both precomposed and combined diacritics are common for the given language, you need the canonical form

See also the hunspell(4) manual page, for example:

    Use parenthesized groups for character sequences (eg. for composed Unicode characters):
    MAP 3
    MAP ß(ss)  (character sequence)
    MAP ﬁ(fi)  ("fi" compatibility characters for Unicode fi ligature)
    MAP (ọ́)o   (composed Unicode character: ó with bottom dot)
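Hunspell's real suggestion code is more involved, but the idea behind a MAP table with parenthesized groups can be sketched in Python: treat each group as a set of interchangeable strings and try every combination against the dictionary. This is a hypothetical simplification (`variants`, `map_suggestions` and `MAP_GROUPS` are illustrative names, not Hunspell APIs). In this simplified model it shows why `MAP g(g̃)` lets a misspelling like "aga" reach "ág̃a": the parenthesized group makes "g" + U+0303 a single unit.

```python
from itertools import product

# Hypothetical in-memory version of the gug.aff lines "MAP aá" and "MAP g(g̃)".
# Each inner list holds strings treated as interchangeable for suggestions.
MAP_GROUPS = [["a", "\u00e1"], ["g", "g\u0303"]]


def variants(word: str, groups: list[list[str]]) -> set[str]:
    """Return every spelling reachable by swapping MAP-related strings."""
    # Longest members first, so "g" + U+0303 is matched as one unit.
    members = sorted({m for group in groups for m in group}, key=len, reverse=True)
    options = []
    i = 0
    while i < len(word):
        match = next((m for m in members if word.startswith(m, i)), word[i])
        # Characters outside every group can only map to themselves.
        related = next((group for group in groups if match in group), [match])
        options.append(related)
        i += len(match)
    return {"".join(combo) for combo in product(*options)}


def map_suggestions(word: str, dictionary: set[str], groups=MAP_GROUPS) -> list[str]:
    """Dictionary words that differ from `word` only by MAP substitutions."""
    return sorted(v for v in variants(word, groups) if v in dictionary and v != word)


print(map_suggestions("aga", {"\u00e1g\u0303a"}))
```

Without the parentheses, a MAP entry could only relate single code points, so there would be no way to say that "g" and the two-code-point sequence "g" + U+0303 belong together.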
Created attachment 129061: Linux Libertine G has better combining-diacritics support than Times New Roman
(Sorry, here is the end of the previous sentence:) If both precomposed and combined diacritics are common for the given language, you can use the precomposed (NFC) form in the dictionary, use the ICONV command to convert combined input to the precomposed form and, if needed, use the OCONV command to convert the suggestions back to combined characters. Note: LibreOffice's text layout has good combining-diacritics support with a few fonts, for example Linux Libertine G; see the attached screenshot.
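The ICONV/OCONV lines for such a setup could be generated from Unicode normalization data rather than typed by hand. A minimal Python sketch, assuming the `ICONV n` header followed by `ICONV pattern pattern2` lines as described in the Hunspell manual; `conversion_table` and `LETTERS` are hypothetical names, and the letter list is an illustrative subset, not a complete Guarani orthography. Letters with no precomposed form, such as g̃, are skipped because their NFC and NFD forms are identical.

```python
import unicodedata

# Illustrative subset: Guarani vowels with acute accent and with nasal tilde,
# plus the nasal g (which has no precomposed form).
LETTERS = ["\u00e1", "\u00e9", "\u00ed", "\u00f3", "\u00fa", "\u00fd",
           "\u00e3", "\u1ebd", "\u0129", "\u00f5", "\u0169", "\u1ef9",
           "g\u0303"]


def conversion_table(letters: list[str], directive: str = "ICONV") -> list[str]:
    """Build ICONV (combined -> precomposed) or OCONV (precomposed -> combined) lines."""
    entries = []
    for letter in letters:
        composed = unicodedata.normalize("NFC", letter)
        decomposed = unicodedata.normalize("NFD", letter)
        if composed == decomposed:
            continue  # nothing to convert, e.g. g + U+0303
        if directive == "ICONV":
            src, dst = decomposed, composed
        else:
            src, dst = composed, decomposed
        entries.append(f"{directive} {src} {dst}")
    # Hunspell expects a count line before the entries themselves.
    return [f"{directive} {len(entries)}"] + entries


print("\n".join(conversion_table(LETTERS)))
```

The printed lines could then be added to gug.aff; `conversion_table(LETTERS, "OCONV")` produces the reverse table for the suggestions.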