Description:
The Hungarian dictionary contains invalid UTF-8 sequences and cannot be used or converted. For exact details, see https://github.com/hunspell/hunspell/issues/559

Steps to Reproduce:
Open hu_HU_u8.aff in gedit:

    sudo apt install hunspell-hu
    gedit /usr/share/hunspell/hu_HU.aff --encoding=UTF-8

Actual Results:
gedit shows an error. If it happens to interpret the file as ISO-8859-15 instead, open the file with gedit's --encoding option.

Expected Results:
No error should be shown by the text editor. Valid UTF-8 is expected.

Reproducible: Always

User Profile Reset: Yes

Additional Info:
Solution: Invalid UTF-8 appears only in comments and in flag vectors. Upstream is at https://sourceforge.net/projects/magyarispell/ ; open the source tarball. The fix belongs in the file bin/u8myspell. The following script should fix it completely.

    #!/bin/bash
    set -x
    export LANG=en_US
    export LC_ALL=C
    # All three arguments are required.
    case $# in
    0|1|2) echo "u8myspell - converts MySpell dictionaries to UTF-8
    usage: u8myspell source_name output_name source_charset"; exit 1;;
    esac
    i=$1
    o=$2
    charset=$3
    localdir="$(dirname "$0")"
    # Convert the word list to UTF-8.
    iconv -f "$charset" -t UTF-8 "$i.dic" | sed -f "$localdir"/l1_u8.sed > "$o.dic"
    # Convert the affix file, replacing the SET line and adding FLAG UTF-8 after it.
    iconv -f "$charset" -t UTF-8 "$i.aff" | sed 's/^SET .*$/SET UTF-8\
    FLAG UTF-8/' | sed -f "$localdir"/l1_u8.sed > "$o.aff"

Basically, the Latin-2 source is converted to UTF-8 and the FLAG UTF-8 command is additionally emitted in the .aff file.

User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0
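
The claim above that invalid UTF-8 appears only in comments and in flag vectors can be checked by listing the offending lines. The following is a minimal sketch, not part of the upstream tooling; the function name invalid_utf8_lines is hypothetical, and it only assumes the file is read as raw bytes.

```python
def invalid_utf8_lines(data: bytes):
    """Return (line_number, raw_line) pairs for lines that are not valid UTF-8."""
    bad = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            # Keep the raw bytes so the caller can see where the bad byte sits
            # (for example in a comment, or after the "/" of a flag vector).
            bad.append((number, line))
    return bad
```

Calling it on the bytes of /usr/share/hunspell/hu_HU.aff (read with open(path, "rb").read()) would give the line numbers to inspect by hand.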
The hu_HU.dic and hu_HU.aff files are not meant to be valid UTF-8 files. They contain UTF-8 encoded dictionary items (words and morphemes) together with the default 8-bit flags; see the hunspell(5) manual page for the dictionary format. The suggested conversion doubles the memory footprint of the flag vectors, and decoding the UTF-8 encoded flags slows down dictionary loading by 70% (plain dictionary) or 50% (alias-compressed dictionary), resulting in noticeable differences in the LibreOffice user interface.
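
The mixed layout described here can be illustrated with a short sketch. It is only an illustration under simplifying assumptions: the helper names are hypothetical, an entry is assumed to be "word/flags" with no escaped slash and no trailing morphological fields, and the size it reports is the on-disk byte count after re-encoding, not Hunspell's in-memory representation.

```python
def split_entry(raw: bytes):
    """Split a .dic entry into its UTF-8 word and its raw 8-bit flag bytes."""
    word, _, flags = raw.partition(b"/")
    # The word part is UTF-8; the flag part is left as raw bytes,
    # one byte per flag.
    return word.decode("utf-8"), flags


def flags_size_as_utf8(flags: bytes) -> int:
    """Byte length of the flags after an ISO-8859-2 to UTF-8 re-encoding."""
    return len(flags.decode("iso8859_2").encode("utf-8"))
```

For an entry such as b"k\xc3\xb6r/A\xe9", the flag part is two bytes as stored, while flags_size_as_utf8 reports three, because every flag byte above 0x7F becomes a two-byte UTF-8 sequence.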