Description: In LibreOffice 7.1, spellchecking is broken, at least for Italian. The spellchecker flags many correct words as incorrect, among them all words with accented characters (e.g. "perché", meaning "why"). For these flagged words it fails to provide meaningful suggestions; for instance, for "Perché" there is no suggestion at all. Perhaps the spellchecker is trying to offer English corrections for the incorrectly flagged Italian words. In other cases, the suggestions contain garbled characters in place of the accented letters. For some reason, some very short words without accents, such as the particle "di", are also flagged.
Steps to Reproduce: See description
Actual Results: See description
Expected Results: See description
Reproducible: Always
User Profile Reset: Yes
Additional Info:
[Information automatically included from LibreOffice]
Locale: en-US
Module: TextDocument
[Information guessed from browser]
OS: Linux (All)
OS is 64bit: yes
One example of an Italian word where the spellchecker breaks down to the point of giving suggestions with garbled characters is "difficoltà" ("difficulty" in English), for which I get the suggestion "difficolt�".
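The garbled suggestion is consistent with a byte-level encoding mismatch. The Python sketch below assumes, as later comments in this report conclude, that the dictionary stores words as ISO-8859-1 (Latin-1) bytes while they are being read as UTF-8. In Latin-1, "à" is the single byte 0xE0, which in UTF-8 can only start a three-byte sequence; with no continuation bytes after it, a lenient decoder substitutes U+FFFD, the replacement character rendered as �. This only illustrates the mismatch; it is not how Hunspell or LibreOffice decode dictionaries internally.

```python
# Sketch of the mismatch: Latin-1 bytes read back as if they were UTF-8.
word = "difficoltà"

# In ISO-8859-1 (Latin-1), "à" is the single byte 0xE0.
latin1_bytes = word.encode("iso-8859-1")

# 0xE0 can only start a three-byte UTF-8 sequence; nothing follows it here,
# so decoding with errors="replace" substitutes U+FFFD for it.
misread = latin1_bytes.decode("utf-8", errors="replace")

print(latin1_bytes)    # b'difficolt\xe0'
print(ascii(misread))  # 'difficolt\ufffd'
```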
Marina: I thought you might be interested in this one, given this commit: https://cgit.freedesktop.org/libreoffice/core/commit/?id=996cb4e64c8e7c4757695334c781efe8816caffc
author Marina Latini <marina.latini@libreoffice.org> 2020-10-28 11:37:27 +0200
committer Gerrit Code Review <gerrit@gerrit.libreoffice.org> 2020-10-28 10:37:27 +0100
commit 996cb4e64c8e7c4757695334c781efe8816caffc
tree d36fdd1e41d069ceebd01c12d1395fa7ebe0af0b
parent 28dddd4f7e255c74c17c0c6b263303f4567b5678
Update git submodules
* Update dictionaries from branch 'master' to b75fdcfc8695ba95d624d348fa580ccbc2eff9ce
  - Italian Writing Aids extension forked by LibreItalia. See CHANGELOG.txt for more details.
I added "it" to the --with-lang entry of my autogen.input, and I can now reproduce this on a Debian x86-64 PC with master sources updated today.
The problem is that it_IT.dic is encoded as ISO-8859 text. Converting "it_IT.dic" to UTF-8 with iconv makes it work (on Linux, you can run "iconv -f ISO-8859-1 -t UTF-8 it_IT.dic" to generate a new file, then "mv" it over the original). I don't want to push directly to "dictionaries", and I don't know how to use logerrit to submit a patch for review on the "dictionaries" repo (as I do for core), so I'll let you take a look.
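For anyone without iconv at hand, here is a rough Python equivalent of that iconv-then-mv step. It is a sketch, not the fix that was eventually committed: the function name `recode_latin1_to_utf8` and the `.utf8` intermediate suffix are hypothetical, and it assumes the input really is ISO-8859-1. Since every byte string is valid ISO-8859-1, running it on a file that is already UTF-8 would silently double-encode the accented letters.

```python
import os
import tempfile


def recode_latin1_to_utf8(path: str) -> None:
    """Rewrite `path` from ISO-8859-1 to UTF-8, like iconv followed by mv."""
    tmp_path = path + ".utf8"  # hypothetical name for the intermediate file
    # Reading as ISO-8859-1 never fails: every byte maps to one character.
    with open(path, "r", encoding="iso-8859-1", newline="") as src:
        text = src.read()
    with open(tmp_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
    os.replace(tmp_path, path)  # the `mv` step: replace the original file


# Demo on a throwaway file rather than the real it_IT.dic.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.dic")
    with open(sample, "wb") as f:
        f.write("2\nperché\ndifficoltà\n".encode("iso-8859-1"))
    recode_latin1_to_utf8(sample)
    with open(sample, "rb") as f:
        converted = f.read()

print(converted)  # b'2\nperch\xc3\xa9\ndifficolt\xc3\xa0\n'
```

`newline=""` keeps the original line endings untouched, and `os.replace` plays the role of `mv`.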
I confirm that the file has the wrong encoding and that fixing the encoding by hand in /opt/libreoffice7.1/share/extensions/dict-it (I am on Ubuntu; I don't know where it is installed on other systems) fixes the issue.
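To check an installation before and after such a manual fix, a small Python sketch like the following lists the `.dic` files under a directory that are not valid UTF-8, together with the offset of the first offending byte. The function name is hypothetical. The check is one-sided: an ASCII-only file passes, and passing only means the bytes decode as UTF-8, not that the word list is correct.

```python
import os
import tempfile
from typing import List, Tuple


def non_utf8_files(root: str, suffix: str = ".dic") -> List[Tuple[str, int]]:
    """Return (path, offset of first invalid byte) for every file under `root`
    ending in `suffix` whose bytes are not valid UTF-8."""
    bad: List[Tuple[str, int]] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                data = f.read()
            try:
                data.decode("utf-8")  # strict: raises on the first invalid byte
            except UnicodeDecodeError as exc:
                bad.append((path, exc.start))
    return bad


# Demo on throwaway files rather than a real installation.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "it_IT.dic"), "wb") as f:
        f.write("1\nperché\n".encode("iso-8859-1"))  # Latin-1, as in this bug
    with open(os.path.join(root, "ok.dic"), "wb") as f:
        f.write("1\nperché\n".encode("utf-8"))
    found = [(os.path.basename(p), off) for p, off in non_utf8_files(root)]

print(found)  # [('it_IT.dic', 7)]
```

On the Ubuntu system from this comment one would call `non_utf8_files("/opt/libreoffice7.1/share/extensions/dict-it")`; the location will differ on other distributions.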
Marina is the maintainer of the Italian spellchecker, though I'm not sure how closely they follow LO Bugzilla emails. The commit in the dictionaries repo is https://git.libreoffice.org/dictionaries/+/b75fdcfc8695ba95d624d348fa580ccbc2eff9ce, which was reviewed and committed by Thorsten Behrens. Adding him to CC.
Since we haven't received any response from either Marina or Thorsten, I've gone ahead and submitted the iconv'ed it_IT.dic file to Gerrit: https://gerrit.libreoffice.org/c/dictionaries/+/109594 Let's try to get this solved before the 7.1.0 release so that Italian-speaking users don't get an unpleasant surprise. However, as I don't have a LO build environment here, this commit is untested; I only looked at the diff, and it seems reasonable. @Julien, I've added you as a reviewer on Gerrit. Would you please build and test the patch, or at least compare the file with your local iconv result to make sure I didn't mess up?
(In reply to Ming Hua from comment #7)
> I've gone ahead and submitted the iconv'ed it_IT.dic file to Gerrit:
> https://gerrit.libreoffice.org/c/dictionaries/+/109594

It turns out there is already https://gerrit.libreoffice.org/c/dictionaries/+/109321.
And also https://gerrit.libreoffice.org/c/dictionaries/+/109291 (-7-1) and https://gerrit.libreoffice.org/c/dictionaries/+/109332 (-7-1-0)
Rene Engelhard committed a patch related to this issue. It has been pushed to "libreoffice-7-1": https://git.libreoffice.org/dictionaries/commit/28be227a705917bd78ee47f7c50a4adaf31740a7 deb#979439 tdf#139193 recode it_IT/it_IT.dic to UTF-8
Rene Engelhard committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/dictionaries/commit/87ca82e1a22bfc40c6fef0ddaa210053cf79f25f deb#979439 tdf#139193 recode it_IT/it_IT.dic to UTF-8
Rene Engelhard committed a patch related to this issue. It has been pushed to "libreoffice-7-1-0": https://git.libreoffice.org/dictionaries/commit/625cc9846854ed05246a007a24095b580eebf8cf deb#979439 tdf#139193 recode it_IT/it_IT.dic to UTF-8
Unfortunately, the fix seems to be only temporary. The Italian dictionaries are provided as an extension bundled with LibO. As of today, the Extension Manager reports that an update is available for that extension and offers to download "Italian spelling dictionaries, hyphenation rules, and thesaurus" version 2020.10.13 by "libreItalia". Incidentally, for some reason that version is not listed at https://extensions.libreoffice.org/, which shows a version "4.2" (in fact 2015.09.25) by PLIO as the latest release of the extension. Still, the Extension Manager's updater appears to find it. In any case, as soon as one updates the extension, the issue is back, because the file `it_IT.dic` is again in the wrong (Latin-1) encoding. It is unclear to me whether it is possible to report a bug against a specific extension at a specific version, particularly when that version is not even listed on the extensions website. However, because LibO bundles the extension and then proposes upgrading it, any issue with this extension ends up looking like a LibO issue. I suggest either removing this specific version of the extension from the repository that LibO checks for extension updates (wherever that is) or fixing its encoding, as was done for the extension bundled with LibO.
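Since the problem resurfaced through an extension update, a check that compares a dictionary against its own affix file may be more robust than assuming UTF-8. Hunspell `.aff` files can declare the character encoding of the affix and dictionary files with a `SET` directive (for example `SET UTF-8`). The Python sketch below assumes the mismatch in this bug was a UTF-8 declaration paired with a Latin-1 `it_IT.dic`; the thread itself only establishes that the `.dic` was Latin-1 and that recoding it to UTF-8 fixed the problem. The helper names are hypothetical, and only Hunspell encoding names that Python's codec registry also recognizes (such as `UTF-8` or `ISO8859-1`) are handled.

```python
import codecs
import os
import tempfile
from typing import Optional


def declared_encoding(aff_path: str) -> Optional[str]:
    """Return the value of the Hunspell `SET` directive in an .aff file, if present."""
    # The directive is plain ASCII, so reading as ISO-8859-1 cannot fail.
    with open(aff_path, "r", encoding="iso-8859-1") as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2 and parts[0] == "SET":
                return parts[1]
    return None


def dic_matches_aff(aff_path: str, dic_path: str) -> bool:
    """True if the .dic bytes decode cleanly in the encoding the .aff declares."""
    enc = declared_encoding(aff_path)
    if enc is None:
        raise ValueError(f"{aff_path}: no SET directive found")
    try:
        codecs.lookup(enc)  # e.g. "UTF-8", "ISO8859-1"
    except LookupError:
        raise ValueError(f"no Python codec for Hunspell encoding {enc!r}") from None
    with open(dic_path, "rb") as f:
        data = f.read()
    try:
        data.decode(enc)
    except UnicodeDecodeError:
        return False
    return True


# Demo: a UTF-8 declaration paired with a Latin-1 .dic, then with a UTF-8 .dic.
with tempfile.TemporaryDirectory() as d:
    aff = os.path.join(d, "it_IT.aff")
    dic = os.path.join(d, "it_IT.dic")
    with open(aff, "w", encoding="ascii") as f:
        f.write("SET UTF-8\n")
    with open(dic, "wb") as f:
        f.write("1\ndifficoltà\n".encode("iso-8859-1"))
    before = dic_matches_aff(aff, dic)
    with open(dic, "wb") as f:
        f.write("1\ndifficoltà\n".encode("utf-8"))
    after = dic_matches_aff(aff, dic)

print(before, after)  # False True
```

The check is informative mainly when the `.aff` declares UTF-8: every byte string decodes as ISO-8859-1, so with a Latin-1 declaration it always passes.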
My bad. The issue seems to be more subtle than I previously reported, and related to using LibO 7.0 and LibO 7.1 with the same profile. Because LibO 7.0 bundles the Italian dictionaries at version 2011.something, when running LibO 7.0 you may have been prompted to update the extension to the 2020 version (this may have required manually installing the 2015.something version first). If you do so and then run LibO 7.1, you end up with two copies of the Italian dictionaries with the same version number, and LibO seems to favor the user-installed one, which may have the wrong encoding. Apparently the issue is already resolved as of today, since LibO 7.0 no longer seems to prompt for updating the Italian dictionaries to the 2020.something version. So, sorry for the noise; in any case, it would be good to have the 2020 update of the Italian dictionaries available (bundled or as an extension) for LibO 7.0 as well.