Bug 139193 - Spellchecking (Italian) broken in LibO 7.1.0 beta 1
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 7.1.0.0.beta1+
Hardware: All Linux (All)
Importance: medium normal
Assignee: Rene Engelhard
URL:
Whiteboard: target:7.2.0 target:7.1.0
Keywords:
Depends on:
Blocks:
 
Reported: 2020-12-23 15:39 UTC by Callegar
Modified: 2021-02-20 23:12 UTC
CC List: 5 users

See Also:
Crash report or crash signature:


Description Callegar 2020-12-23 15:39:17 UTC
Description:
In Libreoffice 7.1 the spellchecking is broken, at least for Italian.

The spellchecker flags many correct words as incorrect, among them all words with accented characters (e.g. "perché", meaning "why"). For these flagged words it fails to provide meaningful suggestions; for instance, for "Perché" there is no suggestion at all. Perhaps the spellchecker tries to offer English corrections for the incorrectly flagged Italian words. In other cases, one gets suggestions with funny characters in place of the accented letters.

For some reason, some very short words, such as the particle "di", also get flagged despite having no accents.

Steps to Reproduce:
See description

Actual Results:
See description

Expected Results:
See description


Reproducible: Always


User Profile Reset: Yes



Additional Info:
[Information automatically included from LibreOffice]
Locale: en-US
Module: TextDocument
[Information guessed from browser]
OS: Linux (All)
OS is 64bit: yes
Comment 1 Callegar 2020-12-23 19:16:22 UTC
One example of an Italian word where the spellchecker chokes to the point of giving suggestions with funny chars is "difficoltà" (difficulty in English), for which I get the suggestion "difficolt�".
Comment 2 Julien Nabet 2020-12-24 09:33:41 UTC
Marina: I thought you might be interested in this one, given this commit:
https://cgit.freedesktop.org/libreoffice/core/commit/?id=996cb4e64c8e7c4757695334c781efe8816caffc
author	Marina Latini <marina.latini@libreoffice.org>	2020-10-28 11:37:27 +0200
committer	Gerrit Code Review <gerrit@gerrit.libreoffice.org>	2020-10-28 10:37:27 +0100
commit 996cb4e64c8e7c4757695334c781efe8816caffc (patch)
tree d36fdd1e41d069ceebd01c12d1395fa7ebe0af0b
parent 28dddd4f7e255c74c17c0c6b263303f4567b5678 (diff)
Update git submodules
* Update dictionaries from branch 'master'
  to b75fdcfc8695ba95d624d348fa580ccbc2eff9ce
  - Italian Writing Aids extension forked by LibreItalia.
    
    See CHANGELOG.txt for more details.
Comment 3 Julien Nabet 2020-12-24 14:08:43 UTC
I added "it" to the --with-lang entry of my autogen.input.
I can now reproduce this on a Debian x86-64 PC with master sources updated today.
Comment 4 Julien Nabet 2020-12-24 14:38:39 UTC
The problem is that it_IT.dic is encoded as ISO-8859 text.

Converting "it_IT.dic" to UTF-8 with iconv makes it work.
(On Linux, you can run "iconv -f ISO-8859-1 -t UTF-8 it_IT.dic" with the output redirected to a new file, then "mv" the new file over the original.)
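
For anyone without iconv at hand, the same conversion can be sketched in Python. This is only a minimal sketch, assuming the file really is ISO-8859-1 (any byte sequence decodes as ISO-8859-1, so a wrong guess would go unnoticed). The function names are hypothetical and not part of LibreOffice.

```python
from pathlib import Path


def recode_latin1_to_utf8(data: bytes) -> bytes:
    """Interpret the bytes as ISO-8859-1 text and re-encode that text as UTF-8."""
    return data.decode("iso-8859-1").encode("utf-8")


def recode_file(path: Path) -> None:
    """Rewrite a file in place (hypothetical helper; back up the file first)."""
    path.write_bytes(recode_latin1_to_utf8(path.read_bytes()))


if __name__ == "__main__":
    # In ISO-8859-1 the accented letter is the single byte 0xE0.
    sample = "difficoltà".encode("iso-8859-1")
    print(recode_latin1_to_utf8(sample).decode("utf-8"))
```

Running the conversion a second time on an already converted file would double-encode the accented letters, so it should be applied only once per file.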

I don't want to push directly to "dictionaries", and I don't know how to use logerrit to submit the patch for review on the "dictionaries" repo (as I do for core), so I'll let you take a look.
Comment 5 Callegar 2020-12-24 14:56:54 UTC
I confirm that the file is in the wrong encoding and that fixing the encoding "by hand" in /opt/libreoffice7.1/share/extensions/dict-it (I am on Ubuntu; I don't know where this is installed on other systems) fixes the issue.
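
The "funny characters" from comment 1 are consistent with Latin-1 bytes being read as UTF-8. A minimal Python sketch of that effect follows. It assumes the reader substitutes U+FFFD for invalid bytes; how the spellchecker itself handles them is not stated in this report, and the helper name is hypothetical.

```python
def read_as_utf8(data: bytes) -> str:
    """Decode bytes as UTF-8, replacing invalid sequences with U+FFFD."""
    return data.decode("utf-8", errors="replace")


# In ISO-8859-1 the accented letter is the single byte 0xE0.
latin1_word = "difficoltà".encode("iso-8859-1")
# In UTF-8 the same letter is the two bytes 0xC3 0xA0.
utf8_word = "difficoltà".encode("utf-8")

print(read_as_utf8(latin1_word))  # the lone 0xE0 byte is not valid UTF-8
print(read_as_utf8(utf8_word))
```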
Comment 6 Ming Hua 2020-12-24 15:13:27 UTC
Marina is the maintainer of the Italian spellchecker, though I'm not sure how closely he/she checks LO bugzilla emails.  The commit in the dictionaries repo is:

https://git.libreoffice.org/dictionaries/+/b75fdcfc8695ba95d624d348fa580ccbc2eff9ce

... which was reviewed and committed by Thorsten Behrens.  Adding him to CC.
Comment 7 Ming Hua 2021-01-19 04:08:42 UTC
Since we haven't got any response from either Marina or Thorsten, I've gone ahead and submitted the iconv'ed it_IT.dic file to Gerrit:
https://gerrit.libreoffice.org/c/dictionaries/+/109594
Let's try to get this solved before 7.1.0 release so that Italian-speaking users don't get an unpleasant surprise.

However, as I don't have a build environment for LO here, this commit was not tested.  I just looked at the diff and it seems reasonable.

@Julien, I've added you as a reviewer on Gerrit.  Would you please build and test the patch, or at least compare the file with your local iconv result to make sure I didn't mess up?
Comment 8 Ming Hua 2021-01-19 19:23:16 UTC
(In reply to Ming Hua from comment #7)
> I've gone ahead and submitted the iconv'ed it_IT.dic file to Gerrit:
> https://gerrit.libreoffice.org/c/dictionaries/+/109594
It turns out there is already https://gerrit.libreoffice.org/c/dictionaries/+/109321 .
Comment 10 Commit Notification 2021-01-20 08:48:00 UTC
Rene Engelhard committed a patch related to this issue.
It has been pushed to "libreoffice-7-1":

https://git.libreoffice.org/dictionaries/commit/28be227a705917bd78ee47f7c50a4adaf31740a7

deb#979439 tdf#139193 recode it_IT/it_IT.dic to UTF-8
Comment 11 Commit Notification 2021-01-20 08:48:14 UTC
Rene Engelhard committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/dictionaries/commit/87ca82e1a22bfc40c6fef0ddaa210053cf79f25f

deb#979439 tdf#139193 recode it_IT/it_IT.dic to UTF-8
Comment 12 Commit Notification 2021-01-20 11:19:01 UTC
Rene Engelhard committed a patch related to this issue.
It has been pushed to "libreoffice-7-1-0":

https://git.libreoffice.org/dictionaries/commit/625cc9846854ed05246a007a24095b580eebf8cf

deb#979439 tdf#139193 recode it_IT/it_IT.dic to UTF-8
Comment 13 Callegar 2021-02-20 22:59:08 UTC
Unfortunately the fix seems to be just "temporary".

The Italian dictionaries appear to be provided as an extension bundled with LibO. As of today, the extension manager reports that an update is available for that extension and proposes downloading the "Italian spelling dictionaries, hyphenation rules, and thesaurus" at version 2020.10.13 by "libreItalia".

Incidentally, for some reason that version is not listed at https://extensions.libreoffice.org/, which shows some version "4.2" (in fact 2015.09.25) by PLIO as the latest version of the extension. Still, the extension manager updater appears to find it.

In any case, as soon as one updates the extension, the issue is back. This is because the file `it_IT.dic` again has the wrong (Latin-1) encoding.

It is unclear to me if it is possible to report a bug against a specific extension, at a specific version, particularly when the latter is not even listed on the extensions website.
However, the fact that LibO bundles the extension and then proposes its upgrade ends up making any issue with this extension appear as a LibO issue.

I suggest removing this specific version of the extension from the actual repo that LibO checks for extension updates (where is it, incidentally?) or fixing the encoding, as has been done for the extension bundled with LibO.
Comment 14 Callegar 2021-02-20 23:12:20 UTC
My bad. The issue seems to be more subtle than I previously reported and related to the simultaneous usage of LibO 7.0 and LibO 7.1 on the same profile.

Because LibO 7.0 bundles the Italian dictionaries at version 2011.something, when running LibO 7.0 you may have been prompted to update the extension to the 2020 version (maybe this required manually installing the 2015.something version first). If you did so and then ran LibO 7.1, you end up with two copies of the Italian dictionaries at the same version, among which LibO seems to favor the user-installed one, which may have the wrong encoding.
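
To see which copy is the stale one, each it_IT.dic can be checked for valid UTF-8. The following is a minimal Python sketch. The directories to search must be supplied by the user (this report only names /opt/libreoffice7.1/share/extensions/dict-it for the bundled copy on Ubuntu), and the helper names are hypothetical.

```python
import sys
from pathlib import Path
from typing import Iterable, Iterator, Tuple


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def check_dictionaries(
    roots: Iterable[str], name: str = "it_IT.dic"
) -> Iterator[Tuple[Path, bool]]:
    """Yield (path, is_utf8) for every file called `name` below the given roots."""
    for root in roots:
        for path in Path(root).rglob(name):
            yield path, is_valid_utf8(path.read_bytes())


if __name__ == "__main__":
    # Pass the directories to search on the command line.
    for path, ok in check_dictionaries(sys.argv[1:]):
        print(f"{path}: {'valid UTF-8' if ok else 'not valid UTF-8'}")
```

A copy reported as not valid UTF-8 would be the one still carrying the old encoding.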

Apparently, as of today the issue is already fixed, because LibO 7.0 no longer seems to prompt for updating the Italian dictionary to the 2020.something version.

Hence, sorry for the noise. In any case, it would be good to have the 2020 update of the Italian dictionaries available (bundled or as an extension) for LibO 7.0 as well.