Bug 139388 - Upgrade Dutch dictionary for spelling checker to version 2.20.21
Summary: Upgrade Dutch dictionary for spelling checker to version 2.20.21
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Linguistic
Version: unspecified (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Dictionaries
Reported: 2021-01-03 17:08 UTC by Pander
Modified: 2022-09-28 09:34 UTC
CC List: 9 users

See Also:
Crash report or crash signature:


Description Pander 2021-01-03 17:08:40 UTC
Description:
Please upgrade the Dutch dictionary files for the spelling checker to version 2.20.19. This is a major upgrade, as it has been nine years since the last release. The files can be found at https://github.com/opentaal/opentaal-hunspell

Actual Results:
Spelling checker uses old Dutch dictionary

Expected Results:
Spelling checker should use newest Dutch dictionary


Reproducible: Always


User Profile Reset: No



Additional Info:
Take care about the following details:

1) In https://cgit.freedesktop.org/libreoffice/dictionaries/tree/ rename the directory nl_NL to nl. This was already fixed in the Debian packages a few years ago. The spelling of written Dutch is identical for the Netherlands, Belgium, Suriname, etc., hence only nl is used. Also rename nl_NL.dic to nl.dic and nl_NL.aff to nl.aff (and use the latest files).

2) Replace 2.00g with 2.20.19 in the files desc_en_US.txt and desc_nl_NL.txt

3) Use version 2.20.19 on this line https://cgit.freedesktop.org/libreoffice/dictionaries/tree/nl_NL/description.xml#n5

4) Replace these two files license_en_EN.txt and licentie_nl_NL.txt with LICENSE.txt from the release

5) Replace README_NL.txt with README.md from the release

6) In https://cgit.freedesktop.org/libreoffice/dictionaries/tree/nl_NL/dictionaries.xcu remove all occurrences of "_NL" (but not "-NL"). This way, nl-NL and nl-BE both use the files *nl.*

7) No update is available for the hyphenation file, but please rename it to hyph_nl.dic

8) Also update the file Dictionary_nl.mk

If you have a commit for this to review or have any questions, please contact me.
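
For whoever prepares the commit, the renames in points 1) and 7) can be sketched as a small path-mapping helper. This is only an illustration of the proposal, not a script from the dictionaries repository: the function name proposed_path is hypothetical, and the current hyphenation file name hyph_nl_NL.dic is an assumption (the list above only gives its new name). Only the directory and the .dic/.aff files are renamed; other files such as desc_nl_NL.txt keep their names.

```python
from pathlib import PurePosixPath


def proposed_path(old: str) -> str:
    """Map a path in the old nl_NL layout to the proposed nl layout.

    Hypothetical helper: the directory nl_NL becomes nl, and only
    .dic and .aff file names lose their "_NL" region suffix.
    """
    path = PurePosixPath(old)
    # Rename the directory component nl_NL to nl, leave others alone.
    parent = PurePosixPath(
        *["nl" if part == "nl_NL" else part for part in path.parent.parts]
    )
    name = path.name
    if path.suffix in {".dic", ".aff"}:
        name = name.replace("nl_NL", "nl")
    return str(parent / name)


if __name__ == "__main__":
    for old in (
        "nl_NL/nl_NL.dic",
        "nl_NL/nl_NL.aff",
        "nl_NL/hyph_nl_NL.dic",  # assumed current name of the hyphenation file
        "nl_NL/description.xml",
    ):
        print(old, "->", proposed_path(old))
```

Files whose names are not covered by the list (for example description.xml) only move along with the directory.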
Comment 1 Julien Nabet 2021-01-03 17:10:22 UTC
Cor: thought you might be interested in this one.
Comment 2 Pander 2021-01-03 18:04:54 UTC
See also https://bugzilla.redhat.com/show_bug.cgi?id=1912135
Comment 3 Erik Quaeghebeur 2021-01-03 20:32:53 UTC
(In reply to Pander from comment #0)
> 1) Rename in https://cgit.freedesktop.org/libreoffice/dictionaries/tree/ the
> directory nl_NL to nl This was already fixed in the Debian packages a few
> years ago. The spelling for written Dutch is identical for the Netherlands,
> Belgium, Suriname, etc. hence only nl is used. Also rename nl_NL.dic to
> nl.dic and nl_NL.aff to nl.aff (and use the latest files).
Is Bug 64830 relevant in this context?

> 6) In
> https://cgit.freedesktop.org/libreoffice/dictionaries/tree/nl_NL/
> dictionaries.xcu remove all "_NL" (but not "-NL") (In this way nl-NL and
> nl-BE use the files *nl.*)
Perhaps also include -SR (Surinam), -AW (Aruba), -SX (Sint-Maarten), -BQ (Bonaire, Saint Eustatius, Saba), and -CW (Curaçao).

N.B.: -AN (Netherlands Antilles) was removed in 2010 <https://www.iso.org/files/live/sites/isoorg/files/archive/pdf/en/iso_3166-1_newsletter_vi-8_split_of_the_dutch_antilles_final-en.pdf>.
Comment 4 Erik Quaeghebeur 2021-01-03 20:48:13 UTC
There was a thesaurus available (separately?) in the past. Perhaps it can be included in this bundle, similar to how it is done for the Danish bundle <https://extensions.libreoffice.org/en/extensions/show/stavekontrolden-danish-dictionary>?
Comment 5 Pander 2021-01-04 13:28:31 UTC
Thanks for the symbolic links, I will give an update on this within a day or so. Have to check with the others.

Thesaurus and synonyms will be upgraded sometime towards the end of 2021.
Comment 6 Pander 2021-01-04 22:53:51 UTC
Update for point 6), only support:
- nl-AW
- nl-BE
- nl-NL
- nl-SR

That is Aruba, Belgium, Netherlands and Suriname. The reason is that these are regions with their own locale (AW, BE, NL) or are top-level countries (SR).
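
A minimal sketch of the text change that point 6) asks for, combined with the locale list above. The function names and the sample strings are hypothetical; the real dictionaries.xcu is not quoted in this report, so the sample only assumes that file references look like nl_NL.aff and that locales are written with a hyphen, as in nl-NL.

```python
# Locales to support, taken from the list in this comment.
SUPPORTED_LOCALES = ("nl-AW", "nl-BE", "nl-NL", "nl-SR")


def strip_region_from_file_names(xcu_text: str) -> str:
    """Remove every "_NL" (underscore form) and leave "-NL" untouched."""
    return xcu_text.replace("_NL", "")


def locales_value(locales=SUPPORTED_LOCALES) -> str:
    """Join the locale tags into one space-separated string."""
    return " ".join(locales)


if __name__ == "__main__":
    # Assumed sample fragments, not copied from the real file.
    print(strip_region_from_file_names("%origin%/nl_NL.aff %origin%/nl_NL.dic"))
    print(strip_region_from_file_names("nl-NL nl-BE"))
    print(locales_value())
```

Because the replacement only targets the underscore form, locale tags such as nl-NL pass through unchanged.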
Comment 7 Erik Quaeghebeur 2021-02-03 11:51:22 UTC
(In reply to Pander from comment #5)
> Thesaurus and synonyms will be upgraded sometime towards the end of 2021.
Just like non-upgraded hyphenation files are used in this bundle, perhaps a non-upgraded thesaurus could be included. This would make it easier for downstream use (Gentoo Linux spelling files use LibreOffice bundles as a resource). Also, the thesaurus is not available anymore, so it would be lost if I upgrade Gentoo using this bundle. (Gentoo still has it cached separately for the previous bundle.)
Comment 8 Pander 2021-07-05 13:19:20 UTC
Patched/updated files are found at https://github.com/OpenTaal/opentaal-beta/tree/main/libreoffice/dictionaries

Let me know if you need more to upgrade the Dutch spelling checker.
Comment 9 Pander 2021-07-05 13:20:48 UTC
Non-upgraded thesaurus cannot be added at this moment. It contains some potentially offensive words or relations which we have to fix first.
Comment 10 Cor Nouws 2021-07-09 21:12:26 UTC
(In reply to Pander from comment #9)
> Non-upgraded thesaurus cannot be added at this moment. It contains some
> potentially offensive words or relations which we have to fix first.
Thanks for your good work. But seriously, "potentially offensive words ..." could be many, depending on changing cultural context, people's background, habits... Of course it is your (team's) choice, but I would not take that too hard.
Cheers - Cor
Comment 11 Pander 2021-07-15 11:05:36 UTC
These specific relations in the thesaurus really need to be removed first. There are only a few, but we, the maintainers, also consider them undesirable, and they have been very offensive in almost any context for more than a few decades. This is not a matter of hyper political correctness or censorship. The words are still in our word list and spelling checker; it only concerns relations that should be removed. So, please go ahead with updating the Dutch spelling checker. The previous version is almost a decade old. :)
Comment 12 Verhoeckx 2021-08-03 20:48:35 UTC
I wonder if installing the Dutch spelling checker on an English installation of a Linux distribution can be made easier.

Right now you have to know that you have to install the package hunspell-nl. But how should a normal user know this?

Is it possible for OpenTaal to make a LibreOffice extension so that it can be installed with LibreOffice --> Tools --> Language --> More dictionaries online... ?

Shall I create a separate issue for this?
Who could I contact for this?
Comment 13 Pander 2021-08-03 20:52:50 UTC
We (OpenTaal) prefer that LibreOffice updates the existing support for Dutch spell checking.

For those who want faster access, please see https://github.com/OpenTaal/opentaal-beta
Comment 14 Verhoeckx 2021-08-18 11:57:17 UTC
(In reply to Pander from comment #13)
> We (OpenTaal) prefer that LibreOffice updates the existing support for Dutch
> spell checking.


What do you mean by this? What is the preferred way?
And is this way easy enough for the average user?



> For those who want faster access, please see
> https://github.com/OpenTaal/opentaal-beta

Oh, there already is an extension! I didn't know that. Is the reason it can't be found on extensions.libreoffice.org that it's still in beta?
Comment 15 Pander 2021-08-19 09:27:59 UTC
I think it is a 'core extension' and has been around since the beginning. I can create an entry at extensions.libreoffice.org for our beta version.

Most important is that the files as described below and found in our beta github repo arrive at https://cgit.freedesktop.org/libreoffice/dictionaries/tree/nl_NL for the official core extension. Do you know who can help with that?
Comment 16 Julien Nabet 2021-08-19 09:56:23 UTC
Andras: thought you might be interested in this one since it concerns dictionary.
Comment 17 Verhoeckx 2021-08-24 13:39:48 UTC
(In reply to Pander from comment #15)
> I think it is a 'core extension' and has been around since the beginning. I
> can create an entry at extensions.libreoffice.org for our beta version.
> 
> Most important is that the files as described below and found in our beta
> github repo arrive at
> https://cgit.freedesktop.org/libreoffice/dictionaries/tree/nl_NL for the
> official core extension. Do you know who can help with that?


The problem that I wanted to address is that at the moment it's hard to install the Dutch dictionary if you have installed the English version of LibreOffice (on a Linux distribution). There just is no easy way at the moment to do that.
Comment 18 Pander 2021-08-24 13:47:08 UTC
Why is that a problem exactly? Perhaps that should be another issue.

If you have the Linux package hunspell-nl it should work directly.
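
To check whether a system dictionary is actually present after installing the package, something like the following sketch could be used. Everything here is an assumption for illustration: the helper name find_dutch_dictionaries is hypothetical, the two default directories are common hunspell locations but differ between distributions, and the nl*.dic naming pattern is a guess at what the package installs.

```python
from pathlib import Path

# Assumed locations; distributions differ, so pass your own list if needed.
DEFAULT_DIRS = ("/usr/share/hunspell", "/usr/share/myspell")


def find_dutch_dictionaries(search_dirs=DEFAULT_DIRS):
    """Return paths of nl*.dic files that have a matching .aff file."""
    found = []
    for directory in search_dirs:
        base = Path(directory)
        if not base.is_dir():
            continue  # skip locations that do not exist on this system
        for dic in sorted(base.glob("nl*.dic")):
            # A hunspell dictionary needs both the .dic and the .aff file.
            if dic.with_suffix(".aff").is_file():
                found.append(str(dic))
    return found


if __name__ == "__main__":
    hits = find_dutch_dictionaries()
    print(hits if hits else "No Dutch hunspell dictionary found")
```

An empty result only means nothing was found in the directories searched, not that no dictionary is installed elsewhere.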

Also possible now is https://extensions.libreoffice.org/en/extensions/show/5711
Comment 19 Verhoeckx 2021-08-25 14:23:23 UTC
(In reply to Pander from comment #18)
> Why is that a problem exactly? Perhaps that should be another issue.
> 
> If you have the Linux package hunspell-nl it should work directly.


That's exactly the problem: you have to know that you have to install the hunspell-nl package! A new user of a Linux distribution doesn't know that.

I do a fresh install of a Linux distribution every few years and every time I have to think about the name of the package. This year I forgot the word 'hunspell' and I couldn't find it.

You are right, this problem may deserve its own issue.


> Also possible now is
> https://extensions.libreoffice.org/en/extensions/show/5711

Wow, that's great!! Now it becomes much easier to install the Dutch dictionary!!

Can you also add this to Dictionary extension (LibreOffice --> Review --> More dictionaries online)? That would solve the above problem!!

If you want, I can create a new separate issue?
Comment 20 Pander 2022-07-12 15:57:00 UTC
Can all files related to Dutch at https://cgit.freedesktop.org/libreoffice/dictionaries/tree/ get upgraded as described above with the versions found at https://extensions.libreoffice.org/en/extensions/show/5711 ?

I have noticed that some distributions and projects use the LibreOffice dictionaries as their upstream, and these files need upgrading for the spelling checker that is shipped by default with OO.

What is the best way to do this, as this issue has been open for more than a year and a half? Shall I make a patch?
Comment 21 Aron Budea 2022-08-29 02:25:01 UTC
https://gerrit.libreoffice.org/c/dictionaries/+/138954

Pander, I've updated the dictionary based on the current 2.20.21 beta and your comments. Please let me know if it's fine. You can review it on Gerrit as well; for that you'll need to register an account first. Also, in the future you are welcome to submit a patch yourself, if you'd prefer that.
Comment 22 Aron Budea 2022-09-28 09:34:47 UTC
I'll reset assigned status for now to avoid that happening automatically, but the patch is there and active.