Bug 97393 - English Dictionaries update - 2016
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Linguistic
Version (earliest affected): 5.1.0.0.alpha0+ Master
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
QA Contact:
URL: http://extensions.libreoffice.org/ext...
Whiteboard: target:5.2.0 target:5.3.0
Keywords:
Duplicates: 89032
Depends on:
Blocks: Dictionaries
Reported: 2016-01-27 12:57 UTC by Marco A.G.Pinto
Modified: 2017-06-24 11:37 UTC
CC List: 10 users

See Also:
Crash report or crash signature:


Description Marco A.G.Pinto 2016-01-27 12:57:37 UTC
Hello!

Could you bundle the OXT I am maintaining with the English Dictionaries in LO 5.1?

The URL:
http://extensions.libreoffice.org/extension-center/english-dictionaries/

Please wait until Friday since I am going to update GB + US + CA.

GB will have 1622 new words in the Friday release.

Thanks!

Kind regards,
Comment 1 Marco A.G.Pinto 2016-01-27 13:13:03 UTC
[13:09] <DennisRoczek> marcoagpinto: add an additional comment that 5.0.X should also be updated it's horrible outdated
Comment 2 Joel Madero 2016-01-27 19:00:43 UTC
As far as I know, we don't bundle dictionaries (that is why we have the extensions). Dictionaries are independent of LibreOffice proper (for instance, on Linux we can use myspell, which is a separate dictionary package).

Marking as WONTFIX - if you have additional information please share and we'll go from there :)
Comment 3 Dennis Roczek 2016-01-28 09:51:13 UTC
@Joel, we do bundle dictionaries: EN, FR, DE, and PT or ES. Check your Extension Manager and have it show all extensions.

See also bug 96782.
Comment 4 Joel Madero 2016-01-28 16:51:14 UTC
Thanks Dennis for the clarification :)
Comment 5 Marco A.G.Pinto 2016-01-29 15:30:52 UTC
Hello!

I have updated and uploaded the English Dictionaries:
http://extensions.libreoffice.org/extension-center/english-dictionaries/


MAGP 2016-02-01

Updated the Dictionaries:
- American (Kevin Atkinson)
- Canadian (Kevin Atkinson)
- British (Marco A.G.Pinto)*
  * British has 1622 new words.


Please note that GB has gained 18,000+ new words since I took over the task back in 2013.

Kind regards,
     >Marco A.G.Pinto
      ---------------
Comment 6 Aron Budea 2016-05-18 04:26:57 UTC
I was looking into pushing the dictionary update to master, and there's an important difference: the original dictionary also includes a Lightproof sentence checker.

Since merging the two is a bit of a pain, would it be possible to consolidate and maintain the package as one? Or is it better to do the merging once, and later only update the changed dictionary files? (the files to be merged are like: description.xml, dictionaries.xcu, manifest.xml)

I'm also wondering, does the current setup (old shipped dictionary with sentence check + current extension without sentence check) mean that once the extension is installed, the English sentence check becomes unavailable?
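
To make the merge question above more concrete, here is a minimal sketch of how the two packages could be compared before deciding what to merge. It relies only on the fact that an OXT is a ZIP archive. The helper name compare_oxt and the example file names are hypothetical; this is not part of any LibreOffice tooling.

```python
import zipfile

# Hypothetical helper, for illustration only.
# Config files named in the comment above; matched by base name because
# manifest.xml normally sits inside a subdirectory of the package.
CONFIG_NAMES = {"description.xml", "dictionaries.xcu", "manifest.xml"}


def compare_oxt(bundled, extension):
    """Compare two OXT packages (ZIP archives) member by member.

    Both arguments may be file paths or binary file-like objects.
    Returns (only_in_bundled, only_in_extension, changed_config), where
    changed_config lists config files present in both packages whose
    bytes differ.
    """
    with zipfile.ZipFile(bundled) as a, zipfile.ZipFile(extension) as b:
        names_a = set(a.namelist())
        names_b = set(b.namelist())
        changed = sorted(
            name
            for name in names_a & names_b
            if name.rsplit("/", 1)[-1] in CONFIG_NAMES
            and a.read(name) != b.read(name)
        )
    return sorted(names_a - names_b), sorted(names_b - names_a), changed
```

Usage would look like compare_oxt("bundled-en.oxt", "extension-en.oxt") (both file names are assumptions). Members that exist only in the bundled package would then show up in the first list, which is where sentence-checker files missing from the extension would appear.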
Comment 7 Aron Budea 2016-05-18 04:37:36 UTC
*** Bug 89032 has been marked as a duplicate of this bug. ***
Comment 8 Aron Budea 2016-05-19 07:25:20 UTC
I passed on bug 98098 (about the thesaurus) to the US/CA English dictionary maintainer, and learned that they have some limitations, as they were created for spell checking in a way described here:
https://github.com/kevina/wordlist/issues/154#issuecomment-220219093

I'm guessing the thesaurus expects the dictionary to find the stem it can base its suggestions on (guessing, as I've got no prior knowledge on the topic), but those dictionaries are built to produce an accepted list of words with a simpler set of affixes, and thus sometimes won't produce the correct stem.

How big of an issue is this? (I have no idea how many words and different forms are affected)
As a user I would find it a bit irritating to have some buggy suggestions, but probably wouldn't care too much; a spell checker that is kept up to date is more important in everyday use.
At the same time, it's important to know what can, and can't be expected from a dictionary like that.

Any thoughts on how seriously this should be taken?
If this raises no objections, I'd like to push the updated dictionaries so they're shipped with LO 5.2.
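
To illustrate the stemming concern, here is a toy sketch. It assumes, as guessed above, that thesaurus lookups are keyed by the stem the dictionary reports. It is NOT Hunspell's real algorithm or file format; the rule lists, word sets and function names are all invented for the example.

```python
# Toy model only: not Hunspell's real algorithm or file format.
# Each suffix rule is (suffix_to_strip, text_to_add_back).
FULL_RULES = [("iness", "y"), ("s", "")]
SIMPLE_RULES = [("s", "")]

# Dictionary A: few entries, richer affix rules.
full_entries = {"happy", "word"}
# Dictionary B: derived forms are listed as entries of their own,
# with a simpler rule set.
simple_entries = {"happy", "happiness", "word"}

# Hypothetical thesaurus, keyed by stem.
THESAURUS = {"happy": ["glad", "joyful"]}


def stems(word, entries, rules):
    """Return every dictionary entry that `word` can be reduced to."""
    found = [word] if word in entries else []
    for strip, add in rules:
        if word.endswith(strip):
            candidate = word[: len(word) - len(strip)] + add
            if candidate in entries and candidate not in found:
                found.append(candidate)
    return found


def synonyms(word, entries, rules):
    """Look up synonyms through the first stem the thesaurus knows."""
    for stem in stems(word, entries, rules):
        if stem in THESAURUS:
            return THESAURUS[stem]
    return []
```

In this toy setup both dictionaries accept "happiness" as correctly spelled, but only dictionary A reduces it to "happy", so only A yields thesaurus suggestions for it. Whether the real dictionaries behave this way for a given word would have to be checked against the actual files.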
Comment 9 Aron Budea 2016-05-23 03:39:05 UTC
The dictionary update based on the 2016.05.01 release has been submitted to gerrit as https://gerrit.libreoffice.org/#/c/25348/
These were not updated:
 -all the configuration files (xml, xcu),
 -hyph_en_GB/US: they're a bit newer in LO,
 -en_ZA: too many mutual differences.

This commit, made in response to bug 61660, is not yet integrated. Marco, please take a look:
https://cgit.freedesktop.org/libreoffice/dictionaries/commit/?id=7e4239060266bf238b5e6692ed10d548c37572d5 

Since the extension has monthly updates (the GB dictionary is updated monthly, the rest less frequently), if this works out, I'm planning to submit an update once more during the 5.2 RC phase.
Comment 10 Marco A.G.Pinto 2016-05-23 12:07:35 UTC
@Aron Budea:

Hello!

I don't know how to check Gerrit.

About ZA: I believe it is an important update to include, since the current version is very outdated.

I got in contact with its author (Dwayne Bailey) a month or two ago, and he told me that the version in Mozilla (Firefox & Thunderbird) was his latest release (2012-07-10). So I added this version to the OXT.

A few months ago I also got in contact with the NZ guy (Tristan Burtenshaw). He said he would reply to me the following week, but he never did. Tristan had a GB thesaurus in his OXT (the current OXT uses the US thesaurus for all English variants).
Comment 13 Adolfo Jayme 2016-07-23 09:13:04 UTC
Cherry-picked again for the final 5.2.0 release candidate as 258bf15aac7975e1202558b6d922be8a9a072b37
Comment 14 Marco A.G.Pinto 2016-08-27 14:39:49 UTC
My dear brother Áron and the team,

I have just uploaded the bimonthly update of the English Dictionaries:
http://extensions.libreoffice.org/extension-center/english-dictionaries/

Kevin's US and CA were committed by Áron the other month.

So, the new stuff in the OXT to be committed is the GB speller:

MAGP 2016-09-01

Updated the Dictionaries:
- American (Kevin Atkinson)
- Canadian (Kevin Atkinson)
- British (Marco A.G.Pinto)*
  * British has 773 new words (2016-08-01) + 728 new words (2016-09-01).
    GB changelog is no longer included in the README file,
    instead there are links inside it that point to the information
    (lower filesize).


GB has 773+728 new words, making a total of 1501 new words since it was last committed.

Áron and guys, could someone commit it?

Really, one of these days I am going to learn how to do it myself, but first I will learn it for the autocorrect pt_PT and only later for spellers.

Thanks!

Kind regards,
     >Marco A.G.Pinto
      ---------------
Comment 15 Aron Budea 2016-09-02 03:44:07 UTC
Let's target the time window before 5.3 (~November) or 5.3.1 (~December) branch off.