Bug 106480 - i18n: add language
Status: RESOLVED NOTABUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Localization
Version (earliest affected): unspecified
Hardware: All All
Importance: medium enhancement
Assignee: Eike Rathke
Reported: 2017-03-10 15:51 UTC by mrwlists
Modified: 2017-03-15 13:52 UTC
Description mrwlists 2017-03-10 15:51:48 UTC
Please could a new locale be added to support the use of the OED spelling dictionary.

I believe that two locale identifiers/tags may be required: "en-GB-oed" ("grandfathered tag"), and "en-GB-oxendict" ("language variant tag").

I have no knowledge of what goes into building "locale data", but the conventions are exactly the same as normal British English, i.e. en-GB.

I understand that a locale must be added before LibreOffice's spell checker will recognize an OED dictionary, hence this request. See also bug 100462.

The installable dictionary I use is this one:

en_gb-oed.oxt, found at https://sourceforge.net/projects/aoo-extensions/files/1881/4/

I see that something similar may have been done to support the use of Valencian.
Comment 1 Xisco Faulí 2017-03-10 18:40:32 UTC
Eike, can you comment on this one?
Comment 2 Eike Rathke 2017-03-13 15:50:23 UTC
en-GB-oed is already present; it is available in the language list as "English, OED spelling (UK)" for spell-checking purposes. Locale data is not needed for spell-checking.

Note, however, that en_gb-oed.oxt is not directly suitable because it registers under the en-GB locale instead of en-GB-oed (OOo and AOO can't handle BCP47 language tags). Before installation you'd need to unzip it, edit dictionaries.xcu and, for the Locales property, replace <value>en-GB</value> with <value>en-GB-oed</value>. You may also add a second <value>en-GB-oxendict</value> to make that dictionary work with the newer tag (which currently can't be selected from the UI language list but would be recognized if a document used it).
Comment 3 Eike Rathke 2017-03-13 15:57:39 UTC
And then, of course, freshen the .oxt zip archive with the modified file before installing. I forgot to mention the obvious.
Comment 4 mrwlists 2017-03-13 16:41:26 UTC
Eike,

Thank you very much. This appears to work as you suggested (LO 4.4, 5.1, 5.3). I wish I had asked before.

I have actually attempted this approach in the past, but I had edited the *installed* version of dictionaries.xcu file, in place, on the assumption that LO would refresh its knowledge of installed dictionaries when restarted.

Clearly that is not the case. LO seems to require that the "oxt" file be modified before installation to be effective. And I was not aware that an 'oxt' file is in reality a 'zip' file. That knowledge helps.

How should the two locales be separated? With a simple space, or perhaps a comma? I see it is of type "oor:string-list", but I haven't identified precisely what that means.

Would it be possible to add the suitably modified version of the dictionary to the English Dictionaries extension download site?

https://extensions.libreoffice.org/extensions/english-dictionaries

Most people won't want it, but some will find it useful, particularly those who need to write documents that require these spelling conventions.
Comment 5 Eike Rathke 2017-03-14 11:43:34 UTC
(In reply to mrwlists from comment #4)
> I have actually attempted this approach in the past, but I had edited the
> *installed* version of dictionaries.xcu file, in place, on the assumption
> that LO would refresh its knowledge of installed dictionaries when restarted.
Ah no, the available languages are registered in the configuration during install; the dictionaries.xcu files then serve only as a mapping of which dictionary implements which language.

> How should the two locales be separated? With a simple space, or perhaps a
> comma? I see it is of type "oor:string-list", but I haven't identified
> precisely what that means.
A simple space, so this
<value>en-GB-oxendict en-GB-oed</value>
does it.
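
The unzip, edit and re-zip steps from comments 2 and 3 could be scripted roughly as follows. This is only a sketch, assuming that dictionaries.xcu writes the property as <prop oor:name="Locales" ...><value>...</value></prop>; the function names and the output file name are hypothetical, and every Locales list in the file that contains en-GB is rewritten.

```python
import re
import zipfile

# Matches the <value> of a Locales property in dictionaries.xcu.
# Assumption: the property looks like
#   <prop oor:name="Locales" oor:type="oor:string-list"><value>en-GB</value></prop>
LOCALES_RE = re.compile(
    r'(<prop\s+oor:name="Locales"[^>]*>\s*<value>)([^<]*)(</value>)'
)


def retag_xcu(xcu_text, old_tag, new_tags):
    """Replace old_tag in each Locales list with the space-separated new_tags."""
    def swap(match):
        tags = match.group(2).split()
        if old_tag not in tags:
            return match.group(0)  # leave unrelated Locales lists alone
        kept = [t for t in tags if t != old_tag]
        merged = kept + [t for t in new_tags if t not in kept]
        return match.group(1) + " ".join(merged) + match.group(3)
    return LOCALES_RE.sub(swap, xcu_text)


def retag_oxt(src_path, dst_path, old_tag="en-GB",
              new_tags=("en-GB-oxendict", "en-GB-oed")):
    """Copy the .oxt (a zip archive) to dst_path, rewriting only dictionaries.xcu."""
    with zipfile.ZipFile(src_path) as zin, \
            zipfile.ZipFile(dst_path, "w") as zout:
        for info in zin.infolist():
            data = zin.read(info.filename)
            if info.filename.split("/")[-1] == "dictionaries.xcu":
                text = retag_xcu(data.decode("utf-8"), old_tag, new_tags)
                data = text.encode("utf-8")
            # Passing the ZipInfo keeps each member's name and compression type.
            zout.writestr(info, data)
```

For example, retag_oxt("en_gb-oed.oxt", "en-gb-oxendict.oxt") would write a new archive to install, leaving the downloaded one untouched.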

> Would it be possible to add the suitably modified version of the dictionary
> into the English Dictionaries extension download site?
> 
> https://extensions.libreoffice.org/extensions/english-dictionaries
That bundle doesn't handle the OED spelling at all. You could of course ask the author if he wanted to do it, but maybe it's better to have a separate dictionary.

One possibility would be to take the en_gb-oed.oxt from https://sourceforge.net/projects/aoo-extensions/files/1881/4/ and, as it is apparently under the LGPL license, modify the language tags and upload it (maybe as en-gb-oxendict.oxt) to the extensions.libreoffice.org site. What I dislike a bit is that it doesn't come with a proper license file stating the exact version of the license (dictionary authors seem to be a tad sloppy in that regard ;-), but given its age and that of the original aspell/myspell dictionary it is probably LGPLv2. Maybe ask the original author (stated in the README_en_GB-oed.txt file) if he'd like to change things and publish on extensions.libreoffice.org.
Comment 6 mrwlists 2017-03-14 14:45:19 UTC
Eike,

Thank you for your suggestions, they feel right and I shall follow up.

Looking a little more deeply at the other writing aids:

I can see that it would probably be good if the existing (en-GB et al) grammar checker were to encompass these language tags as well. I haven't yet located this project's 'home'; is it 'official LO'?

The English dictionaries bundle includes a (small) hyphenation dictionary and a (large) thesaurus. Both are suitable for en-GB-oed/oxendict.

My initial thought is to appropriately 'borrow' the hyphenation dictionary and incorporate it into the new en-gb-oxendict.oxt, but ask the maintainer to simply add the language tags to the thesaurus. Does this sound like a sensible approach? It could get confusing, and one would wish to avoid the possibility of conflict. I don't know how LO handles multiple dictionaries/thesauri that claim the same language tag.

I had half expected that these tools would automagically be made available to 'en-GB-oed' text, on the basis of tag hierarchy, but that appears not to be the case.
Comment 7 Eike Rathke 2017-03-15 13:52:10 UTC
(In reply to mrwlists from comment #6)
> I can see that it would probably be good if the existing (en-GB et al)
> grammar checker were to encompass these language tags as well. I haven't yet
> located this project's 'home'; is it 'official LO'?
Actually there are three different sources of dictionaries:
* the system wide dictionaries, for example under /usr/share/myspell/
  (or hunspell dictionaries), which are used by Linux distributions
* the bundled dictionaries, of which (for the English dictionaries) the
  source is in the git submodule dictionaries under dictionaries/en,
  i.e. https://cgit.freedesktop.org/libreoffice/dictionaries/tree/en
  which are mainly used with releases built for Windows unless one rolls
  one's own build that includes them
* extension dictionaries, like
  https://extensions.libreoffice.org/extensions/english-dictionaries for
  which the maintainer is Marco A.G. Pinto, see the homepage link there

Only the git dictionaries submodule is 'official LO' in that they are
maintained and shipped. Extension dictionaries are maintained (or not)
by whoever contributes them.

> The English dictionaries bundle includes a (small) hyphenation dictionary
> and a (large) thesaurus. Both are suitable for en-GB-oed/oxendict.
But isn't the difference between en-GB and en-GB-oxendict that spelling
differs? I'm not an English native speaker. How can the thesaurus serve
both spellings?

> My initial thought is to appropriately 'borrow' the hyphenation dictionary
> and incorporate it into the new en-gb-oxendict.oxt, but ask the maintainer
> to simply add the language tags to the thesaurus. Does this sound like a
> sensible approach? It could get confusing, and one would wish to avoid the
> possibility of conflict. I don't know how LO handles multiple
> dictionaries/thesauri that claim the same language tag.
One wins, so that isn't really suitable.
IMHO, either a dedicated en-GB-oxendict bundle (even if that resulted in
some duplicated hyphenation data and/or thesaurus), or get Marco to add
the en-GB-oxendict/oed dictionary would be better.
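
Since only one dictionary wins when several claim the same tag, it may help to check for overlaps before publishing a bundle. The following is a rough sketch, not anything LibreOffice itself does: it assumes each dictionaries.xcu entry carries Format and Locales properties in the usual oor registry namespace, and the function names and bundle labels are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Namespace of the oor:name attribute used in dictionaries.xcu.
OOR = "{http://openoffice.org/2001/registry}"


def claimed_locales(xcu_text):
    """Yield (format, locale) pairs declared in one dictionaries.xcu."""
    root = ET.fromstring(xcu_text)
    for node in root.iter("node"):
        props = {}
        for prop in node.findall("prop"):
            props[prop.get(OOR + "name")] = prop.findtext("value") or ""
        if "Locales" in props and "Format" in props:
            # Locales is an oor:string-list: tags separated by whitespace.
            for locale in props["Locales"].split():
                yield props["Format"].strip(), locale


def find_conflicts(bundles):
    """Map (format, locale) to the bundle names claiming it, if more than one."""
    owners = defaultdict(list)
    for name, xcu_text in bundles.items():
        for key in set(claimed_locales(xcu_text)):
            owners[key].append(name)
    return {key: names for key, names in owners.items() if len(names) > 1}
```

Here bundles maps a label of your choosing to the text of that bundle's dictionaries.xcu. A spelling dictionary and a thesaurus for the same tag are not reported, because they differ in Format.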

> I had half expected that these tools would automagically be made available
> to 'en-GB-oed' text, on the basis of tag hierarchy, but that appears not to
> be the case.
Further work on the code may be needed to make a fallback chain
applicable to such cases. Apart from that, if a text is attributed
en-GB-oxendict I think it would not be correct to silently use en-GB if
en-GB-oxendict is not available.
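
To illustrate what such a fallback chain might look like, here is a minimal sketch based on truncating the tag from the right, in the style of RFC 4647 lookup. It is not LibreOffice code; both function names are hypothetical, and fallback is opt-in so that en-GB is never used silently for en-GB-oxendict text.

```python
def fallback_chain(tag):
    """Return tag followed by progressively truncated forms, most specific first."""
    subtags = tag.split("-")
    chain = []
    while subtags:
        chain.append("-".join(subtags))
        subtags.pop()
        # RFC 4647 lookup also drops a trailing single-character subtag.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return chain


def pick_dictionary(tag, available, allow_fallback=False):
    """Return the best matching available tag, or None if nothing matches."""
    candidates = fallback_chain(tag) if allow_fallback else [tag]
    # Language tags compare case-insensitively.
    lowered = {a.lower(): a for a in available}
    for candidate in candidates:
        if candidate.lower() in lowered:
            return lowered[candidate.lower()]
    return None
```

With only en-GB and en-US available, pick_dictionary("en-GB-oxendict", ...) returns None by default and "en-GB" only when allow_fallback=True is passed explicitly.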