Bug 33779 - Spell check not available for minority languages even when spell check extension successfully installed
Status: RESOLVED FIXED
Product: LibreOffice
Classification: Unclassified
Component: Linguistic
Version (earliest affected): 3.3.0 release
Hardware: All All
Importance: medium major
Assignee: Andras Timar
Reported: 2011-01-31 12:37 UTC by Jeremy Brown
Modified: 2011-04-08 04:27 UTC
Description Jeremy Brown 2011-01-31 12:37:57 UTC
When setting the language for a block of text or for the whole document, the list of available languages is apparently limited to a hard-coded set. This excludes a large number of the 6,900 languages currently spoken on earth (see www.ethnologue.org for a list).

I've tried installing a Hunspell spell check dictionary for Teke-Ibali (ISO 639-3 code "tek"). The spell check extension installed correctly, but I cannot set the text's language to Teke-Ibali and thus I cannot spell check with the Teke-Ibali spell check dictionary.

Info on the ISO 639-3 standard can be found here: http://www.sil.org/iso639-3/default.asp

I suppose one way to solve this would be to hardcode all the languages in the world:
http://www.sil.org/iso639-3/iso-639-3_20100707.tab

But I think a nicer option would be for the software to check all installed dictionary extensions and, if the language of a spell check dictionary is not in the hard-coded language list, add it to the language list for the user to select. That might require adding a little extra info to the spell check dictionary's description file, such as the display name. Language selection may also govern other things right now, like number and date formatting. If that is the case, I would suggest allowing the dictionary file for a non-hardcoded language to specify a hardcoded language that should be used for number and date formatting. For example, for Teke-Ibali, which is spoken in a country where French is the national language, number and date formatting could follow the "fr" language settings, while spell checking would use the "tek" spell check dictionary.
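
To make the idea concrete, here is a minimal sketch of that merge in Python. It is only an illustration and is not LibreOffice code: `HARDCODED_LANGUAGES`, `DictionaryInfo`, `LanguageEntry`, and `build_language_list` are hypothetical names, and the display name and formatting fallback are assumed to come from the extra info in the dictionary's description file.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

# Hypothetical stand-in for the application's built-in language list:
# language code -> display name.
HARDCODED_LANGUAGES: Dict[str, str] = {"en": "English", "fr": "French"}


@dataclass(frozen=True)
class DictionaryInfo:
    """Metadata an installed spell check dictionary is assumed to declare."""
    code: str                              # e.g. ISO 639-3 code "tek"
    display_name: Optional[str] = None     # e.g. "Teke-Ibali"
    format_fallback: Optional[str] = None  # e.g. "fr"


@dataclass(frozen=True)
class LanguageEntry:
    code: str
    display_name: str
    format_language: str  # language whose number/date formatting is used


def build_language_list(installed: Iterable[DictionaryInfo],
                        default_format: str = "en") -> Dict[str, LanguageEntry]:
    """Merge the built-in languages with those declared by dictionaries."""
    # Built-in languages use their own formatting rules.
    languages = {
        code: LanguageEntry(code, name, code)
        for code, name in HARDCODED_LANGUAGES.items()
    }
    for info in installed:
        if info.code in languages:
            continue  # the built-in entry wins
        fallback = info.format_fallback
        if fallback not in HARDCODED_LANGUAGES:
            # Missing or unknown fallback: use the default (an assumption).
            fallback = default_format
        # Without a declared display name, show the bare code.
        name = info.display_name or info.code
        languages[info.code] = LanguageEntry(info.code, name, fallback)
    return languages
```

With a dictionary declared as `DictionaryInfo("tek", "Teke-Ibali", "fr")`, the resulting list would contain a "tek" entry that is selectable for spell checking while number and date formatting follow "fr".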

Another acceptable option would be to have an easy-to-create (e.g. just XML like the dictionary extensions) *language* extension that could specify the language code, language display name, and major language that the minority language should emulate for number and date formatting purposes. Any language extensions that were installed would result in that language showing up in the language list for setting the language of a text, and that would allow spell check dictionaries for that language to be selected and used.
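
As a rough illustration of this second option, a language extension could be a small XML file that the application reads when building its language list. The element and attribute names below (`language`, `code`, `format-as`, `display-name`) are invented for this sketch and do not correspond to any existing LibreOffice extension format; `parse_language_extension` is likewise a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple

# Hypothetical descriptor; the format is an assumption made for this sketch.
EXAMPLE_DESCRIPTOR = """\
<language code="tek" format-as="fr">
  <display-name>Teke-Ibali</display-name>
</language>
"""


class LanguageExtension(NamedTuple):
    code: str          # language code, e.g. ISO 639-3 "tek"
    display_name: str  # name shown in the language list
    format_as: str     # major language to emulate for numbers and dates


def parse_language_extension(xml_text: str) -> LanguageExtension:
    """Read the three fields from a descriptor, rejecting incomplete ones."""
    root = ET.fromstring(xml_text)
    if root.tag != "language":
        raise ValueError(f"unexpected root element: {root.tag!r}")
    code = root.get("code")
    format_as = root.get("format-as")
    display_name = (root.findtext("display-name") or "").strip()
    if not code or not format_as or not display_name:
        raise ValueError(
            "descriptor must give a code, a display name, "
            "and a format-as language"
        )
    return LanguageExtension(code, display_name, format_as)


if __name__ == "__main__":
    print(parse_language_extension(EXAMPLE_DESCRIPTOR))
```

Each parsed extension would then be added to the language list, which in turn lets spell check dictionaries for that code be selected.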
Comment 1 Don't use this account, use tml@iki.fi 2011-01-31 13:08:37 UTC
I am fairly sure this isn't something that was intentionally removed compared to earlier versions of OOo, so if that extension worked in those (why else would it have been made?), I wonder how.
Comment 2 Don't use this account, use tml@iki.fi 2011-01-31 13:20:41 UTC
Ah, sorry, you said you installed a hunspell dictionary, not an OOo extension to provide the dictionary for that language. In that case, yeah, I guess that language needs to be added to a hardcoded list. Note that the list can't easily be made totally dynamic, as far as I know, because we for instance need the name of each document language supported translated to every UI language supported, so we need to have an a priori list of them. (But yeah, some day in the future perhaps those translated language names, too, can be fetched dynamically from some database; isn't such information part of what the CLDR people are working on?)
Comment 3 Jeremy Brown 2011-01-31 14:38:33 UTC
(In reply to comment #2)
> Ah, sorry, you said you installed a hunspell dictionary, not an OOo extension
> to provide the dictionary for that language. In that case, yeah, I guess that
> language needs to be added to a hardcoded list. Note that the list can't easily
> be made totally dynamic, as far as I know, because we for instance need the
> name of each document language supported translated to every UI language
> supported, so we need to have an a priori list of them. (But yeah, some day in
> the future perhaps those translated language names, too, can be fetched
> dynamically from some database; isn't such information part of what the CLDR
> people are working on?)

So what is the process for requesting languages to be added to that list? I have several for the country I'm working in (Republic of Congo) that I'd like to request right now. These are languages that have print dictionaries either published or in progress that we want to turn into spell check dictionaries, to make it easier for writers to produce literature in their traditional language.

I'll list the ISO 639-3 code followed by the language name I'm requesting be added to the LibreOffice language list:

beq Beembe
bkw Bekwel
mkw Kituba
ldi Lari
mdw Mbochi
tyx Teke-Tyee
vif Vili

Thanks
Comment 4 Don't use this account, use tml@iki.fi 2011-01-31 14:51:40 UTC
This bug report should be enough to start the process; it will eventually be assigned to some of our l10n people, I am sure, once the person on bug triage duty gets to it. (I just couldn't resist jumping in to comment...)
Comment 5 Andras Timar 2011-02-01 05:20:48 UTC
Started.
Comment 6 Caolán McNamara 2011-02-01 05:26:46 UTC
FWIW, see http://wiki.services.openoffice.org/wiki/Adding_a_new_language_or_locale for a "how to" covering both the bare minimum needed to add spell-check support for a language and the full-blown locale data that makes it a proper, fully supported locale. See bug 30773 for the example of Kabyle.
Comment 7 Andras Timar 2011-04-08 04:27:31 UTC
I added these languages to the language selector list, so users can set the language of the text and use their hunspell dictionaries (in LibreOffice 3.4 and higher). I think creating locale files is outside the scope of this bug.

Jeremy, if you would like to have full locale support for these languages, including number formats, translations of the days of the week, month names, etc., then please create locale files with http://www.it46.se/localegen/ and file a new bug. Please note that this is not required for the spell checker.