Created attachment 44750
I am suggesting adding a Maltese dictionary (Maltese is the language of the island of Malta, Europe). I am attaching the txt file and the .dic file with what I have started; there are 2348 words already. It would be great, since this would be the only office suite that has such a spell checker. The files are UTF-8. Is there an easy way to update them? Some words end with a '; will that affect the detection?
FWIW, in Fedora we've been shipping "hunspell-mt" since 2008 to provide Maltese spell-checking, by running "wordlist2hunspell" over the Maltese word list from http://linux.org.mt/projects/spellcheck/
Have you spoken to Ramon about his word list and about formally converting it to hunspell format? The above list apparently has 500,000+ words in it.
It's not strictly necessary to have the dictionaries in LibreOffice itself, as we support dictionary extensions; there should be a lot of examples around to follow.
As a small aside, the first line of your mt_MT.dic is supposed to be the count of lines in the .dic, so "2348 mt_MT.txt" should instead be "2348" (well, you should also remove the blank line under it and recalculate the number of lines).
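A minimal sketch of generating such a .dic from a plain word list, assuming one word per line in a UTF-8 text file. The function names and file paths are hypothetical; this is not the "wordlist2hunspell" tool mentioned above, and it does not produce affix flags.

```python
def build_dic(words):
    """Return the text of a hunspell .dic file for the given words."""
    # Strip whitespace, drop blank entries, remove duplicates, and sort.
    entries = sorted({w.strip() for w in words if w.strip()})
    # The first line carries only the entry count; one word per line follows.
    return "\n".join([str(len(entries))] + entries) + "\n"


def convert(txt_path, dic_path):
    """Read a word list and write the matching .dic (paths are hypothetical)."""
    with open(txt_path, encoding="utf-8") as src:
        text = build_dic(src)
    with open(dic_path, "w", encoding="utf-8") as dst:
        dst.write(text)
```

For example, convert("mt_MT.txt", "mt_MT.dic") would regenerate the count every time the word list changes, which also answers the question about updating the files.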
Re the ' in Maltese: LibreOffice will likely split words around ' and send each bit to the spell-checker separately. I think I spoke to Ramon about this once.
Ah yes, here's what I have in my mail from 2009. That doesn't mean I'm right; it's just a data point about handling the ' in words.
> The Maltese language has words that end in a dash or an apostrophe,
> like [ jista' ] or [ bil- ]. I added those characters to hunspell's
> affix file as WORDCHARS and the command-line version of hunspell works
> fine. However OOo apparently does not use that setting, so if I have
> text containing the words listed above, OOo will spell-check them as
> jista and bil - without the apostrophe and dash.
My understanding is that we use the icu word boundary iterator to split
up a sentence into words that we then give to the spell-checker.
The default rules are described at
so..., I think your problem is that these rules would allow e.g.
FOO'BAR and FOO-BAR but will split FOO' as FOO + '
I'm not altogether sure if the correct solution is to talk to the icu
people (http://site.icu-project.org/) about getting Maltese rules for
word boundary improved/fixed in icu. Or if the correct solution is to
make custom icu-rules and stick them into OOo to over-ride those
defaults for your language.
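
To make the splitting behaviour concrete, here is a rough sketch in Python. It is not the icu word boundary iterator and does not use icu's actual rule set; it is a hypothetical regular-expression model of the behaviour as described above (a ' or - is kept only between letters), with an assumed allow_trailing switch standing in for what language-specific rules would have to permit.

```python
import re


def split_words(text, joiners="'-", allow_trailing=False):
    """Split text into words, modelling the boundary behaviour described above."""
    j = re.escape(joiners)
    letters = r"[^\W\d_]+"  # one or more Unicode letters
    # A joiner is kept only when more letters follow it.
    pattern = rf"{letters}(?:[{j}]{letters})*"
    if allow_trailing:
        # Hypothetical custom rule: also keep a single joiner at the end of a word.
        pattern += rf"[{j}]?"
    return re.findall(pattern, text)
```

With the default settings, split_words("jista' bil- FOO'BAR") drops the trailing characters from the first two words, so the spell-checker would only ever see "jista" and "bil". With allow_trailing=True they survive, at the cost of also swallowing a closing single quote after a word, which is one reason real rules would need more care than this.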
It seems Ramon's latest work was http://linux.org.mt/downloads/spellcheck/speller-11.zip, which is now converted to hunspell format, but unfortunately there's no licence notice on it.
[This is an automated message.]
This bug was filed before the changes to Bugzilla on 2011-10-16. Thus it
started right out as NEW without ever being explicitly confirmed. The bug is
changed to state NEEDINFO for this reason. To move this bug from NEEDINFO back
to NEW please check if the bug still persists with the 3.5.0 beta1 or beta2 prereleases.
Details on how to test the 3.5.0 beta1 can be found at:
more detail on this bulk operation: http://nabble.documentfoundation.org/RFC-Operation-Spamzilla-tp3607474p3607474.html
Dear bug submitter!
Because there are a lot of NEEDINFO bugs with no answer within the last six months, we are closing all of these bugs.
To keep this message short, more information is available at https://wiki.documentfoundation.org/QA/NeedinfoClosure#Statement
Thanks for understanding and hopefully updating your bug, so that everything is prepared for developers to fix your problem.