Bug 64830 - RTL: LibreOffice requires duplicated hunspell dictionaries for each Arabic locale
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Linguistic
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: medium normal
Assignee: ⁨خالد حسني⁩
URL:
Whiteboard: BSA target:7.6.0
Keywords:
Depends on:
Blocks: Spell-Checking Arabic-and-Farsi
 
Reported: 2013-05-21 16:23 UTC by Munzir Taha
Modified: 2023-06-10 18:51 UTC (History)

See Also:
Crash report or crash signature:


Description Munzir Taha 2013-05-21 16:23:23 UTC
Problem description: 
Having a list of Arabic ($country) is redundant. Unlike en_US and en_GB, the spell checking of the Arabic language doesn't depend on the country.

Steps to reproduce:
1. Tools - Options - Language Settings - Languages


Current behavior:
There is a list of: Arabic (Algeria), Arabic (Egypt), Arabic (Lebanon), Arabic (Oman), Arabic (Saudi Arabia), Arabic (Tunisia)

Expected behavior:
To have only one option: Arabic
              
Operating System: Linux (Other)
Version: 4.0.3.3 release
Comment 1 ⁨خالد حسني⁩ 2013-05-25 13:07:56 UTC
AFAIK, the spell dictionaries were duplicated per country because OpenOffice didn’t look for an “ab” dictionary if it didn’t find an “ab_CD” one. So we need to check whether this is still the case and fix that (but it might be tricky; IIRC for some locales such behaviour would be undesired). There is probably a bug for this somewhere in the OpenOffice issue tracker.
Comment 2 ⁨خالد حسني⁩ 2014-01-01 14:57:12 UTC
Eike, any pointers about the linked OOo issue, any chance that your language tags work help here?
Comment 3 ⁨خالد حسني⁩ 2014-01-01 19:53:03 UTC
I thought, naïvely, that maybe I could use MsLangId::getPrimaryLanguage() as a fallback.

First, SpellChecker::hasLocale() in lingucomponent/source/spellcheck/spell/sspellimp.cxx is never called (not by Writer at least), so whatever I do there does not matter.

What does get called is SpellCheckerDispatcher::hasLocale() in linguistic/source/spelldsp.cxx (which took me a while to find, going through all that convoluted UNO stuff). But passing the primary language to aSvcMap.find() there does not find anything; apparently the spell checker for "ar" is never registered or something. I gave up at this point, since the code in that file is too clever for me to grasp.
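
For illustration only, the lookup-with-fallback idea described in this comment could look roughly like the Python sketch below. It is not LibreOffice code: service_map, primary_language and find_spellchecker are hypothetical names, and the real aSvcMap in spelldsp.cxx may be keyed and populated quite differently.

```python
from typing import Dict, Optional


def primary_language(tag: str) -> str:
    """Return the language subtag of a tag such as 'ar-EG' or 'ar_EG'."""
    return tag.replace("_", "-").split("-", 1)[0].lower()


def find_spellchecker(service_map: Dict[str, str], tag: str) -> Optional[str]:
    """Look up a spell checker for `tag`, falling back to the bare language.

    `service_map` is a hypothetical stand-in for the dispatcher's map of
    registered locales to spell checker services.
    """
    normalized = tag.replace("_", "-")
    # Prefer an exact match for the full locale.
    if normalized in service_map:
        return service_map[normalized]
    # Otherwise try the country-less language; this yields None when
    # nothing was ever registered under that key.
    return service_map.get(primary_language(normalized))
```

With only "ar-EG" registered, a lookup for "ar-SA" still returns nothing. That matches the symptom above: a primary-language fallback helps only if some service is actually registered under the bare "ar" key.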
Comment 4 Joel Madero 2015-05-02 15:42:33 UTC Comment hidden (noise)
Comment 5 ⁨خالد حسني⁩ 2015-05-02 16:24:10 UTC
Still an issue.
Comment 6 QA Administrators 2016-09-20 09:38:19 UTC Comment hidden (noise)
Comment 7 Munzir Taha 2016-09-20 18:37:14 UTC
Still an issue
Comment 8 Erik Quaeghebeur 2017-02-19 19:40:01 UTC
This same issue affects many other languages for which putting their dictionaries into the ‘locale’ mold creates extra work, for example, French and Dutch, both languages that are mostly ‘standardized’ worldwide qua spelling. So perhaps make the bug title broader.
Comment 9 ⁨خالد حسني⁩ 2017-02-19 20:34:04 UTC
Code pointers from the mailing list https://lists.freedesktop.org/archives/libreoffice/2017-February/076982.html (mainly so that I do not forget where to find them).
Comment 10 QA Administrators 2018-10-13 03:14:26 UTC Comment hidden (noise)
Comment 11 Erik Quaeghebeur 2018-10-13 10:35:39 UTC
(In reply to QA Administrators from comment #10)
> If the bug is present, please leave a comment that includes the information
> from Help - About LibreOffice.
Still present in 6.0.6.2.
Comment 12 QA Administrators 2019-10-14 02:27:24 UTC Comment hidden (noise)
Comment 13 Erik Quaeghebeur 2019-10-14 15:23:18 UTC
(In reply to QA Administrators from comment #12)
> If the bug is present, please leave a comment that includes the information
> from Help - About LibreOffice.

Still present in 6.2.5.2.
Comment 14 Erik Quaeghebeur 2021-01-02 12:43:45 UTC
Still present in 6.4.7.2

This bug really complicates how downstreams (such as Linux distributions) need to deal with spelling/thesaurus/hyphenation files.

Also, again a request to change the title of this bug to something that better covers the issue. For example, “Accept spellcheck files without region”.

Furthermore, I guess LO should just try to respect BCP 47: <https://tools.ietf.org/html/bcp47>, <https://en.wikipedia.org/wiki/IETF_language_tag>
Comment 15 Munzir Taha 2021-01-02 13:54:43 UTC
@Erik:
It's correct that "Accept spellcheck files without region" is part of my issue, as Mr. Khaled pointed out. But I am hesitant to change the title because, after the fallback mechanism is in place, we also need to remove the duplicates. Unlike English, where you might want to keep en_US and en_GB and fall back to en, we shouldn't keep any regional files for Arabic spellchecking.
Comment 16 Erik Quaeghebeur 2021-01-02 14:37:47 UTC
(In reply to Munzir Taha from comment #15)
> @Erik:
> It's correct that "Accept spellcheck files without region" is part of my
> issue as Mr. Khaled pointed out. But I am hesitant to change the title
> because after the fallback mechanism we also need to remove the duplicates.
> Unlike English which you might want to keep en_US, en_GB and fallback to en,
> we shouldn't keep any regions files for Arabic spellchecking.
As Khaled Hosny indicated in Comment 1, the only reason there are country variants for Arabic now is because LibreOffice isn't able to deal with codes without a country designation. The same holds for Dutch (and I guess French).

However, if someone wants to create a separate regional variant spellcheck bundle, e.g., nl_BE for Dutch in Belgium (even if Dutch is standardized for all countries where Dutch is spoken), such would still need to be accepted by LibreOffice, as we shouldn't impose restrictions.

In any case, I think it is up to the developers to make the call whether to change the title or not.
Comment 17 Stéphane Guillou (stragu) 2022-12-12 16:46:28 UTC
No need to rename this bug report, the more generic bug 83561 already exists. Added in "See also".
Comment 18 Eyal Rozenberg 2022-12-15 18:30:33 UTC
So, having gone through the comments, there's something I don't understand.

Why does the locale even define the choice of spellchecking dictionaries, at all? 

First, and in principle - all dictionaries should be active/in-use, with the language/language-variant of a stretch of text deciding which dictionaries apply to which part of the text. See also bug 148257. And in this idealized world, if some text is in, say, Arabic (Iraq), then it would get both the general Arabic spell-checking dictionary used for it, and the specific Iraqi dictionary. On the other hand, once you support multi-level dictionaries, you also need to support "anti-entries", i.e. a word might be valid in Arabic generally but a local variant would forbid it. I'm guessing we don't have that?
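
A minimal sketch of the layered lookup with "anti-entries" described above, written in Python purely to make the idea concrete. Everything here is hypothetical (the function name, the set-based dictionaries, the English sample words in the usage note); it does not describe how LibreOffice or hunspell currently behave.

```python
from typing import Set


def is_valid(word: str,
             general: Set[str],
             regional: Set[str],
             regional_forbidden: Set[str]) -> bool:
    """Check a word against a general dictionary plus a regional layer.

    `regional_forbidden` holds the "anti-entries": words the regional
    variant rejects even though the general dictionary accepts them.
    """
    # An anti-entry in the regional layer overrides every other layer.
    if word in regional_forbidden:
        return False
    # Otherwise the word is valid if any active layer accepts it.
    return word in general or word in regional
```

For example, with general = {"color", "colour"}, regional = {"sidewalk"} and regional_forbidden = {"colour"}, the word "colour" is rejected for the regional variant, while "color" and "sidewalk" are accepted.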

Anyway, continuing with my ideal-world example: The user would want/need the ability to tweak the defaults. That would mean a list with checkboxes, and a drop-down list box above it for choosing which language / language-variant you want to tweak dictionary choice for. There you could decide you want to disable or enable some dictionaries (and there are also special dictionaries like "technical" which may be relevant to multiple languages).

And in all of the above, the choice of Locale and of UI language have absolutely no bearing on anything.
Comment 19 ⁨خالد حسني⁩ 2023-06-07 13:35:29 UTC
I’m repurposing this issue for fixing the need to duplicate hunspell Arabic dictionaries. For other languages, please open bug reports with specific details.

The fact that many languages require having many sub-locales is not something we can easily change. In some situations the country code is irrelevant (e.g. spell checking in the case of Arabic) and in some situations it is relevant (currency, number formatting, etc.). We have only one kind of language setting, which is used for spell checking, hyphenation, and many other situations, so we can’t simply merge all locales of a given language together, because that would fix some issues but break others. There is also a compatibility issue with formats that don’t offer country-less locales.

There is no general way to handle this (what works for a specific locale does not necessarily work for others), so it needs to be spec’d in full detail before any code is written.
Comment 20 Commit Notification 2023-06-07 15:57:16 UTC
Khaled Hosny committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/2940fb7d2aba063441e7ce70bb276bfe912ed73e

tdf#64830: Don’t require hunspell dictionary for every Arabic locale

It will be available in 7.6.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 21 Munzir Taha 2023-06-08 19:42:56 UTC
Thanks so much Dr. Khalid for the fix. Please, note that according to
https://unicode.org/cldr/charts/44/summary/ar.html
you might have missed ar-SS.

I am not sure what qualifies a country to have an Arabic locale when Arabic is not an official language there, but I just don't want it to show on the list.
Comment 22 ⁨خالد حسني⁩ 2023-06-10 14:29:38 UTC
(In reply to Munzir Taha from comment #21)
> Thanks so much Dr. Khalid for the fix. Please, note that according to
> https://unicode.org/cldr/charts/44/summary/ar.html
> you might have missed ar-SS.
> 
> Though I am not sure what qualifies a country to have an Arabic locale when
> the Arabic language is not an official one, but I just don't want it to show
> on the list.

Thanks, we don’t seem to have any support for this locale, but I’ll add it just in case. Also note that this “fix” does not hide any of the Arabic locales; it just makes it unnecessary to have ar-SA.dic, ar-EG.dic, etc. hunspell dictionaries. If ar.dic is present, it will be used for all the listed Arabic locales. It is strictly a bug fix.
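
The behaviour described here can be sketched as expanding a country-less ar.dic into every listed Arabic locale when dictionaries are registered. The Python below is a rough illustration under stated assumptions, not the code of the actual patch: register_dictionaries is a hypothetical helper, and ARABIC_LOCALES is only the subset of locales named in the bug description.

```python
from typing import Dict, List

# Illustrative subset: the Arabic locales named in the bug description.
ARABIC_LOCALES = ["ar-DZ", "ar-EG", "ar-LB", "ar-OM", "ar-SA", "ar-TN"]


def register_dictionaries(dic_files: List[str]) -> Dict[str, str]:
    """Map each locale tag to the dictionary file that should serve it."""
    registry: Dict[str, str] = {}
    for name in dic_files:
        if not name.endswith(".dic"):
            continue  # ignore anything that is not a dictionary file
        tag = name[: -len(".dic")]
        if tag == "ar":
            # A generic Arabic dictionary serves every listed Arabic
            # locale that has no dictionary of its own.
            for locale in ARABIC_LOCALES:
                registry.setdefault(locale, name)
        else:
            # A locale-specific dictionary always wins for its locale.
            registry[tag] = name
    return registry
```

With this shape, installing ar.dic alone is enough for ar-SA, ar-EG and the rest, while a locale-specific file such as ar-EG.dic, if someone ships one, still takes precedence for its own locale regardless of the order in which the files are seen.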

Hiding the Arabic locales, in some contexts at least, or being able to set the language to Arabic without a country code is a bigger issue that needs deeper changes and has compatibility considerations, so please open a new issue to track this if it is not covered by one of the issues in the See Also field.
Comment 23 Commit Notification 2023-06-10 15:20:24 UTC
Khaled Hosny committed a patch related to this issue.
It has been pushed to "libreoffice-7-5":

https://git.libreoffice.org/core/commit/376aded127cd5c9030c3b52fd2095c4241abc053

tdf#64830: Don’t require hunspell dictionary for every Arabic locale

It will be available in 7.5.5.

Comment 24 Commit Notification 2023-06-10 18:50:58 UTC
خالد حسني committed a patch related to this issue.
It has been pushed to "libreoffice-7-5":

https://git.libreoffice.org/core/commit/e3c88bfcd1b22dd7eef367c688f004cea0d5222e

Revert "tdf#64830: Don’t require hunspell dictionary for every Arabic locale"

It will be available in 7.5.5.
