Bug 91285 - Words added to dictionary with nonbreaking space character in between are not recognized by spell-checker
Status: RESOLVED NOTABUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Linguistic
Version: Inherited From OOo (earliest affected)
Hardware: Other All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: needsDevEval
Depends on:
Blocks: RTL-CTL Spell-Checking
Reported: 2015-05-14 12:41 UTC by irancplusplus
Modified: 2016-12-10 18:50 UTC

See Also:
Crash report or crash signature:


Description irancplusplus 2015-05-14 12:41:01 UTC
In languages written in Arabic script, e.g. Persian, there is no space character that is regarded as part of a word, although one is badly needed so that such a word can be added as a single word to the spell-checking dictionary.

For example, there is no way to add the word "سازمان دهنده" as a single word to the dictionary. Instead, I either have to add two words ("سازمان" and "دهنده"), which is inappropriate, or use a no-width breaking character ("سازمان‌دهنده"), which forces the user to write some words in an ugly form and, for some words, is not acceptable.

The nonbreaking space character is not regarded as part of a word so "سازمان دهنده" (now written with nonbreaking space character) is not regarded as a single word and cannot be added as a single word to the spell-checking  dictionary.

I think there should be an option that lets us choose which characters are regarded as part of a single word.
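
The space-like characters involved look alike on screen but are distinct code points. As a minimal sketch (Python, standard library only; the helper name `describe` is made up for illustration), the following lists the code point, Unicode name, and general category of the ordinary space, the no-break space, the zero-width non-joiner (assumed here to be the "no-width" character meant above), and the narrow no-break space that is raised later in this thread:

```python
import unicodedata

def describe(ch):
    """Return (code point, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

# Space, no-break space, zero-width non-joiner, narrow no-break space.
candidates = ["\u0020", "\u00A0", "\u200C", "\u202F"]

for ch in candidates:
    print(*describe(ch))
```

Running `describe` on a character copied from a document is a quick way to tell which of these was actually typed.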
Comment 1 Jean-Baptiste Faure 2015-05-14 14:41:20 UTC
Please, do not set your own bug reports to NEW. Each one must be confirmed independently.
Set status back to UNCONFIRMED.

Best regards. JBF
Comment 2 Yousuf Philips (jay) (retired) 2015-05-20 13:29:03 UTC
To my knowledge, there are no single Arabic words that are two words separated by a space.

@Timar: Is it possible to have multiple words in a dictionary entry?
Comment 3 tommy27 2016-12-09 07:02:29 UTC
I have no idea how Arabic grammar works, but I can confirm that the spellchecker does not recognize as correct any dictionary entry that contains a space.

A simple test any user can do is to write:
"asdf ghjk" (both words will show the red squiggly underline)

then click:
"tools/options/language settings/writing aids/user-defined dictionaries/edit"

and add "asdf ghjk" as a single entry.

"asdf ghjk" will still be treated as an error by the spell checker (the red squiggly lines persist in the document).

If you instead add "asdf" and "ghjk" to the dictionary separately, the spell checker will recognize both words as legitimate entries.

I tested this with LibO 5.2.3.3 and OOo 3.3.0 under Win8.1 x64 and I can tell that this has always been like that.

Dictionary entries with a space in between are still treated as errors by the spellchecker.
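
This is the result one would expect from a checker that looks words up one at a time. The following is a minimal sketch of that idea, not LibreOffice's actual code: the function name `misspelled` and the whitespace tokenizer are assumptions made for illustration.

```python
def misspelled(text, dictionary):
    """Return the tokens of `text` that are not dictionary entries.

    Tokens are produced by splitting on whitespace, so a dictionary
    entry that itself contains a space can never equal any token.
    """
    return [token for token in text.split() if token not in dictionary]

# One entry containing a space: both words are still flagged.
print(misspelled("asdf ghjk", {"asdf ghjk"}))      # ['asdf', 'ghjk']

# Two separate entries: nothing is flagged.
print(misspelled("asdf ghjk", {"asdf", "ghjk"}))   # []
```

In this model the entry "asdf ghjk" is stored without complaint but is simply never consulted with a matching key.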

Let's see what Andras Timar thinks about it and let's hear from Khaled Hosny some insight about Arabic grammar (irancplusplus and Yousuf Philips said different things about this issue).

So: enhancement request, inherited from OOo, status NEW, needsDevEval.
I also edited the summary to make it clearer what this issue is about.
Comment 4 Khaled Hosny (inactive) 2016-12-10 02:54:46 UTC
(In reply to tommy27 from comment #3)
> Let's see what Andras Timar thinks about it and let's hear from Khaled Hosny
> some insight about Arabic grammar (irancplusplus and Yousuf Philips said
> different things about this issue).

Please keep in mind that irancplusplus is talking about the Persian language, not Arabic, so Arabic rules are irrelevant here.

That being said, NBSP is a word separator, so LibreOffice is right in considering this string two words. For reference, Unicode word breaking rules are documented here: http://www.unicode.org/reports/tr29/#Word_Boundaries. The only space that might work here is U+202F NARROW NO-BREAK SPACE.
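
To illustrate the difference, here is a deliberately simplified word-splitting sketch in Python. It is not an implementation of the Unicode rules and not what LibreOffice runs. It only assumes that letters, digits, combining marks, U+200C ZERO WIDTH NON-JOINER, and U+202F NARROW NO-BREAK SPACE stay inside a word, while every other character, including U+00A0 NO-BREAK SPACE, ends it. The names `WORD_INTERNAL` and `split_words` are made up for this example.

```python
import unicodedata

# Assumption for this sketch: these two characters never end a word.
WORD_INTERNAL = {"\u200C", "\u202F"}

def split_words(text):
    """Split text into words, keeping WORD_INTERNAL characters inside them."""
    words, current = [], []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if ch.isalnum() or is_mark or ch in WORD_INTERNAL:
            current.append(ch)
        elif current:
            # Any other character (space, NBSP, punctuation) ends the word.
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words

first, second = "سازمان", "دهنده"

print(len(split_words(first + "\u00A0" + second)))  # 2: NBSP separates the words
print(len(split_words(first + "\u202F" + second)))  # 1: narrow NBSP stays inside
print(len(split_words(first + "\u200C" + second)))  # 1: ZWNJ stays inside
```

Under these assumptions only the NBSP variant is split into two tokens, which is why a word-by-word dictionary lookup cannot match it as a single entry.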
Comment 5 tommy27 2016-12-10 07:41:33 UTC
thanks Khaled.
so do you think this has to be labelled as WONTFIX or NOTABUG?
Comment 6 Khaled Hosny (inactive) 2016-12-10 14:05:46 UTC
(In reply to tommy27 from comment #5)
> thanks Khaled.
> so do you think this has to be labelled as WONTFIX or NOTABUG?

I’d say NOTABUG. Spell checking (at least in LibreOffice) works on individual words; the OP might be looking for grammar checking.
Comment 7 tommy27 2016-12-10 17:52:54 UTC
ok. set status accordingly.