Bug 107769 - spell checking should normalize data first
Summary: spell checking should normalize data first
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Linguistic
Version (earliest affected): 5.4.0.0.alpha1+
Hardware: All
OS: All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Spell-Checking
Reported: 2017-05-11 11:05 UTC by martin_hosken
Modified: 2024-08-26 15:57 UTC
CC List: 1 user

See Also:
Crash report or crash signature:


Description martin_hosken 2017-05-11 11:05:45 UTC
Words to be spell checked should be converted to NFKC first so that spell checking dictionaries don't need to hold all forms (NFD, NFC, mixed) of a word.
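
As a side note (not part of the proposed change), here is a minimal standalone ICU sketch of why this helps: the precomposed and decomposed spellings of the same word differ as raw strings but become identical after NFKC normalization, so a dictionary only needs to carry one form. The word "café" is just a placeholder example.

#include <unicode/normlzr.h>
#include <unicode/unistr.h>
#include <cassert>

int main()
{
    // The same word spelled two ways: precomposed U+00E9 (NFC)
    // versus "e" followed by the combining acute U+0301 (NFD).
    icu::UnicodeString composed(u"caf\u00e9");
    icu::UnicodeString decomposed(u"cafe\u0301");

    UErrorCode status = U_ZERO_ERROR;
    icu::UnicodeString nfkcA, nfkcB;
    icu::Normalizer::normalize(composed, UNORM_NFKC, 0, nfkcA, status);
    icu::Normalizer::normalize(decomposed, UNORM_NFKC, 0, nfkcB, status);

    // The raw strings differ, but their NFKC forms are identical.
    assert(composed != decomposed);
    assert(U_SUCCESS(status) && nfkcA == nfkcB);
    return 0;
}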

I'm going to sketch my thoughts on how to do it here in case I can't get back to the bug for a while. Anyone want to take it further?

In SpellChecker::GetSpellFailure in lingucomponent/source/spell/sspellimpl.cxx, rather than the current poor man's hand-rolled normalization into nWord, start with an nWord created something like this:

// Normalize the word to NFKC; fall back to an empty string on failure.
icu::UnicodeString rIn(reinterpret_cast<const UChar *>(rWord.getStr()), rWord.getLength());
icu::UnicodeString normal;
UErrorCode rCode = U_ZERO_ERROR;
icu::Normalizer::normalize(rIn, UNORM_NFKC, 0, normal, rCode);
OUString nWord(U_SUCCESS(rCode)
    ? OUString(reinterpret_cast<const sal_Unicode *>(normal.getBuffer()), normal.length())
    : OUString());

then use nWord instead of rWord for the rest of the function.

Need to find a test for this.
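
One possible shape for such a test, as a rough sketch only: assume a spell checker reference obtained from the usual UNO service and a dictionary that stores the test word only in its precomposed form. The word, locale and helper name below are placeholders, not code that exists in the tree.

#include <com/sun/star/beans/PropertyValue.hpp>
#include <com/sun/star/lang/Locale.hpp>
#include <com/sun/star/linguistic2/XSpellChecker.hpp>
#include <cppunit/TestAssert.h>
#include <rtl/ustring.hxx>

// Hypothetical helper: xSpellChecker is assumed to be obtained elsewhere
// (e.g. from the com.sun.star.linguistic2.SpellChecker service).
void checkNormalizedSpelling(
    const css::uno::Reference<css::linguistic2::XSpellChecker>& xSpellChecker)
{
    css::lang::Locale aLocale("en", "US", "");
    css::uno::Sequence<css::beans::PropertyValue> aProps;

    // Precomposed (NFC) and decomposed (NFD) spellings of the same word.
    OUString aComposed(u"caf\u00e9");
    OUString aDecomposed(u"cafe\u0301");

    // Without normalization only the form stored in the dictionary passes;
    // with NFKC normalization in GetSpellFailure both should be accepted.
    CPPUNIT_ASSERT(xSpellChecker->isValid(aComposed, aLocale, aProps));
    CPPUNIT_ASSERT(xSpellChecker->isValid(aDecomposed, aLocale, aProps));
}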
Comment 1 Buovjaga 2017-05-12 17:49:58 UTC
Ok -> NEW