Bug 90578 - Word boundaries for spell checking broken for Finnish
Status: RESOLVED NOTOURBUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: unspecified (earliest affected)
Hardware: Other All
Importance: medium normal
Assignee: Not Assigned
Reported: 2015-04-12 09:05 UTC by Harri Pitkänen
Modified: 2015-04-13 19:45 UTC (History)

Description Harri Pitkänen 2015-04-12 09:05:48 UTC
Commit http://cgit.freedesktop.org/libreoffice/core/commit/?id=6e225b41f1ab3e6cac395b0c0c6db73414658625 (bug https://bugs.documentfoundation.org/show_bug.cgi?id=55707) changed Finnish word counting by removing the previous customisations to the breakiterator rules. Unfortunately, while these changes improved word counting, they had a negative effect on spell checking: spell checking and word counting should use the same word boundaries, but this is no longer the case.

For example, the test case string "Kuorma-auto kaakkois- ja Keski-Suomi USA:n 90:n %:n" should count as 7 words for word counting (as it correctly does), but the same 7 words should also be used for spell checking. So the words for spell checking should be:

Kuorma-auto (works correctly)
kaakkois- (now incorrectly just kaakkois)
ja (works correctly)
Keski-Suomi (works correctly)
USA:n (now incorrectly split at colon)
90:n (now incorrectly split at colon)
%:n (now incorrectly split at colon)
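
To make the expectation concrete, here is a rough Python sketch of the two segmentations. It is only an illustration, not LibreOffice or ICU code: both function names are hypothetical, the expected segmentation is approximated by plain whitespace splitting (which happens to be enough for this test string), and the second function merely imitates the symptoms listed above (trailing hyphen dropped, split at colon).

```python
SAMPLE = "Kuorma-auto kaakkois- ja Keski-Suomi USA:n 90:n %:n"


def expected_words(text):
    """Hypothetical helper: hyphens and colons stay inside the word.

    Whitespace splitting is an assumption that only holds for this sample.
    """
    return text.split()


def words_split_at_colon(text):
    """Hypothetical imitation of the reported symptoms, not the real rules."""
    words = []
    for token in text.split():
        token = token.rstrip("-")  # trailing hyphen is lost
        words.extend(part for part in token.split(":") if part)  # colon splits
    return words


if __name__ == "__main__":
    print(expected_words(SAMPLE))
    print(words_split_at_colon(SAMPLE))
```

With the first function the string yields the same 7 words that word counting reports; with the second, "kaakkois-" loses its hyphen and the three colon forms are each broken in two, so the spell checker would see different units than the word counter.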

I have filed a change request to CLDR to have the hyphen error fixed there: http://unicode.org/cldr/trac/ticket/8368
Comment 1 Julien Nabet 2015-04-12 19:00:21 UTC
Caolan: thought you might be interested in this one.
Comment 2 Caolán McNamara 2015-04-13 15:03:13 UTC
I'd consider this more a "closed->upstream" sort of thing, as we can't sustainably support customized ICU rules; it's too hard to keep them right as ICU/Unicode evolves over time.
Comment 3 Julien Nabet 2015-04-13 19:45:13 UTC
If it's an upstream issue, let's put this one to notourbug. Perhaps it'll be fixed in ICU 55.1, see http://site.icu-project.org/ (since we use ICU 54.1).

Anyway, thank you Caolan for your feedback.