90578 – Word boundaries for spell checking broken for Finnish

Status:	RESOLVED NOTOURBUG

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	Writer
Version (earliest affected):	unspecified
Hardware:	Other (OS: All)

Importance:	medium (priority), normal (severity)
Assignee:	Not Assigned

Reported:	2015-04-12 09:05 UTC by Harri Pitkänen
Modified:	2015-04-13 19:45 UTC
CC List:	2 users

Description Harri Pitkänen 2015-04-12 09:05:48 UTC

In commit http://cgit.freedesktop.org/libreoffice/core/commit/?id=6e225b41f1ab3e6cac395b0c0c6db73414658625 and bug https://bugs.documentfoundation.org/show_bug.cgi?id=55707, changes were made to Finnish word counting by removing previous customisations to the breakiterator rules. Unfortunately, while these changes improved word counting, they had a negative effect on spell checking: we should use the same word boundaries for spell checking and word counting, but this is no longer the case.

For example, the test string "Kuorma-auto kaakkois- ja Keski-Suomi USA:n 90:n %:n" should count as 7 words for word counting (as it correctly does), and these same 7 words should also be used for spell checking. So the words for spell checking should be:

Kuorma-auto (works correctly)
kaakkois- (now incorrectly just kaakkois)
ja (works correctly)
Keski-Suomi (works correctly)
USA:n (now incorrectly split at colon)
90:n (now incorrectly split at colon)
%:n (now incorrectly split at colon)
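To make the expected behaviour concrete, here is a minimal Python sketch in which word counting and spell checking share one boundary definition and therefore agree on the seven words listed above. It is an illustration only, not ICU's breakiterator rules or LibreOffice code: the regular expression FINNISH_WORD and the functions finnish_words, count_words and spellcheck_candidates are hypothetical names made up for this example.

```python
import re

# Hypothetical token pattern (NOT the ICU/LibreOffice breakiterator rules):
#   [\w%]+          a run of letters/digits, or a symbol such as "%"
#   (?:-[\w%]+)*    hyphen-joined parts, e.g. "Kuorma-auto", "Keski-Suomi"
#   (?::\w+)?       a colon-attached case ending, e.g. "USA:n", "90:n", "%:n"
#   -?              a trailing hyphen on a truncated compound, e.g. "kaakkois-"
# In Python 3, \w on str is Unicode-aware, so letters like "ä" and "ö" match.
FINNISH_WORD = re.compile(r"[\w%]+(?:-[\w%]+)*(?::\w+)?-?")


def finnish_words(text: str) -> list[str]:
    """Split text into words using a single shared boundary definition."""
    return FINNISH_WORD.findall(text)


def count_words(text: str) -> int:
    # Word counting uses the shared boundaries...
    return len(finnish_words(text))


def spellcheck_candidates(text: str) -> list[str]:
    # ...and so does the list of words handed to the spell checker.
    return finnish_words(text)


if __name__ == "__main__":
    sample = "Kuorma-auto kaakkois- ja Keski-Suomi USA:n 90:n %:n"
    print(count_words(sample))  # 7
    print(spellcheck_candidates(sample))
```

Routing both callers through one boundary function is the point of the report: if the boundaries change, word counting and spell checking change together. The pattern is deliberately narrow and would need more cases before it could handle real Finnish text.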

I have filed a change request to CLDR to have the hyphen error fixed there: http://unicode.org/cldr/trac/ticket/8368

Comment 1 Julien Nabet 2015-04-12 19:00:21 UTC

Caolan: thought you might be interested in this one.

Comment 2 Caolán McNamara 2015-04-13 15:03:13 UTC

I'd consider this more of a "closed->upstream" sort of thing, as we can't sustainably support customized ICU rules; it's too hard to keep them right as ICU/Unicode evolves over time.

Comment 3 Julien Nabet 2015-04-13 19:45:13 UTC

If it's upstream, let's set this one to NOTOURBUG. Perhaps it'll be fixed in ICU 55.1 (see http://site.icu-project.org/), since we use ICU 54.1.

Anyway, thank you Caolan for your feedback.