Bug 116666 - Fix Hungarian collation
Summary: Fix Hungarian collation
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Localization
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: medium normal
Assignee: László Németh
URL:
Whiteboard: target:6.1.0
Keywords:
Depends on:
Blocks:
 
Reported: 2018-03-27 19:19 UTC by László Németh
Modified: 2018-04-03 07:14 UTC
0 users

See Also:
Crash report or crash signature:


Attachments
Test file for the suggested solution (19.39 KB, application/vnd.oasis.opendocument.text)
2018-03-27 19:23 UTC, László Németh

Description László Németh 2018-03-27 19:19:06 UTC
Hungarian orthography rules contain the following extra requirements for sorting words and sentences:

– expand simplified double consonants;

– ignore spaces and hyphens;

– prefer lower case homonyms.

(Source: http://helyesírás.mta.hu/helyesiras/default/akh12#F2_4)

Expansion of double consonants (e.g. sorting “ccs”, the long “cs”, as “cscs”) is still not perfect, but in my analysis it cuts the number of bad sorting positions to roughly one fifth of what ordering without expansion produces (3843 vs. 19425 in 4 million word forms).

A more important advantage is that, with full expansion, Hungarian sorting can be automated with manual (or, in the future, Hunspell-based) preprocessing. (Unfortunately, the ICU collation algorithm alone is not yet enough for Hungarian.) Inserting soft hyphens is a quick workaround here, too, as it is for the similar problem with single consonants: e.g. “igazság” -> igaz­ság (igaz[U+AD]ság) is then sorted correctly before “igaztalan”.
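To illustrate the first two rules and the soft-hyphen workaround, here is a minimal Python sketch of a primary-level sort key. It is not the ICU rule set used in the patch: the names `LETTERS`, `TOKEN` and `primary_key` are hypothetical, accented vowels are simply folded onto their base letters, and the greedy digraph matching is an assumption that reproduces the “igazság” problem described above.

```python
import re

# Hungarian letters in primary order (assumption: accented vowels are folded
# onto their base letter, so a/á, e/é, ... share a rank in this sketch).
LETTERS = ["a", "b", "c", "cs", "d", "dz", "dzs", "e", "f", "g", "gy", "h",
           "i", "j", "k", "l", "ly", "m", "n", "ny", "o", "ö", "p", "q", "r",
           "s", "sz", "t", "ty", "u", "ü", "v", "w", "x", "y", "z", "zs"]
RANK = {letter: i for i, letter in enumerate(LETTERS)}
FOLD = str.maketrans("áéíóőúű", "aeioöuü")
IGNORED = {" ", "-", "\u00ad"}  # space, hyphen, soft hyphen

# Simplified doubles first, then multi-character letters, then any character.
TOKEN = re.compile(
    r"ddzs|ccs|ddz|ggy|lly|nny|ssz|tty|zzs|dzs|cs|dz|gy|ly|ny|sz|ty|zs|.",
    re.DOTALL,
)


def primary_key(word):
    """Return a tuple of letter ranks usable as a sort key (hypothetical helper)."""
    key = []
    for tok in TOKEN.findall(word.lower().translate(FOLD)):
        if tok in IGNORED:
            continue  # ignored for ordering, but it still split the letters
        if len(tok) >= 3 and tok[0] == tok[1]:
            # Simplified double consonant: "ccs" is keyed as "cs" + "cs".
            letter = tok[1:]
            key += [RANK[letter], RANK[letter]]
        else:
            # Characters outside the alphabet sort after all letters.
            key.append(RANK.get(tok, len(LETTERS)))
    return tuple(key)


# Greedy matching reads "zs" in "igazság" as one letter, so it lands after
# "igaztalan"; a soft hyphen between "z" and "s" restores the right order.
naive = sorted(["igazság", "igaztalan"], key=primary_key)
hinted = sorted(["igaztalan", "igaz\u00adság"], key=primary_key)
```

In this sketch `primary_key("meccs")` equals `primary_key("mecscs")`, and `primary_key("e-mail")` equals `primary_key("email")`. Words whose morpheme boundary falls inside an apparent digraph still need manual soft hyphens, as noted above.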
Comment 1 László Németh 2018-03-27 19:23:06 UTC
Created attachment 140923
Test file for the suggested solution

Red text marks wrong sorting order; green text marks fixed sorting order.
Comment 2 László Németh 2018-03-27 21:12:04 UTC
Test:

Select cells in the test text document, and choose Tools->Sort...

Note on case ordering: previously LibreOffice didn't sort the same words of different casing at all.
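The case rule ("prefer lower case homonyms") can be pictured as a tie-breaker that only matters when two words are otherwise equal. The sketch below is an assumption for illustration, not the committed collation rules: `case_tiebreak_key` is a hypothetical name, and plain `casefold()` stands in for real Hungarian primary ordering.

```python
def case_tiebreak_key(word):
    """Sort key: case-insensitive text first, then lower case before upper case."""
    primary = word.casefold()  # stand-in for a real Hungarian primary key
    # False sorts before True, so a lower-case letter wins each tie.
    case_pattern = tuple(ch.isupper() for ch in word)
    return (primary, case_pattern)


words = ["Kovács", "kovács", "Szabó", "szabó"]
ordered = sorted(words, key=case_tiebreak_key)
# ordered == ["kovács", "Kovács", "szabó", "Szabó"]
```

Without the second key component, words that differ only in casing compare as equal and keep their input order.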
Comment 3 Commit Notification 2018-03-29 14:46:03 UTC
László Németh committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=7b1eb6313c0d2621c364df1724c69d28f8267841

tdf#116666 fix Hungarian sorting

It will be available in 6.1.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 4 Commit Notification 2018-03-31 19:13:39 UTC
László Németh committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=34ae19b1e9ede8bdcf56e393f68a7f875e32a068

tdf#116666 Hungarian collation: casing and equality fixes

It will be available in 6.1.0.
