Bug 94744 - Different alphabetical index for the same Greek letter depending on accentuation
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 4.4.5.2 release
Hardware: x86-64 (AMD64) All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: TableofContents-Indexes
 
Reported: 2015-10-03 23:48 UTC by thanasis57
Modified: 2019-01-26 13:38 UTC

See Also:
Crash report or crash signature:


Attachments
Example of a Greek alphabetical index (18.06 KB, application/vnd.oasis.opendocument.text)
2015-10-04 01:16 UTC, thanasis57
Description thanasis57 2015-10-03 23:48:56 UTC
Alphabetical indexes for Greek entries are built the same way as for English, where vowels carry no accents.

In modern Greek, however, a word can start with a non-accentuated vowel such as "ε" (as in "επιτελής") or with an accentuated one such as "έ" (as in "έντονος"). Greek alphabetisation rules treat those two epsilons as identical, but Writer files them under two separate index headings, one accentuated and one not. The same applies to all the other Greek vowels (α, ε, η, ι, ο, υ, ω).

My suggestion is that Writer should group the accentuated and non-accentuated forms of each letter and treat them as identical when it comes to alphabetisation.
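To illustrate the grouping I have in mind, here is a minimal Python sketch. It is not how Writer builds its index; the function name index_letter is made up for this example, and it assumes that Unicode canonical decomposition (NFD) is an acceptable way to find the base letter.

```python
import unicodedata


def index_letter(word: str) -> str:
    """Return the unaccented capital letter a word would be filed under.

    Hypothetical helper for illustration only, not Writer code.
    """
    if not word:
        return ""
    # NFD splits a precomposed letter into its base letter followed by
    # combining marks (accents, breathings).
    decomposed = unicodedata.normalize("NFD", word[0])
    # Keep only characters that are not combining marks.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.upper()


# Group sample entries under a single heading per letter.
groups = {}
for entry in ["επιτελής", "έντονος", "ἔργον"]:
    groups.setdefault(index_letter(entry), []).append(entry)

print(groups)
```

With this key, all three sample words end up under the single heading Ε instead of three separate ones.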

In the next post I will give the character tables that should be equivalent.
Comment 1 thanasis57 2015-10-04 00:20:33 UTC
And here are the characters that should be alphabetised as equivalent, along with their Unicode code points. Some comments follow in the post right after this one.

Α
Α U+0391
Ά U+0386 (equivalent to U+1FBB)
Ἀ U+1F08
Ἁ U+1F09
Ἄ U+1F0C
Ἅ U+1F0D
Ἆ U+1F0E
Ἇ U+1F0F

Ε
Ε U+0395
Έ U+0388 (equivalent to U+1FC9)
Ἐ U+1F18
Ἑ U+1F19
Ἔ U+1F1C
Ἕ U+1F1D

Η
Η U+0397
Ή U+0389 (equivalent to U+1FCB)
Ἠ U+1F28
Ἡ U+1F29
Ἤ U+1F2C
Ἥ U+1F2D
Ἦ U+1F2E
Ἧ U+1F2F

Ι
Ι U+0399
Ί U+038A (equivalent to U+1FDB)
Ἰ U+1F38
Ἱ U+1F39
Ἴ U+1F3C
Ἵ U+1F3D
Ἶ U+1F3E
Ἷ U+1F3F

Ο
Ο U+039F
Ό U+038C (equivalent to U+1FF9)
Ὀ U+1F48
Ὁ U+1F49
Ὄ U+1F4C
Ὅ U+1F4D

Υ
Υ U+03A5
Ύ U+038E (equivalent to U+1FEB)
Ὑ U+1F59
Ὕ U+1F5D
Ὗ U+1F5F

Ω
Ω U+03A9
Ώ U+038F (equivalent to U+1FFB)
Ὠ U+1F68
Ὡ U+1F69
Ὤ U+1F6C
Ὥ U+1F6D
Ὦ U+1F6E
Ὧ U+1F6F
Comment 2 thanasis57 2015-10-04 00:37:27 UTC
Now, you will observe that these characters fall into two groups far apart from each other, the U+03xx and the U+1Fxx, with some equivalences in the cases with the oxeia accent.

<rant>
That is because, at the time the Unicode character tables were being created, some bright minds at ELOT (the Greek member of ISO) decided that the polytonic characters were not Greek, but ancient Greek! This was right after the linguistic reform, in which the accents vareia and perispomeni and the breathings psili and dasia were dropped, and we passed from the polytonic to the monotonic system.

And since they represented the Greek state, their recommendation must have carried weight with the Unicode consortium. If only they knew...

Later, however, some linguists and philologists (probably non-Greek) must have said: "Hey, wouldn't it be great to be able to use a computer for our work, which involves things like... ancient Greek?". And so we got polytonic support in Unicode.
</rant>

You will notice the following:
-Since the oxeia accent was the only one retained after the 1980s, there are equivalences between the monotonic and polytonic character sets, in particular where the oxeia is concerned.

-I omit the vareia variants, because the vareia only goes on the last syllable (if it is accentuated), so it is irrelevant for the current purposes (although I suppose the programming cost of including them would be small).

-I only mention the capital letters for brevity's sake.
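
The oxeia equivalences mentioned in the first point can be confirmed with Unicode normalization. This is a small Python sketch, assuming NFC is the relevant normalization form; the name OXEIA_PAIRS is made up for the example, and the pairs are the ones from comment 1.

```python
import unicodedata

# (monotonic code point, polytonic "with oxia" code point) from comment 1.
OXEIA_PAIRS = [
    (0x0386, 0x1FBB),  # Alpha
    (0x0388, 0x1FC9),  # Epsilon
    (0x0389, 0x1FCB),  # Eta
    (0x038A, 0x1FDB),  # Iota
    (0x038C, 0x1FF9),  # Omicron
    (0x038E, 0x1FEB),  # Upsilon
    (0x038F, 0x1FFB),  # Omega
]

# NFC maps each polytonic form to its canonically equivalent monotonic form.
all_equivalent = all(
    unicodedata.normalize("NFC", chr(polytonic)) == chr(monotonic)
    for monotonic, polytonic in OXEIA_PAIRS
)
print(all_equivalent)
```

So normalizing entries before sorting would already merge each of these pairs; the psili, dasia and perispomeni variants would still need the grouping suggested in the description.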
Comment 3 thanasis57 2015-10-04 01:00:43 UTC
And I forgot rho (Ρ, ρ), which, despite being a consonant, can take a dasia (rough breathing):

Ρ
Ρ U+03A1
Ῥ U+1FEC
Comment 4 thanasis57 2015-10-04 01:16:36 UTC
Created attachment 119253
Example of a Greek alphabetical index

A separate entry is created in the index for every variant of a Greek letter.
Comment 5 thanasis57 2015-10-05 01:33:26 UTC
See also some additional info here: http://www.polytoniko.org/erga.php?newlang=en
Comment 6 Buovjaga 2015-10-08 08:29:21 UTC
So this is basically an enhancement. With the confidence of my complete ignorance, I'll set to NEW.