Bug 140186 - Alphabetical Index: Some diacritical variations of a letter are wrongly indexed under a separate delimiter, when language for index is English
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 3.6.0.4 release (earliest affected)
Hardware: All All
Importance: medium minor
Assignee: Not Assigned
Blocks: TableofContents-Indexes
Reported: 2021-02-05 12:29 UTC by R. Green
Modified: 2023-03-02 09:49 UTC (History)

Attachments
Writer file showing indexing of diacriticals (11.99 KB, application/vnd.oasis.opendocument.text)
2021-02-20 11:30 UTC, R. Green

Description R. Green 2021-02-05 12:29:32 UTC
A serious limitation of alphabetical indexes is that there is no option to ignore diacritics. This means that words beginning with A, Ā, À, Á, etc. are indexed under DIFFERENT alphabetical delimiters! In fact, they should all be treated as starting with the letter "A" and placed under the same delimiter.

See also https://en.wikipedia.org/wiki/Diacritic#Alphabetization_or_collation (wikipedia).

(This issue was raised in Bug 131315, but, on reflection, needs to be treated as a separate issue to prevent it getting "lost in the post").
Comment 1 R. Green 2021-02-20 11:30:28 UTC
Created attachment 169913
Writer file showing indexing of diacriticals

The opening post was a bit misleading. Common diacriticals, such as acute and grave, seem to be indexed correctly. Other diacriticals may cause the word to be indexed under a new alphabetical heading.

This is shown in the example Writer file (attached).
Comment 3 R. Green 2021-12-17 11:21:13 UTC
Perhaps a tick/check box could be added to "Edit Index > Type" with the following wording: "Ignore diacritic at start of entry".

This would allow a diacritic letter at the start of an entry to be sequenced under the same alphabetical delimiter as if the start letter were non-diacritical.
Comment 4 R. Green 2022-11-14 12:50:32 UTC
Version: 7.1.5.2 / LibreOffice Community
Build ID: 85f04e9f809797b8199d13c421bd8a2b025d52b5
CPU threads: 2; OS: Linux 5.4; UI render: default; VCL: gtk3
Locale: en-GB (en_GB.UTF-8); UI: en-GB
Calc: threaded

So, to summarize the problem: in English-language indexes at least, common diacriticals, such as acute, grave, and circumflex, ARE correctly indexed under the same alphabetical delimiter. Others, such as a macron (bar above the letter), are not.

SUGGESTED BEHAVIOUR

Alphabetization needs to be corrected so that ALL diacritical variations of a letter are filed under the SAME ALPHABETICAL DELIMITER.
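
To illustrate the suggested behaviour, here is a minimal Python sketch of one way to pick the delimiter for an entry. It is not how Writer implements its index sorting; the function names `delimiter_for` and `group_entries` and the `ignore_diacritics` flag are hypothetical and only serve the example. The idea is to decompose the first letter with Unicode NFD normalization and keep only the base letter, so that A, Ā, À, Á all land under "A".

```python
import unicodedata
from collections import defaultdict


def delimiter_for(entry: str, ignore_diacritics: bool = True) -> str:
    """Return the alphabetical delimiter (heading letter) for an index entry.

    Hypothetical helper for illustration only, not LibreOffice code.
    """
    # Compose first so a precomposed letter and its decomposed form behave alike.
    text = unicodedata.normalize("NFC", entry.strip())
    if not text:
        return ""
    first = text[0]
    if ignore_diacritics:
        # NFD splits e.g. "Ā" into "A" + U+0304 COMBINING MACRON;
        # the first code point of the result is the base letter.
        first = unicodedata.normalize("NFD", first)[0]
    return first.upper()


def group_entries(entries, ignore_diacritics=True):
    """Group index entries under their delimiter, preserving input order."""
    groups = defaultdict(list)
    for entry in entries:
        groups[delimiter_for(entry, ignore_diacritics)].append(entry)
    return dict(groups)


if __name__ == "__main__":
    entries = ["Apple", "Ātman", "Àpropos", "Ábaco", "Banana"]
    print(group_entries(entries))
    # {'A': ['Apple', 'Ātman', 'Àpropos', 'Ábaco'], 'B': ['Banana']}
```

With `ignore_diacritics=False` the same entries fall under four separate headings for A, Ā, À and Á, which is the kind of split described in this report. Note that this approach only covers letters that have a canonical decomposition; letters such as "Ø" or "Ł" do not decompose and would need a collation table instead.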
Comment 5 R. Green 2022-11-14 12:56:01 UTC
And should there be an option (in the edit index dialog) to ignore diacritics altogether when it comes to alphabetizing?
Comment 6 Buovjaga 2023-03-02 09:49:14 UTC
Reproduced by inserting a new alphabetical index in the document with English as the index language (as opposed to, say, Hebrew). Already seen at the last36onmaster commit tag in the linux-43all repo. The oldest commit crashes when opening the index dialog, so I can't test it.

Arch Linux 64-bit, X11
Version: 7.6.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: e32dfaf15563372ffae6e0da53998e20068ebf81
CPU threads: 8; OS: Linux 6.2; UI render: default; VCL: kf5 (cairo+xcb)
Locale: fi-FI (fi_FI.UTF-8); UI: en-US
Calc: threaded
Built on 1 March 2023