Bug 164050 - Spell Check: add option to ignore/filter combining characters (U+0300 and above)
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Linguistic
Version: 24.8.3.2 release (earliest affected)
Hardware: x86-64 (AMD64) Windows (All)
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Authors
Reported: 2024-11-25 23:38 UTC by Marko
Modified: 2025-11-25 15:51 UTC

See Also:
Crash report or crash signature:


Description Marko 2024-11-25 23:38:36 UTC
The main idea is that writing phonetic text with "combining characters" to put phonetic accents on letters breaks spell checking, because the extra characters are inserted into correctly spelled words and the words are then flagged as misspelled.

The suggestion is to add an option to filter/ignore combining characters in the spell checking preferences.

This is a problem especially for Cyrillic script, because Unicode does not have (for example) precomposed vowels with an acute accent as it does for Latin script. That is why one must use combining characters (for example U+0300 Combining Grave Accent, U+0301 Combining Acute Accent, ...) to accentuate letters, mainly for pronunciation purposes.

Example (Russian):
accentuated    non-accentuated
пожа́луйста     пожалуйста
спаси́бо        спасибо

Example (Serbian):
accentuated    non-accentuated
м̋олим ва́с     молим вас
хв̂ала ва́м     хвала вам

(To reproduce, copy this text into Writer and set the appropriate language with automatic spell checking turned on. Note: the bundled spell check extension for that language must be installed.)

Another example (in Serbian) using accents to distinguish homonyms:
Ја са́м с̂ам. - I am alone.
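
For illustration, the requested filtering could look roughly like the following Python sketch. It is not LibreOffice or Hunspell code; the function name strip_combining is hypothetical. The sketch assumes the text is first normalized to NFC so that letters with a precomposed form (such as й) are kept intact, and only the remaining nonspacing marks are dropped before the word is passed to the spell checker.

```python
import unicodedata


def strip_combining(text: str) -> str:
    """Return text without leftover combining marks (hypothetical helper).

    NFC normalization first merges base letter + mark pairs that have a
    precomposed code point (for example Cyrillic i + U+0306 becomes the
    single letter U+0439), so those letters are not damaged. Marks that
    remain afterwards, such as U+0301 on a Cyrillic vowel, are removed.
    """
    composed = unicodedata.normalize("NFC", text)
    # "Mn" is the Unicode general category for nonspacing marks.
    return "".join(ch for ch in composed if unicodedata.category(ch) != "Mn")


if __name__ == "__main__":
    # Accentuated words from the examples above, with explicit escapes.
    for word in ("пожа\u0301луйста", "спаси\u0301бо", "хв\u0302ала"):
        print(strip_combining(word))
```

Note that for Latin script the NFC step turns a + U+0301 into the precomposed letter á, which is then kept; whether that is desirable depends on the dictionary in use.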
Comment 1 László Németh 2025-11-10 16:14:18 UTC
Adding the following line to the .aff file of the Hunspell dictionary makes it possible to ignore diacritics during spell checking:

IGNORE ̀́̃

(i.e. IGNORE U+0300 U+0301 U+0302)
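
Because combining characters are hard to see and type on their own, the line can also be generated from code points. The following Python sketch is only an illustration; the function name ignore_directive is hypothetical, and it assumes the .aff file is UTF-8 encoded (declared with SET UTF-8).

```python
def ignore_directive(code_points):
    """Build a Hunspell IGNORE line from an iterable of code points.

    Hypothetical helper: it only formats the text of the line, and it
    does not read or modify any dictionary file.
    """
    return "IGNORE " + "".join(chr(cp) for cp in code_points)


if __name__ == "__main__":
    # The three code points named in the comment above.
    line = ignore_directive([0x0300, 0x0301, 0x0302])
    # Show the code points instead of the (nearly invisible) characters.
    print(" ".join(f"U+{ord(ch):04X}" for ch in line[len("IGNORE "):]))
```

Passing range(0x0300, 0x0370) instead would cover the whole Combining Diacritical Marks block (U+0300 to U+036F); whether ignoring that many characters suits a given dictionary would need to be checked.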