164050 – Spell Check: add option to ignore/filter combining characters (U+0300 and above)

Status:	NEW

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	Linguistic
Version: (earliest affected)	24.8.3.2 release
Hardware:	x86-64 (AMD64) Windows (All)

Importance:	medium enhancement
Assignee:	Not Assigned

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	Authors

Reported:	2024-11-25 23:38 UTC by Marko
Modified:	2025-11-25 15:51 UTC
CC List:	2 users

See Also:
Crash report or crash signature:

Description Marko 2024-11-25 23:38:36 UTC

The main idea is that writing phonetic text with "combining characters" to put phonetic accents on letters breaks spell checking, because the extra characters are inserted into otherwise correctly spelled words.

The suggestion is to add an option to filter/ignore combining characters in the spell checking preferences.

This is a problem especially for Cyrillic script, because Unicode does not have (for example) precomposed vowels with acute for Cyrillic like it does for Latin script. That is why one must use combining characters (for example U+0300 Combining Grave Accent, U+0301 Combining Acute Accent, ...) to accentuate letters, which is done mainly for pronunciation purposes.

Example (Russian):
accentuated    non-accentuated
пожа́луйста     пожалуйста
спаси́бо        спасибо

Example (Serbian):
accentuated    non-accentuated
м̋олим ва́с     молим вас
хв̂ала ва́м     хвала вам

(To reproduce: copy this text into Writer, set the appropriate language, and turn on automatic spell checking. Note: the bundled spell check extension for that language must be installed.)

Another example (in Serbian) using accents to distinguish homonyms:
Ја са́м с̂ам. - I am alone.
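
To illustrate the requested behaviour, here is a minimal Python sketch of filtering combining characters from a word before it is handed to a spell checker. It is only an illustration under stated assumptions, not LibreOffice or Hunspell code; the function name strip_combining_marks is hypothetical. The text is first normalized to NFC so that letters which do have a precomposed form (such as Cyrillic й) are kept intact, and only the marks that remain separate are dropped.

```python
import unicodedata


def strip_combining_marks(text: str) -> str:
    """Return text without combining marks (hypothetical helper).

    Assumption: a mark that survives NFC normalization is an optional
    accent that the spell checker should not see.
    """
    # Compose first, so that e.g. "и" + U+0306 becomes the single letter "й"
    # and is not damaged by the filter below.
    composed = unicodedata.normalize("NFC", text)
    # unicodedata.combining() returns the canonical combining class;
    # it is non-zero for accents such as U+0300, U+0301 and U+0302.
    return "".join(ch for ch in composed if unicodedata.combining(ch) == 0)


if __name__ == "__main__":
    # Accentuated Russian word from the report (U+0301 after the second "а").
    print(strip_combining_marks("пожа\u0301луйста"))
```

With such a filter, an accentuated word would be looked up in the dictionary as its non-accentuated form.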

Comment 1 László Németh 2025-11-10 16:14:18 UTC

Adding the following line to the .aff file of the Hunspell dictionary makes it possible to ignore the listed combining diacritics during spell checking:


IGNORE ̀́̃

(i.e. IGNORE U+0300 U+0301 U+0302)
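
Because combining characters are hard to type and to tell apart in an editor, the IGNORE line can also be generated from code points. The following is a small Python sketch under stated assumptions: the helper name build_ignore_line is hypothetical, and the dictionary's .aff file is assumed to be UTF-8 encoded.

```python
def build_ignore_line(code_points):
    """Build a Hunspell IGNORE line from Unicode code points (hypothetical helper)."""
    # The characters follow the keyword directly, without separators between them.
    return "IGNORE " + "".join(chr(cp) for cp in code_points)


if __name__ == "__main__":
    # U+0300 grave, U+0301 acute, U+0302 circumflex, as listed in the comment above.
    line = build_ignore_line([0x0300, 0x0301, 0x0302])
    # Show the code points explicitly, since the marks themselves render poorly.
    print([f"U+{ord(ch):04X}" for ch in line[len("IGNORE "):]])
```

The resulting line can then be pasted into the .aff file. Further combining characters (for example U+030B, used in the Serbian example) would have to be added to the list in the same way.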