Bug 164715 - Improve spelling suggestion for Thai
Summary: Improve spelling suggestion for Thai
Status: UNCONFIRMED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Linguistic
Version (earliest affected): 24.8.4.2 release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Spell-Checking Dictionaries
Reported: 2025-01-15 04:13 UTC by Theppitak Karoonboonyanan
Modified: 2025-01-16 04:25 UTC

See Also:
Crash report or crash signature:


Attachments
List of common typos (42.08 KB, text/plain)
2025-01-15 04:13 UTC, Theppitak Karoonboonyanan

Description Theppitak Karoonboonyanan 2025-01-15 04:13:34 UTC
Created attachment 198547
List of common typos

The current Thai spell checking is of poor quality, mainly because text is split into words by dictionary-based segmentation before it is checked. That segmentation does not work well on misspelled words, so the spell checker receives incomplete word fragments that inherently carry too little information.
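
To illustrate the fragment problem, here is a toy sketch using a greedy longest-match segmenter. This is an assumption made for illustration only: it is not the segmenter LibreOffice actually uses, and the two-word lexicon and the `segment` helper are hypothetical. It only shows how a single wrong syllable can cut a word into a valid prefix plus a leftover fragment.

```python
def segment(text, lexicon):
    """Greedy longest-match segmentation; unmatched characters are
    collected into 'unknown' fragments (toy model, not LibreOffice's)."""
    tokens, unknown, i = [], "", 0
    max_len = max(map(len, lexicon))
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + n] in lexicon:
                if unknown:
                    tokens.append(unknown)
                    unknown = ""
                tokens.append(text[i:i + n])
                i += n
                break
        else:
            # No lexicon word starts here: extend the unknown fragment.
            unknown += text[i]
            i += 1
    if unknown:
        tokens.append(unknown)
    return tokens


# Hypothetical two-word lexicon: "rice" and "corn".
LEXICON = {"ข้าว", "ข้าวโพด"}

print(segment("ข้าวโพด", LEXICON))  # correct spelling stays in one piece
print(segment("ข้าวโภช", LEXICON))  # typo splits into a word plus a fragment
```

In this toy model the spell checker would only ever see the trailing fragment of the typo, never the whole misspelled word, which is why suggestions based on it are unreliable.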

However, even if word boundaries were not a problem, there would still be certain classes of typos whose correct spellings are not suggested. I will focus on that improvement in this bug. Let's discuss the word boundary problem in another bug.

Some examples of such typos whose correct spellings are not suggested:
- กระเฌอ กะเชอ กะเฌอ (correct: กระเชอ)
- กะเลวกะลาด (correct: กเฬวราก)
- กอร์ป (correct: กอปร)
- กาชาติ (correct: กาชาด)
- การะบูน (correct: การบูร)
- กุมภกัณฐ์ (correct: กุมภกรรณ)
- เกษา (correct: เกศา)
- เกาท์ (correct: เกาต์)
- ขบฏ (correct: ขบถ)
- คะมักคะเม่น (correct: ขะมักเขม้น)
- ข้าวโภช (correct: ข้าวโพด)
- ขี้เฒ่า ขี้เท่า (correct: ขี้เถ้า)
- คันลอง คัลลอง (correct: ครรลอง)
And many others.

I have created a list of common typos, with the words/phrases separated by spaces, for testing with the 'hunspell -d th_TH <file>' command line. The first word of each line is the correct spelling, and the rest are typos.

Expected result: the first word should be included in the suggestion list.

Actual result: for some entries the correct spelling is not suggested; others are OK.
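
The check described above could be automated along the following lines. This is a minimal sketch, not the procedure actually used for this report: it assumes a `hunspell` binary on the PATH with the th_TH dictionary installed, feeds each typo through hunspell's ispell-compatible pipe mode (`-a`), and treats a typo as passing if the correct word appears in any suggestion line of the reply. The helper names are hypothetical.

```python
import subprocess


def parse_typo_line(line):
    """Split 'correct typo1 typo2 ...' into (correct, [typos]); None if blank."""
    fields = line.split()
    if not fields:
        return None
    return fields[0], fields[1:]


def parse_suggestions(reply):
    """Collect suggestions from ispell-style '-a' reply lines.

    A miss with suggestions looks like '& word count offset: s1, s2'.
    Other reply lines (banner, '*', '+', '-', '#') carry no suggestions.
    """
    found = []
    for line in reply.splitlines():
        if line.startswith("& "):
            _, _, tail = line.partition(": ")
            found.extend(s.strip() for s in tail.split(",") if s.strip())
    return found


def suggest(typo, dict_name="th_TH"):
    """Ask hunspell for suggestions for one typo (requires hunspell installed)."""
    result = subprocess.run(
        ["hunspell", "-d", dict_name, "-i", "utf-8", "-a"],
        input=typo + "\n", capture_output=True, encoding="utf-8", check=True)
    return parse_suggestions(result.stdout)


def failing_entries(path, suggest_fn=suggest):
    """Return (correct, typo) pairs whose suggestions lack the correct word."""
    failures = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            entry = parse_typo_line(line)
            if entry is None:
                continue
            correct, typos = entry
            for typo in typos:
                if correct not in suggest_fn(typo):
                    failures.append((correct, typo))
    return failures
```

Calling `failing_entries()` on a local copy of the attached list would then return the entries that still need work. Starting one hunspell process per typo is slow but keeps the sketch simple.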
Comment 1 Shantanu 2025-01-15 11:00:21 UTC
Did you try the ph: tag in your .dic file? It will always suggest the correct word. For example:

กาชาด ph: กาชาติ
การบูร ph: การะบูน
กุมภกรรณ ph: กุมภกัณฐ์
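
Entries like these could be generated from a typo list in which the first word of each line is the correct spelling. The sketch below is only an illustration with a hypothetical helper name. It writes each field as `ph:` followed directly by the misspelling, which is assumed here to be the form the Hunspell dictionary format expects, and it does not merge the result into an existing .dic file, where a word may already carry flags and the word count on the first line would need updating.

```python
def to_dic_entries(typo_lines):
    """Turn 'correct typo1 typo2 ...' lines into 'correct ph:typo1 ph:typo2'."""
    entries = []
    for line in typo_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank lines and words without listed typos
        correct, typos = fields[0], fields[1:]
        entries.append(" ".join([correct] + ["ph:" + t for t in typos]))
    return entries


# Sample lines taken from the examples in this report.
sample = [
    "กาชาด กาชาติ",
    "ขี้เถ้า ขี้เฒ่า ขี้เท่า",
]
for entry in to_dic_entries(sample):
    print(entry)
```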
Comment 2 Theppitak Karoonboonyanan 2025-01-15 16:58:00 UTC
(In reply to Shantanu from comment #1)
> Did you try the ph: tag in your .dic file? It will always suggest the correct
> word. For example:
> 
> กาชาด ph: กาชาติ
> การบูร ph: การะบูน
> กุมภกรรณ ph: กุมภกัณฐ์

Yes, I've used it in a previous patch and will address these cases with it. Thanks for mentioning it. Any other suggestions are welcome.
Comment 3 Theppitak Karoonboonyanan 2025-01-16 04:25:02 UTC
Proposed patch in gerrit:
https://gerrit.libreoffice.org/c/dictionaries/+/180311