Bug 163657 - Some Thai misspelled words are listed in spell checking dictionary
Summary: Some Thai misspelled words are listed in spell checking dictionary
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Linguistic
Version: 24.8.2.1 release (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:25.2.0 target:25.8.0
Keywords:
Depends on:
Blocks: Dictionaries
Reported: 2024-10-28 10:44 UTC by Theppitak Karoonboonyanan
Modified: 2024-12-24 16:37 UTC

See Also:
Crash report or crash signature:


Description Theppitak Karoonboonyanan 2024-10-28 10:44:51 UTC
While working on bug 163346, I found that some misspelled Thai words are not caught by the spell checker because they are listed in the spell-checking dictionary:

กังวาล
ปรากฎ
กาละแม
คฑา
จุมพฎ
ทราก
ทะแยง
ทิฏฐิ
พนิช
พริ้ว
พลูโตเนียม
วานิช
ศรีษะ
สฤษฎ์
สาราณียากร
หล่ะ
องคชาติ
อัตคัต
อานิสงค์
อิริยาบท

According to README_th_TH.txt, one of the dictionary's sources is "libthai data", which provides the word list used for word segmentation. To tolerate common misspellings, that data includes "tdict-spell.txt", a list of common spelling variations. This list should not be included in the spell checker.

More words need to be removed than those listed above: some misspelled entries from tdict-spell.txt still get caught, apparently because of LibreOffice's own word boundary analysis. All entries from that list should be removed anyway.
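The removal proposed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual patch: it assumes the dictionary is a Hunspell-style .dic file (a count on the first line, then one entry per line with an optional "/flags" suffix), and the function name `strip_misspellings` and the sample data are hypothetical.

```python
def strip_misspellings(dic_lines, misspelled):
    """Return Hunspell-style .dic lines with the given words removed.

    dic_lines:  lines of a .dic file; the first line is assumed to be
                the entry count (assumption about the file format).
    misspelled: iterable of words to drop, e.g. the entries of
                tdict-spell.txt.
    """
    blocked = {w.strip() for w in misspelled if w.strip()}
    kept = []
    for line in dic_lines[1:]:
        entry = line.strip()
        if not entry:
            continue
        # The headword is whatever precedes an optional "/flags" suffix
        # or a tab-separated morphological field.
        word = entry.split("/", 1)[0].split("\t", 1)[0]
        if word not in blocked:
            kept.append(entry)
    # Rewrite the count header so it matches the remaining entries.
    return [str(len(kept))] + kept


# Hypothetical sample: two correct spellings and two of the
# misspellings listed in this report.
sample_dic = ["4", "ปรากฏ", "ปรากฎ", "ศีรษะ", "ศรีษะ"]
sample_spell = ["ปรากฎ", "ศรีษะ"]
result = strip_misspellings(sample_dic, sample_spell)
print("\n".join(result))
```

Matching is by exact string comparison, so entries that differ only in Unicode normalization would not be removed by this sketch.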
Comment 1 V Stuart Foote 2024-10-28 13:06:24 UTC
(In reply to Theppitak Karoonboonyanan from comment #0)
> ...includes "tdict-spell.txt" for some common spelling variations. 
> But this list should not be included in the spell checker.

Seems reasonable, but is it feasible to patch the "tdict-spell.txt" content out of its "libthai data" source? Or would it be better to rebundle a new dictionary word list?

Anyhow filed by SME so => NEW
Comment 2 Theppitak Karoonboonyanan 2024-10-28 16:23:31 UTC
(In reply to V Stuart Foote from comment #1)
> Seems reasonable, but is it feasible to patch the "tdict-spell.txt"
> content out of its "libthai data" source? Or would it be better to
> rebundle a new dictionary word list?

This Gerrit submission removes the contents of "tdict-spell.txt" from the
dictionary:

https://gerrit.libreoffice.org/c/dictionaries/+/175732
Comment 3 Commit Notification 2024-10-29 06:24:08 UTC
Theppitak Karoonboonyanan committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/dictionaries/commit/d375e51d98ce78857dc71f5a26b75058592d833d

tdf#163657 Adjust Thai spell check dictionary
Comment 4 Theppitak Karoonboonyanan 2024-12-22 09:57:25 UTC
Another (large) submission with a further set of removals and fixes:
https://gerrit.libreoffice.org/c/dictionaries/+/179109
Comment 5 Commit Notification 2024-12-24 16:37:08 UTC
Theppitak Karoonboonyanan committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/dictionaries/commit/614d35e5e05a8ac908446111d1cc71620248c288

tdf#163657 Remove/fix typos in Thai spelling dict