Bug 158454 - Add Thai Autocorrect Support
Summary: Add Thai Autocorrect Support
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 24.2.0.0 alpha1+ (earliest affected)
Hardware: All All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard: target:24.8.0 target:24.2.0.2
Keywords:
Depends on:
Blocks: AutoCorrect-Complete
Reported: 2023-11-30 10:36 UTC by Theppitak Karoonboonyanan
Modified: 2024-04-30 09:52 UTC
CC List: 4 users

See Also:
Crash report or crash signature:


Attachments
Patch to add Thai Autocorrect data (97.03 KB, patch)
2023-11-30 10:36 UTC, Theppitak Karoonboonyanan
Patch to make Autocorrect match more than one rule (27.57 KB, patch)
2023-11-30 10:38 UTC, Theppitak Karoonboonyanan

Description Theppitak Karoonboonyanan 2023-11-30 10:36:14 UTC
Created attachment 191134 [details]
Patch to add Thai Autocorrect data

I would like to add Autocorrect data for Thai, in which commonly misspelled words are corrected.

As Thai script has no word delimiter, the matching patterns have both left and right wildcards, so that a misspelled word is matched and fixed at any position in a text chunk. For example, the rule ".*กงศุล.*" -> "กงสุล" will fix the text chunk "สถานกงศุลใหญ่" to "สถานกงสุลใหญ่".
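
The effect of such a rule can be sketched in Python. This is only an illustration, not the LibreOffice code: it assumes that a ".*X.*" pattern can be read as "find X anywhere in the chunk and replace it", and the function name `apply_wildcard_rule` is hypothetical.

```python
def apply_wildcard_rule(chunk: str, pattern: str, replacement: str) -> str:
    """Apply one ".*X.*" -> Y rule to a text chunk (illustrative only)."""
    # Assumption: a leading or trailing ".*" only means that the core
    # string X may appear at any position inside the chunk.
    core = pattern
    if core.startswith(".*"):
        core = core[2:]
    if core.endswith(".*"):
        core = core[:-2]
    # Simplification: str.replace substitutes every occurrence of the core.
    return chunk.replace(core, replacement)


fixed = apply_wildcard_rule("สถานกงศุลใหญ่", ".*กงศุล.*", "กงสุล")
print(fixed)
```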

This approach, however, may require an adjustment to the current matching behavior to make it more complete. The current implementation stops as soon as the first pattern is matched, so only one replacement takes place even when the text chunk contains more than one typo.

For example, suppose there are only 2 rules in the Autocorrect rule set:
  - ".*กงศุล.*" -> "กงสุล"
  - ".*อนุญาติ.*" -> "อนุญาต"
and the input text chunk contains 2 typos:
  "ขออนุญาติจากสถานกงศุลใหญ่".
Assuming that the rules are matched in order, only the first rule will be matched in the current implementation, and the text chunk becomes:
  "ขออนุญาติจากสถานกงสุลใหญ่"
although the desired result is:
  "ขออนุญาตจากสถานกงสุลใหญ่"
where both typos are fixed.
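
The difference between the two behaviors can be sketched in Python. This is a simplified model under stated assumptions, not the actual SvxAutoCorrDoc code: rules are tried in list order, a ".*X.*" pattern is treated as a plain substring search for X, and the names `core_of`, `correct_first_match` and `correct_all_matches` are hypothetical.

```python
RULES = [
    (".*กงศุล.*", "กงสุล"),
    (".*อนุญาติ.*", "อนุญาต"),
]


def core_of(pattern: str) -> str:
    """Strip the left and right ".*" wildcards from a pattern."""
    if pattern.startswith(".*"):
        pattern = pattern[2:]
    if pattern.endswith(".*"):
        pattern = pattern[:-2]
    return pattern


def correct_first_match(chunk: str, rules) -> str:
    """Model of the current behavior: stop after the first matching rule."""
    for pattern, replacement in rules:
        core = core_of(pattern)
        if core in chunk:
            return chunk.replace(core, replacement)
    return chunk


def correct_all_matches(chunk: str, rules) -> str:
    """Model of the proposed behavior: apply every matching rule in order."""
    for pattern, replacement in rules:
        chunk = chunk.replace(core_of(pattern), replacement)
    return chunk


chunk = "ขออนุญาติจากสถานกงศุลใหญ่"
first_only = correct_first_match(chunk, RULES)
all_rules = correct_all_matches(chunk, RULES)
print(first_only)
print(all_rules)
```

In this model, `first_only` still contains the second typo, while `all_rules` has both fixed. One caveat of applying rules sequentially is that the output of one rule could in principle be matched by a later rule, so rule order may matter.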

So, I'm proposing 2 patches, one for the data, and the other for the code.
Comment 1 Theppitak Karoonboonyanan 2023-11-30 10:38:58 UTC
Created attachment 191135 [details]
Patch to make Autocorrect match more than one rule
Comment 2 Theppitak Karoonboonyanan 2023-12-01 10:37:48 UTC
Gerrit commits to be reviewed:

- Add Thai AutoCorrect data
  https://gerrit.libreoffice.org/c/core/+/160159

- SvxAutoCorrDoc::ChgAutoCorrWord() implementations: correct multiple patterns
  https://gerrit.libreoffice.org/c/core/+/160160
Comment 3 Hossein 2024-04-30 09:52:37 UTC
Hello Jonathan,
I thought you might have better insight into this issue, and could possibly review the submitted patch.