Bug 158454 - Add Thai Autocorrect Support
Summary: Add Thai Autocorrect Support
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 24.2.0.0 alpha1+
Hardware: All All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard: target:24.8.0 target:24.2.0.2 target:...
Keywords:
Depends on:
Blocks: AutoCorrect-Complete
Reported: 2023-11-30 10:36 UTC by Theppitak Karoonboonyanan
Modified: 2024-10-25 03:59 UTC
CC List: 5 users

See Also:
Crash report or crash signature:


Attachments
Patch to add Thai Autocorrect data (97.03 KB, patch)
2023-11-30 10:36 UTC, Theppitak Karoonboonyanan
Patch to make Autocorrect match more than one rule (27.57 KB, patch)
2023-11-30 10:38 UTC, Theppitak Karoonboonyanan

Description Theppitak Karoonboonyanan 2023-11-30 10:36:14 UTC
Created attachment 191134
Patch to add Thai Autocorrect data

I would like to add Autocorrect data for Thai, in which commonly misspelled words are corrected.

As Thai script has no word delimiter, the matching patterns use both left and right wildcards, so that a word is matched and fixed at any position in a text chunk. For example, ".*กงศุล.*" -> "กงสุล" will fix the text chunk "สถานกงศุลใหญ่" to "สถานกงสุลใหญ่".
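
The wildcard rule above can be illustrated with a minimal Python sketch. This is not LibreOffice's implementation: the function name apply_wildcard_rule is hypothetical, and replacing only the first occurrence is an assumption made for the illustration.

```python
# Hypothetical helper for illustration only; not part of LibreOffice.
def apply_wildcard_rule(chunk: str, pattern: str, replacement: str) -> str:
    """Apply a rule of the form '.*WORD.*' -> replacement to a text chunk."""
    wildcard = ".*"
    if not (pattern.startswith(wildcard) and pattern.endswith(wildcard)):
        raise ValueError("this sketch only handles '.*WORD.*' patterns")
    # Strip the left and right wildcards to get the misspelled word itself.
    word = pattern[len(wildcard):-len(wildcard)]
    if not word:
        raise ValueError("pattern contains no word between the wildcards")
    # Assumption: only the first occurrence in the chunk is replaced.
    return chunk.replace(word, replacement, 1)


fixed = apply_wildcard_rule("สถานกงศุลใหญ่", ".*กงศุล.*", "กงสุล")
print(fixed)
```

Because the word is searched for anywhere inside the chunk, no word boundary is needed, which is the point of using wildcards on both sides for Thai.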

This, however, may require an adjustment to the current matching behavior to make it more complete. The current implementation stops immediately when the first pattern is matched. This means only one replacement will take place even though there can be more than one typo in the text chunk.

For example, suppose there are only 2 rules in the Autocorrect rule set:
  - ".*กงศุล.*" -> "กงสุล"
  - ".*อนุญาติ.*" -> "อนุญาต"
and the input text chunk contains 2 typos:
  "ขออนุญาติจากสถานกงศุลใหญ่".
Assuming that the rules are matched in order, only the first rule will be matched in the current implementation, and the text chunk becomes:
  "ขออนุญาติจากสถานกงสุลใหญ่"
although the desired result is:
  "ขออนุญาตจากสถานกงสุลใหญ่"
where both typos are fixed.
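
The difference between the two behaviors can be sketched in Python. This is a simplified model, not the actual SvxAutoCorrDoc::ChgAutoCorrWord() code: the function names are hypothetical, and each rule is reduced to a plain (misspelled, corrected) pair with the wildcards already stripped.

```python
# Simplified model of the two-rule set from the example above.
RULES = [
    ("กงศุล", "กงสุล"),
    ("อนุญาติ", "อนุญาต"),
]


def correct_first_match(chunk, rules):
    """Model of the current behavior: stop after the first matching rule."""
    for wrong, right in rules:
        if wrong in chunk:
            return chunk.replace(wrong, right, 1)
    return chunk


def correct_all_matches(chunk, rules):
    """Model of the proposed behavior: try every rule on the same chunk."""
    for wrong, right in rules:
        chunk = chunk.replace(wrong, right)
    return chunk


text = "ขออนุญาติจากสถานกงศุลใหญ่"
print(correct_first_match(text, RULES))  # only the first rule is applied
print(correct_all_matches(text, RULES))  # both typos are fixed
```

The second function makes a single pass over the rules, so a replacement that happens to create a new match for an earlier rule would not be caught in this sketch.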

So, I'm proposing 2 patches, one for the data, and the other for the code.
Comment 1 Theppitak Karoonboonyanan 2023-11-30 10:38:58 UTC
Created attachment 191135
Patch to make Autocorrect match more than one rule
Comment 2 Theppitak Karoonboonyanan 2023-12-01 10:37:48 UTC
Gerrit commits to be reviewed:

- Add Thai AutoCorrect data
  https://gerrit.libreoffice.org/c/core/+/160159

- SvxAutoCorrDoc::ChgAutoCorrWord() implementations: correct multiple patterns
  https://gerrit.libreoffice.org/c/core/+/160160
Comment 3 Hossein 2024-04-30 09:52:37 UTC
Hello Jonathan,
I thought you might have better insight into this issue and could possibly review the submitted patch.
Comment 4 Commit Notification 2024-06-21 18:34:39 UTC
Theppitak Karoonboonyanan committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/76c96ca7c9a6e0d847ec5dc186c6e47ab6061f5f

tdf#158454 Add Thai Autocorrect Support, coding part

It will be available in 25.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 5 Jonathan Clark 2024-06-27 13:06:55 UTC
Hello Theppitak Karoonboonyanan,

I believe the outstanding work for this bug has been completed. If you agree, please mark this bug fixed.

Thanks!
Comment 6 Theppitak Karoonboonyanan 2024-10-08 08:41:27 UTC
(In reply to Jonathan Clark from comment #5)
> I believe the outstanding work for this bug has been completed. If you
> agree, please mark this bug fixed.

Yes. Thanks for the reminder! I'm closing it.
Comment 7 Shantanu 2024-10-09 04:53:24 UTC
There are certain words in the Marathi language where multiple autocorrect rules can be applied. The patch is functioning exceptionally well. Thank you for this outstanding contribution.

Version: 25.2.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 0fb23379e63071ec155cb6683c19212859e399b5
CPU threads: 1; OS: Windows 10 X86_64 (10.0 build 14393); UI render: Skia/Raster; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: threaded
Comment 8 Theppitak Karoonboonyanan 2024-10-10 16:09:49 UTC
(In reply to Shantanu from comment #7)
> There are certain words in the Marathi language where multiple autocorrect
> rules can be applied. The patch is functioning exceptionally well. Thank you
> for this outstanding contribution.

Thank you for the feedback. I'm pleased to learn that.
Comment 9 Shantanu 2024-10-25 03:59:52 UTC
For users of an older version of LibreOffice who wish to apply more than one autocorrect rule, simply use Tools - Autocorrect - Apply two or three times consecutively. Most users apply autocorrect only once, as they are likely unaware of this issue. 
However, while typing, only a single rule will be applied because, as noted in the initial post, "The current implementation stops immediately when the first pattern is matched." I appreciate this important enhancement to the autocorrect functionality.
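
The workaround for older versions amounts to repeating a single-rule pass until nothing changes. A rough Python sketch follows, assuming each pass applies at most one rule per text chunk, as described in the initial post. The function names and the max_passes guard are hypothetical and do not correspond to any LibreOffice API.

```python
def one_pass(chunk, rules):
    """Model of one Apply run in an older version: first matching rule only."""
    for wrong, right in rules:
        if wrong in chunk:
            return chunk.replace(wrong, right, 1)
    return chunk


def apply_until_stable(chunk, rules, max_passes=10):
    """Repeat one_pass until the chunk stops changing; count effective passes."""
    passes = 0
    for _ in range(max_passes):  # guard against rules that never settle
        updated = one_pass(chunk, rules)
        if updated == chunk:
            break
        chunk = updated
        passes += 1
    return chunk, passes


rules = [("กงศุล", "กงสุล"), ("อนุญาติ", "อนุญาต")]
result, passes = apply_until_stable("ขออนุญาติจากสถานกงศุลใหญ่", rules)
print(result, passes)
```

In this model a chunk with two typos needs two passes, which matches the advice to run Apply two or three times.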