158454 – Add Thai Autocorrect Support

Status:	RESOLVED FIXED

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	Writer
Version: (earliest affected)	24.2.0.0 alpha1+
Hardware:	All All

Importance:	medium enhancement
Assignee:	Not Assigned

URL:
Whiteboard:	target:24.8.0 target:24.2.0.2 target:...
Keywords:

Depends on:
Blocks:	AutoCorrect-Complete

Reported:	2023-11-30 10:36 UTC by Theppitak Karoonboonyanan
Modified:	2024-12-10 16:18 UTC
CC List:	5 users

See Also:	163346
Crash report or crash signature:

Attachments
Patch to add Thai Autocorrect data (97.03 KB, patch) 2023-11-30 10:36 UTC, Theppitak Karoonboonyanan
Patch to make Autocorrect match more than one rule (27.57 KB, patch) 2023-11-30 10:38 UTC, Theppitak Karoonboonyanan

Description Theppitak Karoonboonyanan 2023-11-30 10:36:14 UTC

Created attachment 191134
Patch to add Thai Autocorrect data

I would like to add Autocorrect data for Thai that corrects commonly misspelled words.

As Thai script has no word delimiter, the matching patterns use both left and right wildcards, so that a misspelled word is matched and fixed at any position in a text chunk. For example, ".*กงศุล.*" -> "กงสุล" will fix the text chunk "สถานกงศุลใหญ่" to "สถานกงสุลใหญ่".

This, however, may require an adjustment to the current matching behavior to make it more complete. The current implementation stops as soon as the first pattern is matched. This means only one replacement takes place even when the text chunk contains more than one typo.

For example, suppose there are only 2 rules in the Autocorrect rule set:
  - ".*กงศุล.*" -> "กงสุล"
  - ".*อนุญาติ.*" -> "อนุญาต"
and the input text chunk contains 2 typos:
  "ขออนุญาติจากสถานกงศุลใหญ่".
Assuming that the rules are matched in order, only the first rule will be matched in the current implementation, and the text chunk becomes:
  "ขออนุญาติจากสถานกงสุลใหญ่"
although the desired result is:
  "ขออนุญาตจากสถานกงสุลใหญ่"
where both typos are fixed.
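
The difference between the two behaviors can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the LibreOffice code: the function names are hypothetical, the ".*X.*" wildcard patterns are modelled as a plain substring search, and each matching rule is assumed to replace every occurrence of its misspelling.

```python
# Hypothetical sketch; names and structure are assumptions, not LibreOffice APIs.
# Each rule maps a misspelling to its correction. The ".*X.*" wildcards are
# modelled here as "X occurs anywhere in the chunk".
RULES = [
    ("กงศุล", "กงสุล"),
    ("อนุญาติ", "อนุญาต"),
]


def fix_first_match(chunk, rules=RULES):
    """Model of the old behavior: stop after the first rule that matches."""
    for wrong, right in rules:
        if wrong in chunk:
            return chunk.replace(wrong, right)
    return chunk


def fix_all_matches(chunk, rules=RULES):
    """Model of the proposed behavior: apply every rule that matches."""
    for wrong, right in rules:
        chunk = chunk.replace(wrong, right)
    return chunk


text = "ขออนุญาติจากสถานกงศุลใหญ่"
print(fix_first_match(text))  # only the first matching rule is applied
print(fix_all_matches(text))  # both typos are fixed
```

With the two rules above, the first function leaves "อนุญาติ" uncorrected, while the second fixes both typos in one pass.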

So, I'm proposing 2 patches, one for the data, and the other for the code.

Comment 1 Theppitak Karoonboonyanan 2023-11-30 10:38:58 UTC

Created attachment 191135
Patch to make Autocorrect match more than one rule

Comment 2 Theppitak Karoonboonyanan 2023-12-01 10:37:48 UTC

Gerrit commits to be reviewed:

- Add Thai AutoCorrect data
  https://gerrit.libreoffice.org/c/core/+/160159

- SvxAutoCorrDoc::ChgAutoCorrWord() implementations: correct multiple patterns
  https://gerrit.libreoffice.org/c/core/+/160160

Comment 3 Hossein 2024-04-30 09:52:37 UTC

Hello Jonathan,
I thought you might have better insight into this issue and could possibly review the submitted patch.

Comment 4 Commit Notification 2024-06-21 18:34:39 UTC

Theppitak Karoonboonyanan committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/76c96ca7c9a6e0d847ec5dc186c6e47ab6061f5f

tdf#158454 Add Thai Autocorrect Support, coding part

It will be available in 25.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.

Comment 5 Jonathan Clark 2024-06-27 13:06:55 UTC

Hello Theppitak Karoonboonyanan,

I believe the outstanding work for this bug has been completed. If you agree, please mark this bug fixed.

Thanks!

Comment 6 Theppitak Karoonboonyanan 2024-10-08 08:41:27 UTC

(In reply to Jonathan Clark from comment #5)
> I believe the outstanding work for this bug has been completed. If you
> agree, please mark this bug fixed.

Yes. Thanks for the reminder! I'm closing it.

Comment 7 Shantanu 2024-10-09 04:53:24 UTC

There are certain words in the Marathi language where multiple autocorrect rules can be applied. The patch is functioning exceptionally well. Thank you for this outstanding contribution.

Version: 25.2.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 0fb23379e63071ec155cb6683c19212859e399b5
CPU threads: 1; OS: Windows 10 X86_64 (10.0 build 14393); UI render: Skia/Raster; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: threaded

Comment 8 Theppitak Karoonboonyanan 2024-10-10 16:09:49 UTC

(In reply to Shantanu from comment #7)
> There are certain words in the Marathi language where multiple autocorrect
> rules can be applied. The patch is functioning exceptionally well. Thank you
> for this outstanding contribution.

Thank you for the feedback. I'm pleased to learn that.

Comment 9 Shantanu 2024-10-25 03:59:52 UTC

For users of an older version of LibreOffice who wish to apply more than one autocorrect rule, simply use Tools - Autocorrect - Apply two or three times consecutively. Most users apply autocorrect only once, as they are likely unaware of this issue. 
However, while typing, only a single rule will be applied because, as noted in the initial post, "The current implementation stops immediately when the first pattern is matched." I appreciate this important enhancement to the autocorrect functionality.
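
Why repeated application works as a workaround can be sketched as follows. This is a simplified model, not LibreOffice code: the function names and the pass limit are hypothetical, and the wildcard rules are again modelled as substring matches. Each pass applies at most one rule, so a chunk with two typos needs two passes.

```python
# Hypothetical model of the workaround; not the LibreOffice implementation.
def apply_once(chunk, rules):
    """One pass of the old behavior: apply only the first matching rule."""
    for wrong, right in rules:
        if wrong in chunk:
            return chunk.replace(wrong, right)
    return chunk


def apply_repeatedly(chunk, rules, max_passes=3):
    """Repeat single-rule passes until nothing changes or the limit is hit."""
    for _ in range(max_passes):
        fixed = apply_once(chunk, rules)
        if fixed == chunk:
            break  # no rule matched any more
        chunk = fixed
    return chunk


rules = [("กงศุล", "กงสุล"), ("อนุญาติ", "อนุญาต")]
text = "ขออนุญาติจากสถานกงศุลใหญ่"
print(apply_once(text, rules))        # one typo still remains
print(apply_repeatedly(text, rules))  # both typos fixed after two passes
```

The pass limit mirrors the "two or three times" advice and also guards against a rule set whose replacements would keep triggering further matches.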