Bug 155315 - Remove ambiguous Romanian autocorrect entries
Summary: Remove ambiguous Romanian autocorrect entries
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Linguistic
Version (earliest affected): 7.5.2.2 release
Hardware: All All
Importance: medium normal
Assignee: BogdanB
URL:
Whiteboard: target:26.2.0
Keywords:
Depends on:
Blocks: AutoCorrect-Complete
Reported: 2023-05-15 10:03 UTC by cipricus
Modified: 2025-10-28 14:39 UTC
CC List: 4 users

See Also:
Crash report or crash signature:


Description cipricus 2023-05-15 10:03:02 UTC
Description:
Some misspelled forms have more than one valid correction. Autocorrect may therefore replace them with a form other than the one the user intended, which then has to be fixed manually.



Steps to Reproduce:
In Romanian, type a misspelled form (e.g., "razbunata", "rasfatata", "neastamparata", etc.).


Actual Results:
These are automatically corrected to proper forms ("răzbunată"=revenged, fem., "răsfățată"=spoiled/pampered, fem., "neastâmpărată"=naughty/unruly, fem.), but which are not the only ones possible (notably, the definite forms may also be expected in these examples: "răzbunată"=the revenged one, fem., "răsfățată"=the spoiled/pampered one, fem., "neastâmpărată"= the naughty/unruly, fem.).

Expected Results:
Autocorrect should replace an incorrect form only with a word that is its unambiguous correction.


Reproducible: Always


User Profile Reset: No

Additional Info:
I noticed this while trying to fix the bug about autocorrect being applied to correctly spelled Romanian words (https://bugs.documentfoundation.org/show_bug.cgi?id=155087).

I am only aware of this problem in the Romanian autocorrect list, but I would like to know whether it could be adopted as a general rule: forms that admit more than one correct form should not be autocorrected.

If so, I could adjust the Romanian autocorrect list while I work on the linked bug report.
 
As said here (https://bugs.documentfoundation.org/show_bug.cgi?id=155087#c21):

`The autocorrection tool for any language must be prepared to require the least possible effort from user: the replacements that the tool makes must be correct on 100% cases`. 

That is not the case if the user may have to take further action. Autocorrect should not act when a subsequent manual correction cannot be ruled out.
Comment 1 cipricus 2023-05-15 10:13:51 UTC
(In reply to cipricus from comment #0)

> ...but which are not the only ones possible (notably, the definite forms
> may also be expected in these examples: "răzbunată"=the revenged one, fem.,
> "răsfățată"=the spoiled/pampered one, fem., "neastâmpărată"= the
> naughty/unruly, fem.).

I made a copy/paste error. The above should read:

the definite forms may also be expected (they are also correct): "răzbunata" = the revenged one, fem., "răsfățata" = the spoiled/pampered one, fem., "neastâmpărata" = the naughty/unruly one, fem.

That is, the definite form (ending in `a`) could be expected instead of the indefinite one (ending in `ă`). This pattern triggers the problem in Romanian, but it is not the only possible one, and other languages may have their own patterns that lead to the same problem. I haven't studied the autocorrect lists of other languages; I mention Romanian because it is the one I can work on.

The main question is whether the rule mentioned above could be adopted: forms that admit more than one correct form should not be autocorrected.
Comment 2 cipricus 2023-05-15 10:30:09 UTC
For example, "tacuta" is not a word and should be corrected, but both "tăcută" = "silent" (fem.) and "tăcuta" = "the silent one" (fem.) are valid.

In English, the equivalent would be autocorrecting "bleack" to "black" when "bleak" might be intended, or vice versa.
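The rule proposed in comments 0 to 2 can be expressed as a simple check: group valid words by their diacritic-free spelling, and treat a typo as safe to autocorrect only when exactly one valid word shares that spelling. The Python sketch below is illustrative only. It assumes that the typos in question differ from valid words only by missing diacritics, and it uses a hypothetical toy lexicon rather than any real LibreOffice word list.

```python
import unicodedata
from collections import defaultdict

# Hypothetical toy lexicon of valid Romanian words; a real check would use a full word list.
TOY_LEXICON = {
    "tăcută", "tăcuta",  # indefinite and definite feminine forms
    "răzbunată", "răzbunata",
    "răsfățată", "răsfățata",
    "mâine",  # the only valid word whose bare spelling is "maine"
}


def strip_diacritics(word: str) -> str:
    """Remove combining marks, e.g. 'tăcută' -> 'tacuta'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def group_by_bare_form(lexicon):
    """Map each diacritic-free spelling to the set of valid words that share it."""
    groups = defaultdict(set)
    for word in lexicon:
        groups[strip_diacritics(word)].add(word)
    return groups


def safe_replacement(typo, groups):
    """Return the single valid correction of `typo`, or None when the typo is
    ambiguous, unknown, or already a valid word (so it must not be autocorrected)."""
    candidates = groups.get(strip_diacritics(typo), set())
    if typo in candidates or len(candidates) != 1:
        return None
    return next(iter(candidates))


groups = group_by_bare_form(TOY_LEXICON)
print(safe_replacement("tacuta", groups))  # None: "tăcută" and "tăcuta" are both valid
print(safe_replacement("maine", groups))   # mâine
```

Returning None for an already valid word also reflects the related concern in bug 155087: a correctly spelled word should never be rewritten.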
Comment 3 Stéphane Guillou (stragu) 2023-05-30 13:04:44 UTC
Thank you for the report.
Isn't the solution, then, to remove the ambiguous entries from the corresponding DocumentList.xml, so that the misspelled form falls back to the spellchecker? I assume autocorrect relies exclusively on unambiguous 1-to-1 rules, and a DocumentList.xml can't contain several replacements for the same string.

Maybe this report needs to be renamed to "Remove ambiguous Romanian autocorrect entries" so it is more focused and has a chance to be resolved.
Are you planning to work on it?
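The cleanup proposed here (deleting ambiguous entries so that the typo is left to the spellchecker) could be applied mechanically. The following Python sketch assumes the usual DocumentList.xml layout, in which each entry is a `block-list:block` element whose `block-list:abbreviated-name` attribute holds the typed form and `block-list:name` the replacement; the namespace URI should be checked against the actual Romanian file, and the two-entry sample is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of LibreOffice autocorrect block lists (assumed from the usual
# DocumentList.xml layout; verify against the actual file).
BL = "http://openoffice.org/2001/block-list"


def remove_entries(xml_text: str, ambiguous: set) -> str:
    """Drop every block whose abbreviated-name (the typed form) is in `ambiguous`."""
    # Keep the original "block-list:" prefix when serializing.
    ET.register_namespace("block-list", BL)
    root = ET.fromstring(xml_text)
    # Iterate over a copy so that removing children is safe.
    for block in list(root.findall(f"{{{BL}}}block")):
        if block.get(f"{{{BL}}}abbreviated-name") in ambiguous:
            root.remove(block)
    return ET.tostring(root, encoding="unicode")


# Hypothetical two-entry excerpt, not the real file contents.
SAMPLE = (
    '<block-list:block-list xmlns:block-list="http://openoffice.org/2001/block-list">'
    '<block-list:block block-list:abbreviated-name="tacuta" block-list:name="tăcută"/>'
    '<block-list:block block-list:abbreviated-name="maine" block-list:name="mâine"/>'
    "</block-list:block-list>"
)

print(remove_entries(SAMPLE, {"tacuta"}))
```

Only the ambiguous "tacuta" entry is dropped; "maine" keeps its single replacement.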
Comment 4 cipricus 2023-06-02 15:05:28 UTC
(In reply to Stéphane Guillou (stragu) from comment #3)
> Thank you for the report.
> So isn't the solution to remove the entries that are ambiguous from the
> corresponding DocumentList.xml, so the erroneous form then falls back onto
> the spellcheck? I assume autocorrect relies exclusively on unambiguous
> 1-to-1 rules, and a DocumentList.xml can't contain several replacements for
> the same string.
> 
> Maybe this report needs to be renamed to "Remove ambiguous Romanian
> autocorrect entries" so it is more focused and has a chance to be resolved.
> Are you planning to work on it?

Yes, I would like to work on it. I don't know how systematically I can do it, but I would like to propose changes at https://gerrit.libreoffice.org/c/core/+/151770 whenever I notice the need.

Is that ok?
Comment 5 cipricus 2023-06-02 15:07:55 UTC
(In reply to Stéphane Guillou (stragu) from comment #3)

> Maybe this report needs to be renamed to "Remove ambiguous Romanian
> autocorrect entries" so it is more focused and has a chance to be resolved.

I have renamed it.
Comment 6 cipricus 2023-06-02 15:16:27 UTC
(In reply to cipricus from comment #4)

> Yes, I would like to work on it, although I don't know how systematically I
> can do it, but I would like to be able to propose changes when I notice the
> need at https://gerrit.libreoffice.org/c/core/+/151770
> 
> Is that ok?

I now understand from another exchange (https://ask.libreoffice.org/t/where-and-how-to-report-errors-in-defaults-of-autocorrection/91034/18?u=cipricus) that once a change has been merged, it can no longer be amended at that address, and a new change has to be submitted. Thanks.
Comment 7 QA Administrators 2025-06-02 03:10:21 UTC Comment hidden (obsolete)
Comment 8 Xisco Faulí 2025-07-10 12:01:20 UTC
@Bogdan, I thought you might be interested in this issue
Comment 9 BogdanB 2025-07-11 11:35:23 UTC
cipricus, I have started working on this, beginning with the letters A, B and C. Is what I did OK? I don't want to spend time doing something wrong...

https://gerrit.libreoffice.org/c/core/+/187665
Comment 10 BogdanB 2025-07-16 16:11:03 UTC
Cipricus, please take 5 minutes to look at my patch. If it is OK so far, I will continue; if not, I will give up.
Comment 11 Commit Notification 2025-10-23 18:19:07 UTC
Bogdan Buzea committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/8ca67cbef05e584882e72b06b188dd07c0407fd7

tdf#155315 Remove ambiguous Romanian autocorrect entries

It will be available in 26.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 12 Commit Notification 2025-10-27 14:14:37 UTC
Bogdan Buzea committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/54a8a80e3cde758930e6e3b46e9b8fe53c10eeae

tdf#155315 Remove ambiguous Romanian autocorrect entries (part II)

It will be available in 26.2.0.

Comment 13 Commit Notification 2025-10-28 13:36:26 UTC
Bogdan Buzea committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/aad5f899a08fc8d852496b6fff5d307c1d80c511

tdf#155315 Remove ambiguous Romanian autocorrect entries (part III)

It will be available in 26.2.0.

Comment 14 BogdanB 2025-10-28 14:39:42 UTC
All 3 commits have already been pushed to master. Any remaining cases can be reported in a new bug that lists the specific words still needing removal.