Bug 96354 - autocorrect code incorrectly categorizes hebrew letters / 100% cpu usage when typing hebrew
Summary: autocorrect code incorrectly categorizes hebrew letters / 100% cpu usage when...
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer (show other bugs)
Version:
(earliest affected)
unspecified
Hardware: All All
: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:5.4.0 target:5.3.1 target:5.2.6
Keywords:
Depends on:
Blocks: RTL-CTL AutoCorrect-Complete
  Show dependency treegraph
 
Reported: 2015-12-09 11:59 UTC by yossi
Modified: 2017-02-04 13:57 UTC (History)
3 users (show)

See Also:
Crash report or crash signature:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description yossi 2015-12-09 11:59:02 UTC
SYMPTOM:
when auto-correct option "correct TWo INitial CApitals" is enabled, typing a space after a misspelled Hebrew word (or any word that does not exist in the Hebrew dictionary) causes a 2 second 100% cpu spike on my Intel i5 machine. disabling the above mentioned auto-correct option resolves the problem.

EXPLANATION:
after some stepping through a debugger and sifting through source code I think I have found two bugs related to this. Both bugs are in the file "\editeng\source\misc\svxacorr.cxx", and should be very easy to correct. (I am not familiar with the procedure of submitting patches).

1) When a space character is inserted it triggers the auto-correct logic which looks for two initial capitals. When it detects that, it checks the dictionary to see if the two initial capitals are intended. This is done in the function SvxAutoCorrect::FnCptlSttWrd using the following very expensive code:

                    if (!xSpeller->spell(sWord, eLang, aEmptySeq).is())
                    {
                        return false;
                    }

Instead of asking for suggestions and checking for a null list to detect correct spelling, it should be just be validating the spelling in the same way as the spell checking code does.

2) The auto-correct code detects the case of a letter using the functions IsUpperLetter() and IsLowerLetter() which both contain a bug which causes any non latin character to be recognized as both uppercase and lowercase when in fact it is neither, i.e. it deduces that a character is lowercase if it is not uppercase and vice versa. This results in the incorrect conclusion that any word written in a non latin script has two initial uppercase letters followed by lowercase letters and thus triggering the first bug.
Comment 1 Maxim Monastirsky 2015-12-09 13:14:56 UTC
Hi,

(In reply to a6487102 from comment #0)
> I am not familiar with the procedure of submitting patches
We accept patches through gerrit - https://gerrit.libreoffice.org/. Once you registered on gerrit, and set your local git repo to push to it, you can simply use the "git push" command (or better the "logerrit" helper script) to upload your patches there for review.

More information is at:

https://wiki.documentfoundation.org/Development/gerrit#Setting_yourself_up_for_gerrit_-_the_easy_way

Feel free to ask if you need more assistance with this.
Comment 2 Yousuf Philips (jay) (retired) 2016-10-17 21:17:48 UTC
Hi a6487102,

Were you able to get yourself setup with gerrit, so you can push your patch? Please let us know if you need any more assistance.

@Khaled: Any thoughts on the user's patch concept?
Comment 3 ⁨خالد حسني⁩ 2016-10-17 22:53:23 UTC
I don’t know much about the spell checking code, but for IsUpperLetter() and IsLowerLetter() that sounds broken indeed. May be instead of the home grown function, the code should use IsUpper() and IsLower() from <linguistic/misc.hxx> which seem not to suffer from the same issue.
Comment 4 Commit Notification 2016-12-05 08:21:48 UTC
Yossi Zahn committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=c6db0e064959ec154db9471a8e1f1e2716bb3523

tdf#96354 correct broken autocorrect INitial CApitals

It will be available in 5.4.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 5 Xisco Faulí 2016-12-11 17:18:39 UTC
Moving it to NEW as there's already a commit for this issue
Comment 6 Xisco Faulí 2017-01-13 12:28:39 UTC
Hello,
Is this bug fixed?
If so, could you please close it as RESOLVED FIXED?
Comment 7 Commit Notification 2017-01-19 12:53:27 UTC
Yossi Zahn committed a patch related to this issue.
It has been pushed to "libreoffice-5-3":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=216047eeba5cf854be2d767708178db624b3a2da&h=libreoffice-5-3

tdf#96354 correct broken autocorrect INitial CApitals

It will be available in 5.3.1.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 8 Commit Notification 2017-01-19 15:41:29 UTC
Yossi Zahn committed a patch related to this issue.
It has been pushed to "libreoffice-5-2":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=42849da8f266972ef370104c332d74ff7b15fc71&h=libreoffice-5-2

tdf#96354 correct broken autocorrect INitial CApitals

It will be available in 5.2.6.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.