Bug Hunting Session
Bug 55570 - autocorrect slow-down
Summary: autocorrect slow-down
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Linguistic (show other bugs)
Version:
(earliest affected)
4.0.0.0.alpha0+ Master
Hardware: Other All
: medium major
Assignee: Not Assigned
URL:
Whiteboard: target:4.0.0.0.beta0 target:4.1.0
Keywords:
Depends on:
Blocks: mab4.0
  Show dependency treegraph
 
Reported: 2012-10-03 10:30 UTC by Michael Meeks
Modified: 2014-06-07 05:40 UTC (History)
5 users (show)

See Also:
Crash report or crash signature:


Attachments
2 large autocorrect database (65K entries each) (744.83 KB, application/zip)
2012-10-03 19:52 UTC, tommy27
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Meeks 2012-10-03 10:30:10 UTC
mmmhhh... it seems that performance in LOdev 3.7 declined versus current 3.6 stable releases... so I'm REOPENING this bug.

my tests say that the freeze time after typing the first word in a Writer document become longer in LOdev 3.7

--------------------
LibO 3.6.1 
--------------------
acor_.dat (65K entries)
5 seconds

acor_.dat 65K entries + acor_it-IT.dat 55K entries
9 seconds

--------------------
master 3.7.x
--------------------
acor_.dat 65K entries 
13 seconds 

acor_.dat 65K entries + acor_it-IT.dat 55K entries
20 seconds 

acor_.dat 65K entries + acor_it-IT.dat 119 entries *
26 seconds 

* this test can't be done in 3.6.1 since only master currently supports autocorrect databases larger than 65K


I suspect that the longer freeze of LOdev 3.7 is a side effect of the new "32-bit autocorrect databases" introduced by this committ: 

http://cgit.freedesktop.org/libreoffice/core/commit/?id=78a39502de36c32d7aaf6e5bbde1f8df80fdd21f

so I'm adding Tomaz Vajngerl to the CC List, since he's the man who's doing a lot of work fixing autocorrect performance issues, and maybe he's could find a solution in this area he already knows very well

@ Tomaz

if you read Comment 17 and Comment 25 from Micheal Meeks, you'll see that he saw room for improvement for his partial 3.5.4 fix
Comment 1 Michael Meeks 2012-10-03 10:31:19 UTC
I've cloned this bug from bug#46805 - to which some of the pointers refer.
Comment 2 Tomaz Vajngerl 2012-10-03 11:16:46 UTC
> I suspect that the longer freeze of LOdev 3.7 is a side effect of the new
> "32-bit autocorrect databases" introduced by this committ: 

I don't think this has any impact on performance - the change is only sal_uInt16 to sal_uInt32. The difference between 3.6 and 3.7 in autocorrect code is the change from internal data structure to hold the auto-correct words, to std::set. Of all changes this is by far the most radical that could cause performance degradation. My idea for this is to exchange std::set with a lookuptree by modifying the lookuptree (which we use now for auto-complete) to store a additional value (correction value) if there exists a word in the tree. But this modification will take a while to implement.

Regard, Tomaž
Comment 3 tommy27 2012-10-03 11:47:06 UTC
@Michael
sorry for messing the 3.5 MABs and thanks for filing this follow-up bug.


@Tomaz

maybe your "16bit to 32bit" committ is not guilty.
as suggested by Korrawit I'm going to test a LOdev 3.7 build without your patch to see if the performance decline took place earlier.

as far as I see your committ was pushed on 2012-09-19 18:12:03 (GMT) so master~2012-09-18_23.21.04_LibO-Dev_3.7.0.0.alpha0_Win_x86_install_en-US.msi  should be the last daily build without it.

stay tuned for my feedback. I should be ready this evening.
Comment 4 Tomaz Vajngerl 2012-10-03 17:56:21 UTC
(In reply to comment #3)
> maybe your "16bit to 32bit" committ is not guilty.
> as suggested by Korrawit I'm going to test a LOdev 3.7 build without your
> patch to see if the performance decline took place earlier.
> 
> as far as I see your committ was pushed on 2012-09-19 18:12:03 (GMT) so
> master~2012-09-18_23.21.04_LibO-Dev_3.7.0.0.alpha0_Win_x86_install_en-US.msi
> should be the last daily build without it.
> 
> stay tuned for my feedback. I should be ready this evening.

Well.. test it if you like but I checked now and in that commit I did not even touch the actual data structure which holds the auto-correct words. All the changes I made were on the dialog code.

Regards, Tomaž
Comment 5 tommy27 2012-10-03 19:52:40 UTC
Created attachment 68055 [details]
2 large autocorrect database (65K entries each)

-----------------------------------
STEPS TO REPRODUCE
-----------------------------------

1- unzip the attached file (contains acor_.dat & acor_it-IT.dat)

2- enter the autocorr subfolder in your user profile (in Windows is under: User\LibreOffice 3\user\autocorr) and put the acor_.dat & acor_it-IT.dat files there (backup yours first) which have 65000 entries each.
 
3- start LibO and open a blank writer file.

4- set language as "italian" (alternatively you may rename the acor_it-IT.dat file to match you default locale... i.e. acor_en-GB.dat or ... _de-GE.dat)

- digit "test" and hit space...

- LibO freezes for some seconds before finally seeing the cursor move to the space position

as said before that was a known LibO 3.5 MAB which was partially fixed by M.Meeks who brought down the freeze time from 14 to 9 seconds of current 3.6.x versions.

LibOdev 3.7 is instead slower. latest test performed on "Win-x86@6/ master~2012-09-18" shows a freeze time of 13 seconds.

another test on  "W2008R2@16-minimal_build master~2012-09-18" shows a freeze of 20 seconds.

so Tomaz's commits seem innocence since a first performance drop was already present in the last daily before he pushed his patch to master.
Comment 6 Roman Eisele 2012-10-04 14:20:18 UTC
Comment on attachment 68055 [details]
2 large autocorrect database (65K entries each)

(Fixed MIME type)
Comment 7 tommy27 2012-12-08 19:25:53 UTC
(In reply to comment #0)
 
> --------------------
> LibO 3.6.1 
> --------------------
> acor_.dat (65K entries)
> 5 seconds
> 
> acor_.dat 65K entries + acor_it-IT.dat 55K entries
> 9 seconds
> 
> --------------------
> master 3.7.x
> --------------------
> acor_.dat 65K entries 
> 13 seconds 
> 
> acor_.dat 65K entries + acor_it-IT.dat 55K entries
> 20 seconds 
> 
> acor_.dat 65K entries + acor_it-IT.dat 119 entries *
> 26 seconds 
> 
> * this test can't be done in 3.6.1 since only master currently supports
> autocorrect databases larger than 65K
> 

I redid the test with current LibO 4.0 daily build
Version 4.0.0.0.beta1+ (Build ID: 7061f72159e38e76134bc7fefc8a75cd233889c)

the current test was performed on the same machine of previous tests (Windows Vista 64 Home Premium SP1. Intel Core 2 Duo CPU P8400 @ 2.26GHz - 4GB RAM).

here's the new results:

--------------------
master 4.0.x
--------------------

acor_.dat 65K entries 
6 seconds

acor_.dat 65K entries + acor_it-IT.dat 55K entries
13 seconds

master 4.0 + acor_.dat 65K entries + acor_it-IT.dat 119 entries 
18 seconds 


so it's seems that the performance of 4.0 beta has improved since 3.7 alpha, but it's still worse than 3.6.x 


it would be interesting to know the reason for the improvement from 3.7 to 4.0... has anybody touched the autocorrect engine or was is just a lucky side effect of other committ?
Comment 8 Michael Meeks 2012-12-10 16:27:24 UTC
working on this in my idle cycles. It -seems- that a big chunk of the slowness is down to the sorted std::set in there - which looks un-necessary (to me at least).

Am re-factoring the SvxAutocorrWordList and hiding it's internal impl. details first.
Comment 9 Michael Meeks 2012-12-10 16:53:35 UTC
Amazingly it seems that a big chunk of our time is spent by the (banal) copy constructor of the std::set - we seemingly duplicate the initial plain SvxAutocorrect instance into an SwAutoCorrect sub-class (typical) which derives from this later in the day. Sadly unwinding exactly what is going on there and how it interacts with load/save is beyond the scope of the time I have now:

sw/source/ui/app/swmodule.cxx:

    // replace SvxAutocorrect with SwAutocorrect
    SvxAutoCorrCfg& rACfg = SvxAutoCorrCfg::Get();
    const SvxAutoCorrect* pOld = rACfg.GetAutoCorrect();
    rACfg.SetAutoCorrect(new SwAutoCorrect( *pOld ));

appears to end up doubling the amount of work/space wasteage etc. here (somehow) though perhaps I mis-understand the callgrind trace; something is very odd.
Comment 10 Michael Meeks 2012-12-10 20:43:53 UTC
Fixed in master; will pick to -4-0 shortly - thanks for the report :-)

~all the remaining time in the profile is XML parsing - for future work we should prolly switch to use the fast-parser instead, to accelerate that, and look for more simple cases that we can optimise there.
Comment 11 Not Assigned 2012-12-10 20:48:25 UTC
Michael Meeks committed a patch related to this issue.
It has been pushed to "libreoffice-4-0":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=b7ee9a329ca105c3ce2a22e661373879f0d24b92&g=libreoffice-4-0

fdo#55570 - re-factor SvxAutocorrWordList to hide it's innards


It will be available in LibreOffice 4.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 12 Not Assigned 2012-12-10 20:48:43 UTC
Michael Meeks committed a patch related to this issue.
It has been pushed to "libreoffice-4-0":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=c1fa87738327413939f0355efd6cb73bc4eac889&g=libreoffice-4-0

fdo#55570 - use a hash instead of set initially until sorting needed


It will be available in LibreOffice 4.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 13 Not Assigned 2012-12-10 20:49:03 UTC
Michael Meeks committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=cccb0bb123ac89a0a4cb8ba335ce5cb64cdc87cf

fdo#55570 - re-factor SvxAutocorrWordList to hide it's innards



The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 14 Not Assigned 2012-12-10 20:49:22 UTC
Michael Meeks committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=e7e4d6778661d039d6ee9cba1da1a22cea542f3e

fdo#55570 - use a hash instead of set initially until sorting needed



The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 15 tommy27 2012-12-12 13:01:30 UTC
(In reply to comment #7)

retested with latest daily build with Micheal Meeks fixes.
Version 4.0.0.0.beta1+ (Build ID: 63f68ccd72d9d6dc847f1b6ebf19ee2f8019f54)

>
> I redid the test with current LibO 4.0 daily build
> Version 4.0.0.0.beta1+ (Build ID: 7061f72159e38e76134bc7fefc8a75cd233889c)
> 
> ...
> 
> --------------------
> master 4.0.x
> --------------------
> 
> acor_.dat 65K entries 
> 6 seconds

now it takes just 3 seconds.

> acor_.dat 65K entries + acor_it-IT.dat 55K entries
> 13 seconds

now it takes just 6 seconds

> master 4.0 + acor_.dat 65K entries + acor_it-IT.dat 119 entries 
> 18 seconds 

it still takes 6 seconds.

> 
> so it's seems that the performance of 4.0 beta has improved since 3.7 alpha,
> but it's still worse than 3.6.x 


so, the performance is dramatically increased and now if even faster than 3.6.x

consider that the test has been performed on a 4 years old laptop so newer machines will probably have better performance

as my tests show, the freeze time doesn't get worse with larger autocorrect databases... is still takes the same time (6 seconds) if you deal with two 65K lists or with one 65K list and one 119K list

the same happens if you have only one list...
it still takes 3 seconds either if you use one 65K list or a 119K list.

so, let me say WELL DONE!!! NICE FIX MICHEAL!!!
Comment 16 Not Assigned 2012-12-12 13:30:14 UTC
Caolan McNamara committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=6f6056b62e2107b325624832d851b09711a11dd8

Related: fdo#55570 presumably we can attempt the insert...



The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 17 tommy27 2012-12-12 14:07:23 UTC
@Caolan McNamara

could you please try to explain me in simple words (I'm not a developer and did not understand the committ infos) what your latest patch is going to do?
is that an additional improvement to Micheak Meeks patches?
Comment 18 Michael Meeks 2012-12-12 14:14:36 UTC
a minor code cleanup with no significant improvement in overall speed :-)