Created attachment 100586 [details] large autocorrect lists ................. BACKGROUND ................. Large autocorrect lists are known to cause transient LibO freeze while you type the first word of a document. Performance has been improved during years (see Bug 46805 and Bug 55570) and was stable till 4.2.4.2. now I've noticed a consisted performance regression in 4.4.0.0.alpha0+ Build ID: dacab84536f61f79a5422e1d8d394019ec62f3ec TinderBox: Win-x86@39, Branch:master, Time: 2014-06-04_04:27:18 ................... STEPS TO REPRODUCE ................... - unzip attached file and extract the acor_und.dat and acor_it-IT.dat files (very large lists containing 191034 and 83027 entries) under the autocorr subfolder of the user profile (see: https://wiki.documentfoundation.org/UserProfile#Default_location) - open a Writer new document - set language as Italian (or rename the acor_it-IT.dat to match you default language localization i.e acor_en-GB.dat or acor_de-DE.dat) - type test and hit space ................... CURRENT BEHAVIOUR ................... LibO freezes for a few seconds before you can actually see the cursor move 1 space forward and you are able to digit again. LibO 4.2.4.2 --> 4.3 seconds LibO 4.4.0.0alpha0+ --> 10.3 seconds tests were done on a 5 years old laptop Win7x64 sp1 - Intel Core 2 Duo - 2.27 GHz - 4GB RAM - HDD so time will probably be shorter with more powerful equipment, nevertheless this is a sensible performance regression in respect to 4.2.x
To make any progress on this bug, we need a callgrind trace of this happening. Please get a build with symbols and then run: unset MALLOC_CHECK_ unset MALLOC_PERTURB_ unset G_SLICE export SAL_DISABLE_FLOATGRAB=1 export OOO_DISABLE_RECOVERY=1 valgrind --tool=callgrind --simulate-cache=yes --dump-instr=yes ./soffice.bin --splash-pipe=0 >& /tmp/callgrind-log.txt gzip callgrind-log.txt do your typing, then exit attaching callgrind-log.txt.gz to this bug, or up-load it. Without that, its hard to fix. :-)
for the record I retested against LibO 4.3.0.0.beta2 Build ID: a06aa316117a6ff0f05c697c82831c227812d810 in 4.3beta2 the performance is almost the same as in LibO 4.2.4.2 hence the regression is limited to the 4.4.x branch. regarding the trace I'm on Windows so I have to grab a 4.4.x master build with symbols. I took a look at this link: http://dev-builds.libreoffice.org/daily/master/Win-x86@39/symbols/ but it looks confusing to me what should I download.
Hi Tommy, you can't run callgrind on Windows AFAIK, which is a shame of course, but ... ;-) I strongly suspect that the performance is the same of this piece on both platforms, and that profiling on Linux with great tools will show what's up.
Created attachment 100773 [details] SimpleProgramDebugger results 4.3beta2 vs 4.4alpha I tried to use SimpleProgramDebugger ( http://www.nirsoft.net/utils/simple_program_debugger.html ) and attached to the soffice.bin process while reproducing the issue in 4.3beta2 and 4.4alpha I don't know if those files can be useful to debug or not. can you spot a difference between the 4.3 and 4.4 results?
Created attachment 101097 [details] Very Sleepy profiling results another try... used this Windows profiler Very Sleepy 0.82 ( http://www.codersnotes.com/sleepy ) and saved it's results either in .sleepy extension or .csv format test was done attaching Very Sleepy to soffice.bin and profiling the "4668" Thread, Location "GetMessageW"
Sadly the 'very sleepy' output is impenetrable compared to callgrind -> kcachegrind visualisation that points directly at the problem =) I could make nothing of the results there I'm afraid. Grabbing a VirtualBox and running Linux inside it so you can run callgrind would work well.
sorry that Very Sleepy was useless. I revert status to UNCONFIRMED since it has not been yet reproduced by other users. hopefully a Linux user could try profiling with callgrind
Created attachment 101555 [details] callgrind log file Callgrind log file
Confirmed on: * Version: 4.4.0.0.alpha0+ * Build ID: 113866583c5a34b9242aeb9040c4ce80e65da1ba * OS: Ubuntu 14.04 x86_64 ------------- Set platform and OS to All --> Reported on Windows x86 and confirmed on Ubuntu 14.04 x86_64. Attached a valgrind output with options and variables set like in comment 1.
@Marina Latina thank you very much!!! @Micheal Meeks I hope that output will help you finding the cause of the issue. I copy here a comment you wrote about Bug 55570 >Michael Meeks 2012-12-10 20:43:53 UTC >Fixed in master; will pick to -4-0 shortly - thanks for the report :-) >~all the remaining time in the profile is XML parsing - for future work we >should prolly switch to use the fast-parser instead, to accelerate that, and >look for more simple cases that we can optimise there. is this speed-up idea still valid? moreover I'm curious to know - if it's possible - why these performance sometimes regresses during master developement... the original autocorrect slowdown issue inherited from OOo was Bug 46805 that you fixed, then regression Bug 55570 appeared in 3.7/4.0 master era and you fixed that as well, but now it's back in 4.4 master (4.3.0 beta2 was still free of it like 4.2.x branch)
> moreover I'm curious to know - if it's possible - why these performance > sometimes regresses during master developement... At a low level - it's hard to say why this specific regression happened until we look into it. From a high level, judging the performance impact of any given change on the rest of the 8 million lines of code is ~impossible. Speeding one thing up, can make another thing slower quite easily; adding an unrelated feature can cause a significant performance regression etc. To try to catch this we have automated performance regression tests which generate callgrind data. These don't capture auto-correct performance though. Marina ... thanks for running the profile; I'm afraid my instructions were wrong ... [ sorry ! ]. The callgrind results will be saved in a callgrind.11541.out file - can you gzip that and attach it ? - hopefully its still there. Since that has the real data to fix this with !.
(In reply to comment #11) > > Marina ... thanks for running the profile; I'm afraid my instructions were > wrong ... [ sorry ! ]. > > The callgrind results will be saved in a callgrind.11541.out file - can you > gzip that and attach it ? - hopefully its still there. Since that has the > real data to fix this with !. It's my first attempt with valgrind and the man pages are very big to read in few minutes. I done a rapid copy&paste from your comment so, Michael, no problem! ;) I recap my steps: 1) Used the sources https://bitbucket.org/vmiklos/lo-daily (are they ok also for bibisect?) 2) Started LibreOffice for creating the fresh user profile with: ./opt/program/soffice.bin -env:UserInstallation=file:///tmp/soffice-bisect-$(git rev-parse HEAD) 3) Closed LibreOffice 4) Copied the .dat files in the autocorrect folder 5) Setted the variables: unset MALLOC_CHECK_ unset MALLOC_PERTURB_ unset G_SLICE export SAL_DISABLE_FLOATGRAB=1 export OOO_DISABLE_RECOVERY=1 6) started valgrind with: valgrind --tool=callgrind --simulate-cache=yes --dump-instr=yes ./opt/program/soffice.bin -env:UserInstallation=file:///tmp/soffice-bisect-$(git rev-parse HEAD) --splash-pipe=0 >& /tmp/callgrind-log.txt The out file is too big (18MB) so here you can find the gzip file uploaded on dropbox https://www.dropbox.com/s/v24vifucf1x1ztq/callgrind.out.11541.gz
Unfortunately the trace (thanks for transferring it) is done on a LibreOffice without debugging information [ though I see you're helping out with bisects which is cool ! =], so it's not that useful. Getting an older version of LibreOffice (or any that exhibit the bug) with debuginfo installed is really a pre-requisite here; that's what I mean by a 'build with symbols' Thanks !
is the Linux debug version the one listed as Linux-rpm_deb-x86@45-TDF-dbg/ ? at this iink: http://dev-builds.libreoffice.org/daily/master/Linux-rpm_deb-x86@45-TDF-dbg/
Not certain; but sounds plausible =)
good news. I've retested using : 4.4.0.0.alpha0+ Build ID: 95272e7e5b7e38753ab07dbd6503b7cfa2974842 TinderBox: Win-x86@42, Branch:master, Time: 2014-06-26_23:01:43 under the same PC and test environment the autocorrect slow-down of 4.4 master is back to the same lenght of the 4.2.x branch, which is 4.3 seconds. so the performance regression I saw in early june is now gone in recent builds. I set status as RESOLVED WORKSFORME however any effort (profiling, coding etc. etc.) to improve the current situation (which is annoying in older machines) would be appreciated as well
bad news... slow-down is back in recent build (*) and is even worse than before... now the delay is 35 seconds and makes Writer experience unacceptable... I'm adding this to mab4.4 list (*) 4.4.0.0.alpha0+ Build ID: d5dd1216804afae35d7fe7dbb1d37b0ca1fcce88 TinderBox: Win-x86@39, Branch:master, Time: 2014-07-11_00:09:47
So - again if someone has a build with debugging symbols (but -not- debugutil - ie. the linux tinderbox debug build is not useful) - then they can produce a callgrind trace that immediately points at the problem: which is prolly really silly =)
Matus - I'd love to have some input from you on writing performance tests that are things that are not startup / load / save =) Is there some set of unit tests we could create and/or a makefile rule that we could hook into to create a series of such tests we can run ? The large auto-correct database is a perennial problem to keep at-speed =) Thanks !
I can't reproduce this bug (commit e723ee8622447300a1aaeca24df3f35f3dcc9368). For me it's almost the same as in 4.3, the time and calgrind profile too. If you are interested, you can find it here: https://owncloud.documentfoundation.org/public.php?service=files&t=1c09a1a653cad5da24f420d56f94e49a&download Anyhow, working on a callgrind performance test for this.
thanks Matus. your profiling will probably help improve the situation in 4.3.x which is still suboptimal (that delay is quite annyoing on older machines) regarding the 4.4.x performance I'll later test a newer daily build since I've already experienced transient performance decrease in a build and subsequent return to normality in another (see comment 16)
retested with 4.4.0.0.alpha0+ Build ID: abc28ffc04067eb24840fbf564c311aaee10f84d TinderBox: Win-x86@42, Branch:master, Time: 2014-07-15_07:20:05 performance is again back to 4.2.x level so I mark again this as RESOLVED however any improvement to current delay thanks to profiling results would be very appreciated
@Micheal Meeks have you ever had time to inspect the profiling results that Matus obtained in comment 20. is there anything you can tweak to improve current performance?
Hi Tommy - unfortunately, we need a good callgrind profile (with symbols) of when this issue really takes forages ;-) the 8bn pcycles that matus shows is quite a lot of time of course. A chunk of the time there is raw / pure XML parsing - we have 2.5bn cycles inside the expat library - which we could almost entirely loose by using the XFastParser and doing that in its own thread ;-) David Ostrovsky kindly agreed to take a look at doing something there I think - I forget what the bug# was though =) any news there David ? Of course re-using the XFastParser is always a good idea since that's the code path we optimize for document load too =) Then again - looking at the data here, it seems to me like a big chunk of the problem is: SwXMLTextBlocks::SwXMLTextBlocks Which -seems- to init an SfxObjectShell each time - taking around 2bn cycles. That looks massively expensive to do inside every start_Element for the XML file ;-) Why does SwAutoCorrect::GetLongText do that I wonder ? re-opening, since that really looks quite odd.
Michael Meeks committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=8d6697587776136f3121733e1c29d4200720dbd9 fdo#79761 - avoid re-canonicalising each font name on clone repeatedly. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Hi Micheal, just retested with 4.4.0.0.alpha0+ Build ID: f4246fab77113147b36706a1f3d93e8724ff826b TinderBox: Win-x86@42, Branch:master, Time: 2014-08-23_03:27:09 unfortunately the performance of master seems the same as before... I still see the same 4.3 seconds delay that I see in LibO 4.2.x and 4.3.x according to the build time your patch should be in the daily I tested, so it seems that your tweak sadly had no effect on it. as said in previous comment 22 the performance of masters is the same of previous branches and the worsening I saw in comment 0 and comment 17 were only transient episodes that I did not see in following dailies. so I'm wondering if it's better to edit summary notes with something like: "autocorrect residual slow-down" rather than the current "autocorrect slow-down got worse"
(In reply to comment #26) > ... > > as said in previous comment 22 the performance of masters is the same of > previous branches and the worsening I saw in comment 0 and comment 17 were > only transient episodes that I did not see in following dailies. > > so I'm wondering if it's better to edit summary notes with something like: > "autocorrect residual slow-down" rather than the current "autocorrect > slow-down got worse" did that, and removed from mab4.4 list since this is not specific to master branch.
Daniel did a great job of switching to the XFastParser - so now the XML parsing should come ~free here, cf. bug#80403 which is nearly closed for 4.4 Of course, there is still this odd performance problem I highlight in comment #24 around SwXMLTextBlocks that I suspect would be good to dig into to try to make this more efficient. Any cycles for that Daniel ?
(In reply to Michael Meeks from comment #28) > Of course, there is still this odd performance problem I highlight in > comment #24 around SwXMLTextBlocks that I suspect would be good to dig into > to try to make this more efficient. Any cycles for that Daniel ? I'm working on that. Each time we init SwXMLTextBlocks, we parse BlockList.xml. Which is about 170 times with the attached autocorrect lists, although it isn't changed.
Daniel Sikeler committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=4404b718bdb547cb9b7b17c73a53574724cdeeb7 fdo#79761: parse BlockList.xml only once It will be available in 4.4.0. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
retested under Win7x64 I compared 4.3.2.2 and 4.4.0.0.alpha1+ Build ID: d59b9b4af36148e4d8df8f3e3492116d378642e2 TinderBox: Win-x86@51-TDF, Branch:MASTER, Time: 2014-11-05_22:58:24 starting autocorrect options LO 4.3.2.2 --> 5 seconds LO 4.0.0.0 --> 3 seconds typing first word and press space LO 4.3.2.2 --> 4.3 seconds LO 4.0.0.0 --> 1.5 seconds I would call this RESOLVED FIXED but I let the honour to do it to Daniel who did this amazing work and considerably improved LibO autocorrect performance. remember that these times are obtained on an old laptop... if you work with new machines the delay will be even shorter... anyway I think it was good to fix and improve performance for people on older machines. thumbs up Daniel!!! maybe there's still some room for improvement? current result are however already. very satisfying
(In reply to tommy27 from comment #31) > .... > > starting autocorrect options > LO 4.3.2.2 --> 5 seconds > LO 4.0.0.0 --> 3 seconds > > for the record, this was an old OOo issue at: https://issues.apache.org/ooo/show_bug.cgi?id=101726 at that time OOo and LibO 3.5.0 took *15 minutes* to load a 65K entries autocorrect list... time was brought down to just 7 seconds by LibO volunteer developer Deszi Szabolcs in LibO 3.5.1 now after many fixes and code improvement trough the years, including the latest fix by Daniel, LibO can now load even larger lists in less time... the attachment 100586 [details] contains 2 large list, respecitively: - acor_it-IT --> 83027 entries --> loaded in 3.0 seconds - acor_und --> 191034 entries --> loaded in 5.4 seconds thank to all the developers who made these achievement possible. I'm proud of being part of this free software community.
This bug fix comes with LibreOffice 4.4 (release notes https://wiki.documentfoundation.org/ReleaseNotes/4.4) Loading large autocorrect lists is at least twice as fast now, because the lists are parsed in a thread by the fastparser and repeated parsing of unchanged files is avoided.