Bug 58590 - word-count horrible slowness ...
Summary: word-count horrible slowness ...
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version: 4.0.0.0.beta1 (earliest affected)
Hardware: All
OS: All
Importance: medium normal
Assignee: Michael Meeks
URL:
Whiteboard: target:4.0.0.1 target:4.1.0 target:3.6.5
Keywords:
Depends on:
Blocks: mab4.0
Reported: 2012-12-20 23:06 UTC by Michael Meeks
Modified: 2012-12-21 16:20 UTC (History)
CC List: 3 users

See Also:
Crash report or crash signature:


Attachments
tries to stop the language thrash causing repeated re-loading in i18npool (8.64 KB, patch)
2012-12-20 23:07 UTC, Michael Meeks
first go at cleaning up the wild & wooly break-iterator code (11.13 KB, patch)
2012-12-20 23:11 UTC, Michael Meeks
fixed up prototype patch (10.73 KB, patch)
2012-12-21 07:02 UTC, Michael Meeks

Description Michael Meeks 2012-12-20 23:06:02 UTC
Word-count has become a dominant factor in large-document load time. We are doing some -really- odd things with i18npool's break-iterators, and this results in some epic thrash in the lower levels of the code.

All of this is called synchronously from SwDoc::UpdateStat - by the status-bar widget - which is a tad irritating. It has to happen before anything is rendered. I wonder - assuming we're caching the results of that work - could we not do it at idle, incrementally, in chunks of a few thousand paragraphs?

Either way I attach a couple of prototype patches to speed things up.
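
A minimal sketch of the idle-time idea raised above, for illustration only. It is not LibreOffice code: IncrementalWordCounter, step() and the chunk size are hypothetical names, and whitespace splitting stands in for the real break-iterator word boundaries. The point is only that the count can advance a bounded number of paragraphs per call and keep a running total, so a caller (for example an idle handler) can yield between chunks.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a document's paragraph list; not SwDoc.
struct IncrementalWordCounter {
    const std::vector<std::string>& paragraphs;
    std::size_t next = 0;   // index of the first paragraph not yet counted
    std::size_t words = 0;  // running total, usable as a partial result

    explicit IncrementalWordCounter(const std::vector<std::string>& p)
        : paragraphs(p) {}

    bool done() const { return next >= paragraphs.size(); }

    // Count at most `chunk` paragraphs, then return so the caller can yield.
    // Returns true while uncounted paragraphs remain.
    bool step(std::size_t chunk) {
        const std::size_t end = std::min(paragraphs.size(), next + chunk);
        for (; next < end; ++next) {
            // Assumption: whitespace-separated tokens approximate words.
            std::istringstream in(paragraphs[next]);
            std::string token;
            while (in >> token) ++words;
        }
        return !done();
    }
};
```

A caller would invoke step() with a chunk of a few thousand paragraphs each time the application is idle, and refresh the status bar from `words` once done() is true (or show the partial total earlier).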
Comment 1 Michael Meeks 2012-12-20 23:07:39 UTC
Created attachment 71881 [details]
tries to stop the language thrash causing repeated re-loading in i18npool

Not sure this patch is the ideal solution - but for archiving ... it's perhaps better to have an ICU break-iterator type locale instance; will work on that in a bit.
Comment 2 Michael Meeks 2012-12-20 23:08:55 UTC
Deadly annoying; impacts lots of documents, e.g. file 3 of bug#44736
Comment 3 Michael Meeks 2012-12-20 23:11:50 UTC
Created attachment 71882 [details]
first go at cleaning up the wild & wooly break-iterator code

These two together take us down from 500k new ICU break-iterator instantiations to 600 - roughly a 1000-fold improvement in this piece.

Unfortunately it seems to mangle one of our unit tests - which (reading it) is somewhat opaque to me - it's unclear why that should be ;-)
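
To illustrate why reusing instances cuts the instantiation count so sharply, here is a hedged sketch of a per-locale cache. It does not reproduce either attached patch or the ICU API: FakeBreakIterator, BreakIteratorCache and the locale-string key are assumptions made up for this example. An expensive object is built once per distinct locale and handed back on every later request.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for an expensive-to-construct break iterator.
struct FakeBreakIterator {
    std::string locale;
    explicit FakeBreakIterator(std::string loc) : locale(std::move(loc)) {}
};

// Hypothetical cache: one iterator per locale string, created on first use.
class BreakIteratorCache {
    std::map<std::string, std::unique_ptr<FakeBreakIterator>> cache_;
    std::size_t created_ = 0;  // number of real constructions performed

public:
    FakeBreakIterator& get(const std::string& locale) {
        auto it = cache_.find(locale);
        if (it == cache_.end()) {
            // Cache miss: pay the construction cost once for this locale.
            it = cache_.emplace(locale,
                                std::make_unique<FakeBreakIterator>(locale)).first;
            ++created_;
        }
        return *it->second;
    }

    std::size_t created() const { return created_; }
};
```

With this shape, a text that alternates between two languages on every paragraph still constructs only two iterators, however many paragraphs are counted.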
Comment 4 Michael Meeks 2012-12-21 07:02:10 UTC
Created attachment 71901 [details]
fixed up prototype patch

The bug in the previous version becomes immediately apparent when you stop thinking that the previous code worked properly ;-> fixed that nicely; will re-build & re-profile in a sec.

Quite possibly we don't need the first patch with the 2nd.
Comment 5 Michael Meeks 2012-12-21 11:38:59 UTC
Removing the 1st patch I still get ~all the win, so committing just this simpler version. Goes from 71bn cycles to load RTF + word-count => render, to 41bn cycles (about 42% fewer) - which seems like a reasonable saving.
Comment 6 Not Assigned 2012-12-21 11:42:42 UTC
Michael Meeks committed a patch related to this issue.
It has been pushed to "libreoffice-4-0":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=dd0af402771c3e7fada4fd8dc69fa12066c6766e&g=libreoffice-4-0

fdo#58590 - cleanup and accelerate break-iterators.


It will be available in LibreOffice 4.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 7 Not Assigned 2012-12-21 11:43:00 UTC
Michael Meeks committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=9c6006b961f690728f4035c10f8b9fe9fdb6f332

fdo#58590 - cleanup and accelerate break-iterators.



Comment 8 Not Assigned 2012-12-21 16:20:55 UTC
Michael Meeks committed a patch related to this issue.
It has been pushed to "libreoffice-3-6":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=ee8f3d557b7ccb88cadd55fe91464a005b321362&g=libreoffice-3-6

fdo#58590 - cleanup and accelerate break-iterators.


It will be available in LibreOffice 3.6.5.
