Word-count has become a dominant factor in large-document load time. We are doing some -really- odd things with i18npool's break-iterators, and this results in some epic thrashing of the lower levels of the code. All of this is called synchronously from SwDoc::UpdateStat (by the status-bar widget), which is a tad irritating, and it has to happen before anything is rendered. I wonder, assuming we're caching the results of that work, could we not do it incrementally at idle, in chunks of a few thousand paragraphs? Either way, I attach a couple of prototype patches to speed things up.
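The idle, chunked approach suggested here could look roughly like the minimal C++ sketch below. It is not LibreOffice code. `IncrementalWordCounter`, `RunIdleChunk`, `UpdateParagraph` and `CountWords` are hypothetical names, and `CountWords` simply counts whitespace-separated runs rather than going through i18npool's break-iterators. The assumption is that an idle handler calls `RunIdleChunk` with a few thousand paragraphs at a time, the status bar shows the partial total until `IsComplete()` returns true, and cached per-paragraph counts let an edit re-count one paragraph instead of the whole document.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a real word-break iterator: counts runs of
// non-whitespace characters. Illustrative only.
std::size_t CountWords(const std::string& text)
{
    std::size_t words = 0;
    bool inWord = false;
    for (unsigned char c : text)
    {
        if (std::isspace(c))
            inWord = false;
        else if (!inWord)
        {
            inWord = true;
            ++words;
        }
    }
    return words;
}

// Hypothetical incremental counter: each idle tick processes at most
// `chunkSize` paragraphs and caches per-paragraph results, so rendering
// does not have to wait for the full count.
class IncrementalWordCounter
{
public:
    explicit IncrementalWordCounter(std::vector<std::string> paragraphs)
        : m_paragraphs(std::move(paragraphs)),
          m_cached(m_paragraphs.size(), 0) {}

    // Called from an idle handler; returns true once every paragraph is counted.
    bool RunIdleChunk(std::size_t chunkSize)
    {
        std::size_t end = std::min(m_next + chunkSize, m_paragraphs.size());
        for (; m_next < end; ++m_next)
        {
            m_cached[m_next] = CountWords(m_paragraphs[m_next]);
            m_total += m_cached[m_next];
        }
        return IsComplete();
    }

    // Re-count a single edited paragraph instead of the whole document.
    void UpdateParagraph(std::size_t i, std::string text)
    {
        m_paragraphs[i] = std::move(text);
        if (i < m_next) // already counted: adjust the running total
        {
            m_total -= m_cached[i];
            m_cached[i] = CountWords(m_paragraphs[i]);
            m_total += m_cached[i];
        }
        // otherwise it is counted when the idle loop reaches it
    }

    bool IsComplete() const { return m_next == m_paragraphs.size(); }
    std::size_t Total() const { return m_total; } // partial until IsComplete()

private:
    std::vector<std::string> m_paragraphs;
    std::vector<std::size_t> m_cached; // per-paragraph word counts
    std::size_t m_next = 0;            // first paragraph not yet counted
    std::size_t m_total = 0;
};
```

For example, with four paragraphs, two calls to `RunIdleChunk(2)` complete the count; how large a chunk to take per idle tick is a latency trade-off left open here.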
Created attachment 71881: tries to stop the language thrash causing repeated re-loading in i18npool. Not sure this patch is the ideal solution, but for archiving... it's perhaps better to have an ICU break-iterator instance per type and locale; will work on that in a bit.
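A minimal C++ sketch of why caching one iterator per locale avoids the language thrash, assuming a document whose text runs alternate between two languages. `FakeBreakIterator`, `g_instantiations`, `SingleSlotCache` and `PerLocaleCache` are hypothetical names; a real version would hold ICU iterators (e.g. created via `icu::BreakIterator::createWordInstance`) and would probably key on break-iterator type as well as locale. This illustrates the general idea only; it is not what attachment 71881 or the committed fix actually does.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Counts how many (expensive) iterators have been constructed.
std::size_t g_instantiations = 0;

// Hypothetical stand-in for an expensive ICU break-iterator.
struct FakeBreakIterator
{
    explicit FakeBreakIterator(std::string locale) : m_locale(std::move(locale))
    {
        ++g_instantiations;
    }
    std::string m_locale;
};

// Keeps only the most recently used iterator: alternating languages
// force a reload on every switch.
class SingleSlotCache
{
public:
    FakeBreakIterator& Get(const std::string& locale)
    {
        if (!m_current || m_current->m_locale != locale)
            m_current = std::make_unique<FakeBreakIterator>(locale);
        return *m_current;
    }

private:
    std::unique_ptr<FakeBreakIterator> m_current;
};

// Keeps one iterator per locale, so each locale is loaded at most once.
class PerLocaleCache
{
public:
    FakeBreakIterator& Get(const std::string& locale)
    {
        auto it = m_cache.find(locale);
        if (it == m_cache.end())
            it = m_cache.emplace(locale, std::make_unique<FakeBreakIterator>(locale)).first;
        return *it->second;
    }

private:
    std::map<std::string, std::unique_ptr<FakeBreakIterator>> m_cache;
};
```

With six alternating en-US/de-DE runs, the single-slot cache constructs six iterators while the per-locale cache constructs two; the gap grows with every language switch in the document.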
Deadly annoying; impacts lots of documents, e.g. file 3 of bug#44736.
Created attachment 71882: first go at cleaning up the wild & woolly break-iterator code. These two together take us down from 500k new ICU break-iterator instantiations to 600, a nearly 1000-fold improvement in this piece. Unfortunately it seems to mangle one of our unit tests, which (reading it) is somewhat opaque to me; it's not at all clear why that should be ;-)
Created attachment 71901: fixed-up prototype patch. The bug in the previous version becomes immediately apparent once you stop assuming that the previous code worked properly ;-> Fixed that nicely; will re-build & re-profile in a sec. Quite possibly we don't need the first patch alongside the second.
Removing the first patch, I still get almost all of the win, so I'm committing just this simpler version. Loading the RTF + word-count => render goes from 71bn cycles to 41bn cycles (about a 42% reduction), which seems like a reasonable saving.
Michael Meeks committed a patch related to this issue. It has been pushed to "libreoffice-4-0": http://cgit.freedesktop.org/libreoffice/core/commit/?id=dd0af402771c3e7fada4fd8dc69fa12066c6766e&g=libreoffice-4-0 fdo#58590 - cleanup and accelerate break-iterators. It will be available in LibreOffice 4.0. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Michael Meeks committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=9c6006b961f690728f4035c10f8b9fe9fdb6f332 fdo#58590 - cleanup and accelerate break-iterators.
Michael Meeks committed a patch related to this issue. It has been pushed to "libreoffice-3-6": http://cgit.freedesktop.org/libreoffice/core/commit/?id=ee8f3d557b7ccb88cadd55fe91464a005b321362&g=libreoffice-3-6 fdo#58590 - cleanup and accelerate break-iterators. It will be available in LibreOffice 3.6.5.