Bug 38513 - Performance is poor when dealing with files with lots of style switches
Summary: Performance is poor when dealing with files with lots of style switches
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer (show other bugs)
Version:
(earliest affected)
3.4.0 release
Hardware: All All
: medium normal
Assignee: Michael Meeks
URL:
Whiteboard: target:4.4.0 target:4.3.0.2
Keywords: perf
Depends on:
Blocks:
 
Reported: 2011-06-20 20:19 UTC by Andriy Rysin
Modified: 2015-12-15 11:40 UTC (History)
8 users (show)

See Also:
Crash report or crash signature:


Attachments
Add original test document in case it gets deleted upstream (2.00 MB, application/vnd.oasis.opendocument.text)
2012-05-10 06:22 UTC, Sebastian@SSpaeth.de
Details
prototype speedup patch (6.59 KB, patch)
2014-06-07 10:26 UTC, Michael Meeks
Details
a simpler attempt - that doesn't work (3.09 KB, patch)
2014-06-07 11:56 UTC, Michael Meeks
Details
a prettier layer on top (26.78 KB, patch)
2014-06-09 20:55 UTC, Michael Meeks
Details
backtrace of the crash (10.82 KB, text/plain)
2014-06-28 10:34 UTC, Jean-Baptiste Faure
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Andriy Rysin 2011-06-20 20:19:34 UTC
This is somewhat related to https://bugs.freedesktop.org/show_bug.cgi?id=36013

There was a problem that LO crashed when working with documents which had > 65535 style switches. The crash was fixed but the performance greatly degraded:
attached is the 2MB file which was taking ~2-3 min to open on LO 3.3 and it takes 9 min to open on LO 3.4.
Saving this document takes way more than an hour (!) (LO 3.3 was saving it in less than 10min).
Just to close this document in LO 3.4 takes ~8 min (it's a Core 2 Duo 2.1GHz and I was using 64bit rpms: libreoffice3.4-base-3.4.0-12.x86_64 ...).

On comparison Word 2007 on similar hardware takes about 50sec to open and less than 2 min to save.

I must admit this document is not an average one the users work with but:
a) it's a good test which should nicely show the slow parts of the code and which when optimized would bring speed benefits to smaller documents as well
b) it's a regression so the problem should (hopefully) be easier to understand
c) this document can be opened/saved withing reasonable time on other office software (on very similar hardware) so there should be more efficient algorithms available
Comment 1 Rainer Bielefeld Retired 2011-06-27 09:30:09 UTC
@ Andriy Rysin:
Sample document is missing. 
If your test kit axceeds Bugzilla’s file size limitation of 1 MB you can use our experimental temporary WIKi upload page 
<http://wiki.documentfoundation.org/QA/Bugzilla-Attachments>

Please add additional information due to 
<http://wiki.documentfoundation.org/BugReport> 
(Operating System for tests only Linux?)!
Comment 2 Andriy Rysin 2011-06-27 12:11:01 UTC
I could not connect to wiki server so here's the test document:
http://r2u.org.ua/data/tmp/big_file_test.odt
Comment 3 Rainer Bielefeld Retired 2011-07-06 23:33:18 UTC
[Reproducible] with "LibreOffice 3.4.1RC1 - WIN7  Home Premium (64bit) German UI [OOO340m1 (Build:103)]" 
Opening the document takes several minutes (OOo 3.1.1 does it within 1/2 minute).
Closing the document takes endless time, I terminated after 20 Minutes with app. 2% progress bar and "no response" message (OOo 3.1.1 crashed)

A 3000 pages document is not normal use, but performance should be improved, and the crashes might be a serious problem.

@Cédric
Please feel free to reassign if it’s not your area or if provided information is not sufficient.
Comment 4 Sebastian@SSpaeth.de 2012-05-10 06:22:15 UTC
Created attachment 61367 [details]
Add original test document in case it gets deleted upstream
Comment 5 Sebastian@SSpaeth.de 2012-05-10 07:14:11 UTC
Please find a (uncompressed 26mb) callgrind of importing perhaps the first 5% (before I killed it) http://sspaeth.de/uploads/tmp/callgrind.out.6154.bz2

It shows that 80% of the time is spent in SfxItemSet:Put (43k calls) implying 221 million calls to the != operator and 443 million calls to the iterator.

It's interesting to see that 43k put calls lead to this immense amount of comparisons.

I did not dare to run this callgrind to its full completion.
Comment 6 Andriy Rysin 2012-12-28 20:53:01 UTC
I was wandering if I could provide any help in solving this problem. I was working with a odt file recently (5Mb compressed) and LO took >8 hours to save it on Core(TM)2 Duo 2.10GHz.
Comment 7 Michael Meeks 2013-10-19 23:38:11 UTC
Hi Matus; this one might be a good test document for profiling writer ? :-)
Comment 8 Matúš Kukan 2013-10-21 07:11:14 UTC
Probably not really a problem in Writer.
I've run callgrind on this document (more than 5 hours) in my build with --enable-symbols - if anybody is interested in the output, let me know.
But the point is that we spend ~all of the time in SfxItemPool::Put
to be precise in:
for (; itr != pItemArr->end(); ++itr)
http://opengrok.libreoffice.org/xref/core/svl/source/items/itempool.cxx#774
SfxItemPool::Put is called only ~ 1 million times but probably pItemArr is quite big.
Comment 9 bfoman (inactive) 2014-02-20 18:29:42 UTC
This bug lost its owner - back to NEW.
Comment 10 Michael Stahl (allotropia) 2014-03-17 19:44:39 UTC
a quick try at loading and storing the file on Linux x86_64
gives these CPU times:

26:43 /data/inst/LO_3.4.6/opt/libreoffice3.4/program/soffice.bin
17:25 /data/inst/LO_3.5.7/opt/libreoffice3.5/program/soffice.bin
 4:51 /data/inst/LO_4.1.4/opt/libreoffice4.1/program/soffice.bin --splash-pipe=5

i know what caused the improvement from 3.4 to 3.5: 7c7d5c0eec4efb95d18b735fb9df4754ba9d8b1f
replaced SfxItemPool deque with vector, and should fix
the performance regression vs. the 3.3 release
(which is unable to load the bugdoc due to 16bit int overflows).

deque iterators are surprisingly slow, so for such use-case
vector is much better.

why it's even faster in 4.1 is a mystery to me.
Comment 11 Michael Meeks 2014-06-07 10:26:02 UTC
Created attachment 100598 [details]
prototype speedup patch

So - this is an interesting problem; there are several horribly intermittent performance problems here. It seems that the SfxItemPool management is extremely performance fragile. In particular for non-poolable items - if we happen to add and remove them in the right order (which sadly we do per element in response to SAX callbacks [!] ) - actually just for one type 'SwFmtCharFmt' - I assume because it's an SwClient sub-class then we can either be 'lucky' or not ;-)

The basic problem is that if when we add our temporary element there is a hole at the front of the SfxPoolItemArra_Impl vector, then we will add it there, append the new element to the end of the array, and then remove the first element again [ thus forcing a re-scan of the whole array to find the next hole - which is the expensive bit ].

The attached patch uses an unordered_set for the SfxItemPool - that should have a linear cost for add/removes of non-poolable items - and indeed it significantly helps with performance.

With this patch I get a 61 second load instead of a 290 second load; which is a win. Of course, inevitably it breaks a unit test as well so not to master.

Perhaps there is a simpler / less invasive way.
Comment 12 Michael Meeks 2014-06-07 11:56:52 UTC
Created attachment 100605 [details]
a simpler attempt - that doesn't work

This is a simpler attempt; unfortunately it doesn't work at all well - I gave up waiting. We try to solve half of the problem by speeding up inserting tracking spare space at beginning and/or end of the vector - but it doesn't work. Also - this can never accelerate 'remove' sufficiently so can only solve half the problem.
Comment 13 Michael Stahl (allotropia) 2014-06-07 17:47:30 UTC
wonder if it would work to put in not just 2 free-indexes but a whole stack of them?  could also shrink the array when the size of the stack is > half the array size.

but really some sort of map is probably more appropriate for the problem.

well some things like the RTF export will iterate over all items but
probably that should still be fast enough.

for the non-poolable items using the address as key ought to be sufficient;
for the poolable ones it would require virtual hash function or operator<
which would be quite some work to add everywhere.
Comment 14 Michael Meeks 2014-06-07 20:50:22 UTC
> wonder if it would work to put in not just 2 free-indexes but a whole
> stack of them?  could also shrink the array when the size of the stack
> is half the array size.

Sure - my second patch or something like it would swallow the insertion time but do nothing for the removal time (sadly). My first patch axes that too.

> but really some sort of map is probably more appropriate for the problem.

Agreed.

> well some things like the RTF export will iterate over all items but
> probably that should still be fast enough.

That should all be linear cost with a hash.

> for the non-poolable items using the address as key ought to be sufficient;
> for the poolable ones it would require virtual hash function or operator<
> which would be quite some work to add everywhere.

Right - it'd be nice to have a single container in the deeper future to reduce compares, but it is really not that much of the time in the poolable ones - often there is a tiny to small set of options; not several thousand like in the non-poolable case.

Anyhow - I have a better, cleaner, updated patch I'm testing here based on patch #1, and an even better one re-factored into a prettier form [ SfxItemPool badly needs translating and the docs updating - hopefully this will help clarify it a little ] that's on the way.
Comment 15 Michael Meeks 2014-06-09 20:55:06 UTC
Created attachment 100770 [details]
a prettier layer on top

Working onwards and up-wards, a prettier, virtualized pair of containers for either non-poolable or poolable types to suit the perf. characteristics of each.

Lovely, etc. apart from the fact that it means updating the horrible binary file format stuff that is (nominally) used for cut/paste in some place or other cf. bug#79851 for dunging that out.
Comment 16 Commit Notification 2014-06-17 14:29:18 UTC
Michael Meeks committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=1ca7ac128dcdf03e3749af7458beac8d2da8a708

fdo#38513 - Accelerate non-poolable item add / remove.



The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 17 Michael Meeks 2014-06-17 14:31:07 UTC
Unfortunately until bug#79815 is fixed we have to waste a bit of space and time here to accomodate it; nevertheless I've pushed a patch to master - and pending seeing what the tinderboxen do before back-porting to 4.3 - that gives a 5x speedup; for me the document loads in 34s instead of 176s after/before.

Michael - any chance of some review for 4.3 ? =)
Comment 18 Commit Notification 2014-06-27 16:13:40 UTC
Michael Meeks committed a patch related to this issue.
It has been pushed to "libreoffice-4-3":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=8aff83b95fa5969edfc48022ddaae05031b178cf&h=libreoffice-4-3

fdo#38513 - Accelerate non-poolable item add / remove.


It will be available in LibreOffice 4.3.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 19 Jean-Baptiste Faure 2014-06-28 10:34:56 UTC
Created attachment 101912 [details]
backtrace of the crash

When trying to open the testfile with LO 4.3 with the commit, I get a crash by segmentation fault.
Version: 4.3.0.1.0+ Build ID: 8aff83b95fa5969edfc48022ddaae05031b178cf
Clean build on Ubuntu 14.04 x86-64 (make disclean > ./autogen.sh > make)

Attached a bt but my build was done without symbols. Hope this helps.

Best regards. JBF
Comment 20 Andriy Rysin 2014-08-22 00:29:28 UTC
I just tried LibreOffice 4.3 on many of my big documents with large number of style switches and it's flying both for saving and for closing the document (opening seesm to be faster as well).

I would like to thank you guys for fixing this bug. Much appreciated!!!
Comment 21 Robinson Tryon (qubit) 2015-12-15 11:40:12 UTC
Migrating Whiteboard tags to Keywords: (perf)
[NinjaEdit]