Bug 113042 - FILESAVE: Saving an large calc sheet is slower in LibO5.4 compared to LibO5.3
Summary: FILESAVE: Saving an large calc sheet is slower in LibO5.4 compared to LibO5.3
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
5.4.0.3 release
Hardware: All All
: medium normal
Assignee: Luboš Luňák
URL:
Whiteboard:
Keywords: bibisectRequest, haveBacktrace, perf, regression
Depends on:
Blocks:
 
Reported: 2017-10-10 19:13 UTC by Telesto
Modified: 2019-06-13 09:08 UTC (History)
9 users (show)

See Also:
Crash report or crash signature:


Attachments
Bibisect log (2.90 KB, text/plain)
2017-10-10 19:15 UTC, Telesto
Details
screenshot of kcachegrind for callgrind.out.3022 (421.77 KB, image/png)
2017-10-15 08:13 UTC, Julien Nabet
Details
Callgrind output from master (6.63 MB, application/x-xz)
2018-05-31 15:05 UTC, Buovjaga
Details
Large spreadsheet (14.39 MB, application/vnd.oasis.opendocument.spreadsheet)
2019-04-18 17:40 UTC, Buovjaga
Details
Perf flamegraph (348.96 KB, image/svg+xml)
2019-04-18 19:27 UTC, Buovjaga
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Telesto 2017-10-10 19:13:45 UTC
Description:
Saving an large calc sheet is slower in LibO5.4 compared to LibO5.3

Steps to Reproduce:
1. Download https://yadi.sk/i/rM9QctDym5y3M
2. Save the file as. Take notice of the time required (around 14 sec with LibO6 and 7/8 sec with LibO5.3

Actual Results:  
Saving a large file is slower

Expected Results:
Should be as fast as in LibO5.3


Reproducible: Always

User Profile Reset: No

Additional Info:
Version: 6.0.0.0.alpha0+
Build ID: c5a93cad149618bbd43632f1660a558c34bdbf7e
CPU threads: 4; OS: Windows 6.3; UI render: default; 
TinderBox: Win-x86@42, Branch:master, Time: 2017-10-07_01:04:25
Locale: nl-NL (nl_NL); Calc: CL


User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0
Comment 1 Telesto 2017-10-10 19:15:01 UTC Comment hidden (obsolete)
Comment 2 Telesto 2017-10-10 19:16:22 UTC
author	Mohammed Abdul Azeem <azeemmysore@gmail.com>	2016-09-05 09:08:30 (GMT)
committer	Michael Meeks <michael.meeks@collabora.com>	2017-01-25 11:20:48 (GMT)
commit	8154953add163554c00935486a1cf5677cef2609 (patch)
tree	d8e148e84aa1e164a2358827085f4d9240ce5e31
parent	657eea01046c7f39ee8ca4545241372177385946 (diff)
ScXMLTableRowCellContext implements fast interfaces:
Implementation of fast interfaces for contexts in path from
ScXMLImport::CreateFastContext to ScXMLTableRowCellContext.
FastParser is enabled and duplicates are avoided at all
possible places.
OOoXML filters still need those legacy paths we removed,
so I had to temporarily map them to fast elements, which
would increase their load time, but hopefully it should
help us in the long run.
Comment 3 Julien Nabet 2017-10-11 07:32:17 UTC
Mohammed/Michael: following last Telesto's comment, thought you might be interested in this one.
Comment 4 Michael Meeks 2017-10-11 15:10:48 UTC
Hi Julien, thanks for the pointer. The referenced patch is for import only, not for export ... so I guess we'd want to get a bibisect / bisect to a certain patch. Failing that - a callgrind profile of it would really help us isolate the issue I think. Thanks.
Comment 5 Julien Nabet 2017-10-11 15:15:21 UTC
(In reply to Michael Meeks from comment #4)
> ...The referenced patch is for import only,
> not for export ...
Argh, should have noticed the filename, thank you for your feedback and sorry for the noise.
Comment 6 Telesto 2017-10-12 11:43:47 UTC
The bisect results should be right. I checked it again (but I didn't reverse the commit).
Comment 7 Julien Nabet 2017-10-12 12:16:40 UTC
(In reply to Telesto from comment #6)
> The bisect results should be right. I checked it again (but I didn't reverse
> the commit).

Saving concerns export part whereas the quoted patch concerns import part.
(unless the result of import triggers something buggy for export?)

Callgrind trace (quoted by Michael) can be retrieved on Linux by following this process https://wiki.documentfoundation.org/QA/BugReport/Debug_Information#GNU.2FLinux:_How_to_get_a_callgrind_trace but it may take a lot of time to get it since you test on a large file.
Comment 8 Julien Nabet 2017-10-15 08:06:51 UTC
I reduced the file to some lines for retrieving more quickly a callgrind trace.
When following the steps, I got 7 files.
-rw------- 1 julien julien 154041635 oct.  14 21:40 callgrind.out.3022
-rw------- 1 julien julien   8821918 oct.  14 20:27 callgrind.out.3024
-rw------- 1 julien julien  89670788 oct.  14 20:42 callgrind.out.3103
-rw------- 1 julien julien  89834408 oct.  14 20:42 callgrind.out.3105
-rw------- 1 julien julien  89946667 oct.  14 20:42 callgrind.out.3107
-rw------- 1 julien julien  89981772 oct.  14 20:43 callgrind.out.3109
-rw------- 1 julien julien  90270631 oct.  14 20:43 callgrind.out.3112

after tar+bzip2, I got:
76632709 oct.  15 09:49  callgrind.tar.bz2
so it's still too big to be attached.

Any thoughts?

Perhaps it's because I use enable-dbgutil when building (my goal is to have max traces)
Comment 9 Julien Nabet 2017-10-15 08:13:30 UTC
Created attachment 136980 [details]
screenshot of kcachegrind for callgrind.out.3022
Comment 10 Julien Nabet 2017-10-15 08:18:30 UTC
Caolán: thought you might be interested in this one since it seems LO spends a lot of time in a specific part of vcl (but perhaps it's expected)
Comment 11 Michael Meeks 2017-10-16 09:35:34 UTC
Hi Julien - if you create an LDAP account you should have access to our ownCloud / Nextcloud instance - used for eg. ESC meetings, and can up-load / create a public link there.

It looks to me strange to have this image scaling thing on the hot path - we do call that for rendering and scaling the thumbnail we have a the spreadsheet on save - but ... 10bn pcycles is a -lot- of scaling. Can you find who is calling that ?

Possibly some changes in how we handle images / areas has caused a problem here (?)
Comment 12 Julien Nabet 2017-10-16 09:56:02 UTC
(In reply to Michael Meeks from comment #11)
> Hi Julien - if you create an LDAP account you should have access to our
> ownCloud / Nextcloud instance - used for eg. ESC meetings, and can up-load /
> create a public link there.
>...
Do you have a link for the procedure to create an LDAP account? (I found nothing or missed it on the wiki)
Comment 13 Telesto 2017-11-23 17:25:40 UTC
(In reply to Michael Meeks from comment #4)
> Hi Julien, thanks for the pointer. The referenced patch is for import only,
> not for export ... so I guess we'd want to get a bibisect / bisect to a
> certain patch. Failing that - a callgrind profile of it would really help us
> isolate the issue I think. Thanks.
Comment 14 Buovjaga 2018-05-31 15:05:03 UTC
Created attachment 142448 [details]
Callgrind output from master

Arch Linux 64-bit
Version: 6.2.0.0.alpha0+
Build ID: 78bed16e17f5128d3523413255d95de85c0ebf36
CPU threads: 8; OS: Linux 4.16; UI render: default; VCL: kde4; 
Locale: fi-FI (fi_FI.UTF-8); Calc: group threaded
Built on May 31st 2018
Comment 15 Julien Nabet 2018-06-02 18:54:10 UTC
I can't uncompress xz file
tar -xJf  callgrind-calc.tar.xz
tar: Ignoring unknown extended header keyword 'SCHILY.fflags'

xz -d callgrind-calc.tar.xz gives callgrind-calc.tar
but tar xvf callgrind-calc.tar
gives
tar: Ignoring unknown extended header keyword 'SCHILY.fflags'

Do you use bsdtar?
Comment 16 Buovjaga 2018-06-02 22:15:25 UTC
(In reply to Julien Nabet from comment #15)
> I can't uncompress xz file
> tar -xJf  callgrind-calc.tar.xz
> tar: Ignoring unknown extended header keyword 'SCHILY.fflags'
> 
> xz -d callgrind-calc.tar.xz gives callgrind-calc.tar
> but tar xvf callgrind-calc.tar
> gives
> tar: Ignoring unknown extended header keyword 'SCHILY.fflags'
> 
> Do you use bsdtar?

You can ignore the warning, check your folder: it has the file callgrind.out.31264 after you uncompress it with tar -xJf

I use KDE's Ark, but I don't have BSD tar on my system. Only GNU tar. I don't know why Ark makes it like so.
Comment 17 Julien Nabet 2018-06-04 16:51:19 UTC
(In reply to Buovjaga from comment #16)
> ...............
> You can ignore the warning, check your folder: it has the file
> callgrind.out.31264 after you uncompress it with tar -xJf
> ...
Oups, you're right, I had missed the file.
Comment 18 Julien Nabet 2018-06-04 17:18:56 UTC
(In reply to Julien Nabet from comment #17)
> (In reply to Buovjaga from comment #16)
> > ...............
> > You can ignore the warning, check your folder: it has the file
> > callgrind.out.31264 after you uncompress it with tar -xJf
> > ...
> Oups, you're right, I had missed the file.

I gave a try to kcachegrind but except telling "cycle21" (I don't know what's this) is the most called part, I don't know how to find bottlenecks.
Every doc I find on the net talks about simple cases with a main and a few functions.
Comment 19 Michael Meeks 2018-06-04 17:43:04 UTC
Hi Julien - thanks for having a go here ! =)

> I gave a try to kcachegrind but except telling "cycle21" 

Tip #1 for kcachegrind - turn off cycle detection ;-) the tool lies vigorously with that turned on, and the results are often ~meaningless for non-trivial cases. I'd also turn off the '%' button and use absolute cycle counts everywhere - or it is very easy to loose a sense of proportion =) I'd also check all counts vs. the total in the bottom status bar for sanity.

HTH !
Comment 20 Julien Nabet 2018-06-04 19:08:44 UTC
(In reply to Michael Meeks from comment #19)
> Hi Julien - thanks for having a go here ! =)
> 
> > I gave a try to kcachegrind but except telling "cycle21" 
> 
> Tip #1 for kcachegrind - turn off cycle detection ;-) the tool lies
> vigorously with that turned on, and the results are often ~meaningless for
> non-trivial cases. I'd also turn off the '%' button and use absolute cycle
> counts everywhere - or it is very easy to loose a sense of proportion =) I'd
> also check all counts vs. the total in the bottom status bar for sanity.
> 
> HTH !

Thank you Michael for your feedback.
I got KCacheGrind 0.8.0 (from Debian testing).
I unchecked "Relative" (must correspond to %) + "Cycle detection".

I don't see "absolute cycle counts" or "total", I only got file name + "Total Instruction Fetch Cost" in the bottom status bar and I can't right click on it.
I sorted by "Called", the most called is rtl_uString_release (162 582 170) but I suppose it could be expected.

Must I check "Relative to parent"? Must I use some Grouping?
Comment 21 Michael Meeks 2018-06-04 19:28:40 UTC
> I unchecked "Relative" (must correspond to %) + "Cycle detection".

Great =)

> I don't see "absolute cycle counts" or "total", I only got file name +
> "Total Instruction Fetch Cost" in the bottom status bar and I can't
> right click on it.

That's fine.

> I sorted by "Called", the most called is rtl_uString_release (162 582 170)
> but I suppose it could be expected.

Right - when you look at any profile there is a ton of methods like that being called a very large number of times.

I personally always use the "callee map" - I select the 'main' function on the left and then see what is going on in the callee map - you can click there to drill down. 

When you're getting near the bottom (don't try near the top of the stack) the 'call graph' thing can be rather useful. It's all a great way to explore who calls whom in the code =)

HTH.
Comment 22 Buovjaga 2019-04-18 17:40:39 UTC
Created attachment 150858 [details]
Large spreadsheet

Let's attach it so it doesn't get lost
Comment 23 Buovjaga 2019-04-18 19:27:39 UTC
Created attachment 150860 [details]
Perf flamegraph

Decided to investigate filesave perf issues after Noel optimised some stuff.

In this graph, I see large plateaus concerning zipping at the end.

Arch Linux 64-bit
Version: 6.3.0.0.alpha0+
Build ID: 1fee3f1da6291bfbcd75455512726215d41b3e83
CPU threads: 8; OS: Linux 5.0; UI render: default; VCL: gtk3; 
Locale: fi-FI (fi_FI.UTF-8); UI-Language: en-US
Calc: threaded
Built on 18 April 2019
Comment 24 Michael Meeks 2019-04-18 20:20:15 UTC
The zip compression is expensive, although parallelized for a single large sheet it doesn't work so well. We have a plan to improve that in the next month or two - perhaps it'll get into 6.3 =) Nice profiling flame-graph though: love it.
Comment 25 Xisco Faulí 2019-05-22 11:15:24 UTC
in

Version: 6.3.0.0.alpha1+
Build ID: e7fb44b8c86b4dce1d4c6f15752cc3bf62b14888
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3; 
Locale: ca-ES (ca_ES.UTF-8); UI-Language: en-US
Calc: threaded

it takes

real	0m30,067s
user	0m35,437s
sys	0m1,650s
Comment 26 Luboš Luňák 2019-05-28 10:29:02 UTC
Zip parallel compression has been implemented in gerrit#73032.
Comment 27 Buovjaga 2019-06-01 16:06:34 UTC
(In reply to Luboš Luňák from comment #26)
> Zip parallel compression has been implemented in gerrit#73032.

Saving takes only a second now. I guess some polishing patches are coming still.

Arch Linux 64-bit
Version: 6.4.0.0.alpha0+
Build ID: 7272b9f752cb74757d6ed504202eefccc589f804
CPU threads: 8; OS: Linux 5.1; UI render: default; VCL: gtk3; 
Locale: fi-FI (fi_FI.UTF-8); UI-Language: en-US
Calc: threaded
Built on 1 June 2019
Comment 28 Luboš Luňák 2019-06-03 10:09:16 UTC
(In reply to Buovjaga from comment #27)
> Saving takes only a second now. I guess some polishing patches are coming
> still.

Not from me, I consider the parallel zip stuff complete. Is there still something more to do about this bugreport? It looks to me it can be closed.
Comment 29 Xisco Faulí 2019-06-03 10:24:12 UTC
(In reply to Luboš Luňák from comment #28)
> (In reply to Buovjaga from comment #27)
> > Saving takes only a second now. I guess some polishing patches are coming
> > still.
> 
> Not from me, I consider the parallel zip stuff complete. Is there still
> something more to do about this bugreport? It looks to me it can be closed.

Let's close it then.
it takes 5 seconds in

Version: 6.3.0.0.beta1+
Build ID: 219e128553645911685b6061f7c5ea359a4c551c
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3; 
Locale: ca-ES (ca_ES.UTF-8); UI-Language: en-US
Calc: threaded

otoh, if I convert it from command line, it takes

real	0m29,096s
user	0m38,720s
sys	0m2,016s

same time as in comment 25.

@Luboš, is the parallel zip not used in headless mode ?
Comment 30 Luboš Luňák 2019-06-03 11:23:09 UTC
(In reply to Xisco Faulí from comment #29)
> @Luboš, is the parallel zip not used in headless mode ?

Not by design. If that happens, it should be some external factor (thread pool not working in headless mode, maybe?).
Comment 31 Luboš Luňák 2019-06-03 14:04:00 UTC
And I actually cannot confirm that conversion from the CLI would take longer. It takes about the same as in the UI as if I do "soffice --headless --convert-to ods --outdir out/ 500k_formul.ods".
Comment 32 Luboš Luňák 2019-06-13 09:08:09 UTC
Let's close this, unless somebody can show there is still a problem.