Description:
Saving a large Calc sheet is slower in LibO 5.4 compared to LibO 5.3.

Steps to Reproduce:
1. Download https://yadi.sk/i/rM9QctDym5y3M
2. Save the file with File > Save As and note the time required (around 14 sec with LibO 6 and 7-8 sec with LibO 5.3).

Actual Results:
Saving a large file is slower.

Expected Results:
Saving should be as fast as in LibO 5.3.

Reproducible: Always

User Profile Reset: No

Additional Info:
Version: 6.0.0.0.alpha0+
Build ID: c5a93cad149618bbd43632f1660a558c34bdbf7e
CPU threads: 4; OS: Windows 6.3; UI render: default; TinderBox: Win-x86@42, Branch:master, Time: 2017-10-07_01:04:25
Locale: nl-NL (nl_NL); Calc: CL

User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0
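For timing this without a stopwatch, a rough command-line equivalent is sketched below; note that a headless conversion is not identical to an interactive Save As, and the file name 500k_formul.ods is assumed from a later comment in this report:

time soffice --headless --convert-to ods --outdir /tmp 500k_formul.ods

The "real" line of the output approximates the save time.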
Created attachment 136899 [details]
Bibisect log

Bisected to:
commit 8154953add163554c00935486a1cf5677cef2609
author:    Mohammed Abdul Azeem <azeemmysore@gmail.com>  2016-09-05 09:08:30 (GMT)
committer: Michael Meeks <michael.meeks@collabora.com>   2017-01-25 11:20:48 (GMT)

ScXMLTableRowCellContext implements fast interfaces:

Implementation of fast interfaces for contexts in path from ScXMLImport::CreateFastContext to ScXMLTableRowCellContext. FastParser is enabled and duplicates are avoided at all possible places. OOoXML filters still need those legacy paths we removed, so I had to temporarily map them to fast elements, which would increase their load time, but hopefully it should help us in the long run.
Mohammed/Michael: following Telesto's last comment, I thought you might be interested in this one.
Hi Julien, thanks for the pointer. The referenced patch is for import only, not for export ... so I guess we'd want to get a bibisect / bisect to a certain patch. Failing that - a callgrind profile of it would really help us isolate the issue I think. Thanks.
(In reply to Michael Meeks from comment #4)
> ...The referenced patch is for import only,
> not for export ...

Argh, I should have noticed the filename; thank you for your feedback and sorry for the noise.
The bisect results should be right. I checked again (but I didn't revert the commit).
(In reply to Telesto from comment #6)
> The bisect results should be right. I checked again (but I didn't revert
> the commit).

Saving concerns the export part whereas the quoted patch concerns the import part (unless the result of import triggers something buggy for export?).

A callgrind trace (mentioned by Michael) can be retrieved on Linux by following this process:
https://wiki.documentfoundation.org/QA/BugReport/Debug_Information#GNU.2FLinux:_How_to_get_a_callgrind_trace
but it may take a lot of time to get since you are testing with a large file.
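For reference, the wiki steps essentially come down to launching soffice under callgrind, roughly like this (these are standard valgrind/callgrind options; the exact invocation on the wiki may differ slightly):

valgrind --tool=callgrind --simulate-cache=yes --dump-instr=yes soffice --calc /path/to/large-file.ods

Execution under callgrind is many times slower than normal, hence the warning about the large file.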
I reduced the file to some lines to retrieve a callgrind trace more quickly. When following the steps, I got 7 files:

-rw------- 1 julien julien 154041635 oct. 14 21:40 callgrind.out.3022
-rw------- 1 julien julien   8821918 oct. 14 20:27 callgrind.out.3024
-rw------- 1 julien julien  89670788 oct. 14 20:42 callgrind.out.3103
-rw------- 1 julien julien  89834408 oct. 14 20:42 callgrind.out.3105
-rw------- 1 julien julien  89946667 oct. 14 20:42 callgrind.out.3107
-rw------- 1 julien julien  89981772 oct. 14 20:43 callgrind.out.3109
-rw------- 1 julien julien  90270631 oct. 14 20:43 callgrind.out.3112

After tar+bzip2, I got:

76632709 oct. 15 09:49 callgrind.tar.bz2

so it's still too big to be attached. Any thoughts? Perhaps it's because I build with --enable-dbgutil (my goal is to have maximum traces).
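As an aside, callgrind output is highly repetitive text, so xz usually compresses it noticeably better than bzip2; something like this may get it under the attachment size limit (file names as in the listing above):

tar -cJf callgrind.tar.xz callgrind.out.*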
Created attachment 136980 [details]
Screenshot of kcachegrind for callgrind.out.3022
Caolán: thought you might be interested in this one, since it seems LO spends a lot of time in a specific part of vcl (but perhaps that's expected).
Hi Julien - if you create an LDAP account you should have access to our ownCloud / Nextcloud instance - used for e.g. ESC meetings - and you can upload / create a public link there.

It looks strange to me to have this image scaling on the hot path - we do call that for rendering and scaling the thumbnail we have of the spreadsheet on save - but ... 10bn pcycles is a -lot- of scaling. Can you find who is calling that? Possibly some change in how we handle images / areas has caused a problem here (?)
(In reply to Michael Meeks from comment #11)
> Hi Julien - if you create an LDAP account you should have access to our
> ownCloud / Nextcloud instance - used for e.g. ESC meetings - and you can
> upload / create a public link there.
> ...

Do you have a link for the procedure to create an LDAP account? (I found nothing, or missed it, on the wiki.)
Created attachment 142448 [details]
Callgrind output from master

Arch Linux 64-bit
Version: 6.2.0.0.alpha0+
Build ID: 78bed16e17f5128d3523413255d95de85c0ebf36
CPU threads: 8; OS: Linux 4.16; UI render: default; VCL: kde4;
Locale: fi-FI (fi_FI.UTF-8); Calc: group threaded
Built on May 31st 2018
I can't uncompress the xz file:

tar -xJf callgrind-calc.tar.xz
tar: Ignoring unknown extended header keyword 'SCHILY.fflags'

xz -d callgrind-calc.tar.xz gives callgrind-calc.tar, but

tar xvf callgrind-calc.tar

gives:

tar: Ignoring unknown extended header keyword 'SCHILY.fflags'

Do you use bsdtar?
(In reply to Julien Nabet from comment #15)
> I can't uncompress the xz file
> ...
> Do you use bsdtar?

You can ignore the warning; check your folder: it has the file callgrind.out.31264 after you uncompress it with tar -xJf.

I use KDE's Ark, but I don't have BSD tar on my system, only GNU tar. I don't know why Ark creates the archive like that.
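As a side note, GNU tar can also silence that particular warning class, so the extraction stays quiet:

tar --warning=no-unknown-keyword -xJf callgrind-calc.tar.xz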
(In reply to Buovjaga from comment #16)
> You can ignore the warning; check your folder: it has the file
> callgrind.out.31264 after you uncompress it with tar -xJf.
> ...

Oops, you're right, I had missed the file.
(In reply to Julien Nabet from comment #17)
> Oops, you're right, I had missed the file.

I gave kcachegrind a try, but apart from telling me that "cycle21" (I don't know what that is) is the most called part, I don't know how to find the bottlenecks. Every doc I find on the net talks about simple cases with a main and a few functions.
Hi Julien - thanks for having a go here ! =)

> I gave kcachegrind a try, but apart from telling me that "cycle21"

Tip #1 for kcachegrind - turn off cycle detection ;-) the tool lies vigorously with that turned on, and the results are often ~meaningless for non-trivial cases. I'd also turn off the '%' button and use absolute cycle counts everywhere - or it is very easy to lose a sense of proportion =) I'd also check all counts vs. the total in the bottom status bar for sanity.

HTH !
(In reply to Michael Meeks from comment #19)
> Tip #1 for kcachegrind - turn off cycle detection ;-) ...

Thank you Michael for your feedback.

I have KCacheGrind 0.8.0 (from Debian testing). I unchecked "Relative" (which must correspond to '%') and "Cycle Detection". I don't see "absolute cycle counts" or a "total"; I only get the file name plus "Total Instruction Fetch Cost" in the bottom status bar, and I can't right-click on it.

I sorted by "Called"; the most called is rtl_uString_release (162 582 170), but I suppose that could be expected.

Must I check "Relative to parent"? Must I use some grouping?
> I unchecked "Relative" (which must correspond to '%') and "Cycle Detection".

Great =)

> I don't see "absolute cycle counts" or a "total"; I only get the file name
> plus "Total Instruction Fetch Cost" in the bottom status bar, and I can't
> right-click on it.

That's fine.

> I sorted by "Called"; the most called is rtl_uString_release (162 582 170),
> but I suppose that could be expected.

Right - when you look at any profile there is a ton of methods like that being called a very large number of times. I personally always use the "callee map" - I select the 'main' function on the left and then see what is going on in the callee map - you can click there to drill down. When you're getting near the bottom (don't try near the top of the stack) the 'call graph' thing can be rather useful. It's all a great way to explore who calls whom in the code =)

HTH.
Created attachment 150858 [details]
Large spreadsheet

Let's attach it so it doesn't get lost.
Created attachment 150860 [details]
Perf flamegraph

Decided to investigate filesave perf issues after Noel optimised some stuff. In this graph, I see large plateaus concerning zipping at the end.

Arch Linux 64-bit
Version: 6.3.0.0.alpha0+
Build ID: 1fee3f1da6291bfbcd75455512726215d41b3e83
CPU threads: 8; OS: Linux 5.0; UI render: default; VCL: gtk3;
Locale: fi-FI (fi_FI.UTF-8); UI-Language: en-US
Calc: threaded
Built on 18 April 2019
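For anyone wanting to reproduce such a graph: one common recipe, assuming Brendan Gregg's FlameGraph scripts (stackcollapse-perf.pl, flamegraph.pl) are available on PATH, is roughly:

perf record -g -p $(pidof soffice.bin)
perf script | stackcollapse-perf.pl | flamegraph.pl > filesave.svg

Start the recording just before saving and stop it with Ctrl+C once the save finishes; the exact perf options used for this attachment may have differed.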
The zip compression is expensive, and although it is parallelized, that doesn't work so well for a single large sheet. We have a plan to improve that in the next month or two - perhaps it'll get into 6.3 =)

Nice profiling flame-graph though: love it.
In

Version: 6.3.0.0.alpha1+
Build ID: e7fb44b8c86b4dce1d4c6f15752cc3bf62b14888
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3;
Locale: ca-ES (ca_ES.UTF-8); UI-Language: en-US
Calc: threaded

it takes

real 0m30,067s
user 0m35,437s
sys  0m1,650s
Parallel zip compression has been implemented in gerrit#73032.
(In reply to Luboš Luňák from comment #26)
> Parallel zip compression has been implemented in gerrit#73032.

Saving takes only a second now. I guess some polishing patches are still coming.

Arch Linux 64-bit
Version: 6.4.0.0.alpha0+
Build ID: 7272b9f752cb74757d6ed504202eefccc589f804
CPU threads: 8; OS: Linux 5.1; UI render: default; VCL: gtk3;
Locale: fi-FI (fi_FI.UTF-8); UI-Language: en-US
Calc: threaded
Built on 1 June 2019
(In reply to Buovjaga from comment #27)
> Saving takes only a second now. I guess some polishing patches are still
> coming.

Not from me, I consider the parallel zip stuff complete. Is there still something more to do about this bug report? It looks to me like it can be closed.
(In reply to Luboš Luňák from comment #28)
> Not from me, I consider the parallel zip stuff complete. Is there still
> something more to do about this bug report? It looks to me like it can be
> closed.

Let's close it then. It takes 5 seconds in

Version: 6.3.0.0.beta1+
Build ID: 219e128553645911685b6061f7c5ea359a4c551c
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3;
Locale: ca-ES (ca_ES.UTF-8); UI-Language: en-US
Calc: threaded

OTOH, if I convert it from the command line, it takes

real 0m29,096s
user 0m38,720s
sys  0m2,016s

the same time as in comment 25.

@Luboš, is the parallel zip not used in headless mode?
(In reply to Xisco Faulí from comment #29)
> @Luboš, is the parallel zip not used in headless mode?

Not by design. If that happens, it should be some external factor (thread pool not working in headless mode, maybe?).
And I actually cannot confirm that conversion from the CLI takes longer. It takes about the same as in the UI if I do:

soffice --headless --convert-to ods --outdir out/ 500k_formul.ods
Let's close this, unless somebody can show there is still a problem.