79892 – Performance regression between 3.6 and later version, triple time to do the calculations (win only)

Bug 79892 - Performance regression between 3.6 and later version, triple time to do the calculations (win only)

Summary: Performance regression between 3.6 and later version, triple time to do the c...

Status:	VERIFIED FIXED

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	Calc (show other bugs)
Version: (earliest affected)	4.0.0.0 beta1
Hardware:	Other Windows (All)

Importance:	medium normal
Assignee:	Aron Budea

URL:
Whiteboard:	target:5.3.0 target:5.2.4
Keywords:	perf, preBibisect, regression

Duplicates (2):	77491 86059 (view as bug list)
Depends on:
Blocks:	Function-Vlookup
	Show dependency tree / graph

Reported:	2014-06-10 18:10 UTC by m_a_riosv
Modified:	2018-02-06 15:27 UTC (History)
CC List:	10 users (show)

See Also:	115490
Crash report or crash signature:

Attachments
Sample file to test (375.96 KB, application/vnd.oasis.opendocument.spreadsheet) 2014-06-10 18:10 UTC, m_a_riosv	Details
Sample file to test (375.96 KB, application/vnd.oasis.opendocument.spreadsheet) 2014-06-11 01:19 UTC, m_a_riosv	Details
Sample file to test VLOOKUP() for numbers (1.43 MB, application/vnd.oasis.opendocument.spreadsheet) 2014-06-12 13:59 UTC, m_a_riosv	Details
Sample file to test VLOOKUP() for text (1.24 MB, application/vnd.oasis.opendocument.spreadsheet) 2014-06-12 14:02 UTC, m_a_riosv	Details
Compilation of times for hard recalc with two function and different OS (137.59 KB, application/pdf) 2014-07-06 00:45 UTC, m_a_riosv	Details
kcachegrind map ... (344.47 KB, image/png) 2014-07-09 13:27 UTC, Michael Meeks	Details
horrible prototype patch ... (3.17 KB, patch) 2014-07-09 17:23 UTC, Michael Meeks	Details
Profile with the prototype patch enabled ... (337.89 KB, image/png) 2014-07-10 08:58 UTC, Michael Meeks	Details
Sample file for test calculation times. (1.16 MB, application/vnd.oasis.opendocument.text) 2014-09-22 00:06 UTC, m_a_riosv	Details
kcachegrind images (625.79 KB, application/zip) 2015-01-27 17:41 UTC, Yousuf Philips (jay) (retired)	Details
Show Obsolete (1) View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description m_a_riosv 2014-06-10 18:10:07 UTC

Created attachment 100836 [details]
Sample file to test

Win7x64Ultimate, Intel I3(2,13), 4GB.

With sample file, calculation time:

Version 3.6.2: ~ 18"

Versions from 4.0.6 to master: ~ 56" - 59"

The file use intensively regular expressions (SUMIF),
just with 4.0, https://wiki.documentfoundation.org/ReleaseNotes/4.0
New Regular Expression Engine ICU was adopted. 

Annoying, regular expressions is one of the more powerful tools in LibreOffice.

Comment 1 m_a_riosv 2014-06-10 20:57:12 UTC

I forgot, times are for hard recacl [Shift+Ctrl+F9]

Comment 2 Kohei Yoshida 2014-06-11 01:09:52 UTC

The sample file looks corrupted though.  Can you double-check?

Comment 3 m_a_riosv 2014-06-11 01:19:44 UTC

Created attachment 100853 [details]
Sample file to test

Attached again.

Comment 4 bfoman (inactive) 2014-06-11 19:19:41 UTC

Can you provide benchmarks for current stable branches and master?

Comment 5 m_a_riosv 2014-06-11 21:24:55 UTC

After open the sample file, times for the second hard recalc, firsts very similar:

00:00:18 - Version 3.6.7.2 (Build ID: e183d5b)
00;00:54 - Version 4.0.6.2 (Build ID: 2e2573268451a50806fcd60ae2d9fe01dd0ce24)
00:00:55 - Version: 4.1.6.2 Build ID: 40ff705089295be5be0aae9b15123f687c05b0a
00:00:56 - Version: 4.2.0.4 Build ID: 05dceb5d363845f2cf968344d7adab8dcfb2ba71
00:00:52 - Version: 4.2.3.3 Build ID: 882f8a0a489bc99a9e60c7905a60226254cb6ff0
00:00:54 - Version: 4.2.4.2 Build ID: 63150712c6d317d27ce2db16eb94c2f3d7b699f8
00:00:54 - Version: 4.2.5.1 Build ID: 881bb88abfe2992c6cede97c23e64a9885de87de
00:00:56 - Version: 4.2.6.0.0+ Build ID: 5e04f1b373a288d7804253113bb28796a67dc67f
                    Win-x86@42, Branch:libreoffice-4-2, Time: 2014-06-10_09:00:50
00:00:56 - Version: 4.3.0.0.beta2+BuildID:f817dac57c748dd156cd9bacf8368dd561de88d4
                    Win-x86@42, Branch:libreoffice-4-3, Time: 2014-06-11_08:24:10
00:01:57 - Version: 4.4.0.0.alpha0+BuildID7d10da2174eac4afd189ee2e9bfb031d315763c1
                    TB: Win-x86@39, Branch:master, Time: 2014-06-11_01:17:37
                    This one has symbols.

Comment 6 m_a_riosv 2014-06-12 13:59:59 UTC

Created attachment 100921 [details]
Sample file to test VLOOKUP() for numbers

Sample with VLOOKUP() searching for numbers.

Comment 7 m_a_riosv 2014-06-12 14:02:07 UTC

Created attachment 100922 [details]
Sample file to test VLOOKUP() for text


I have done a couple of test:

With VLOOKUP() function, searching for:
Numbers_  Text____
00:00:02  00:00:19  - Version 3.6.7.2
00:00:02  00:01:06  - Version 4.0.6.2
00:00:02  00:00:58  - Version 4.2.5.1

Doesn't matter if "Enable regular expressions in formulas" is activated or not.

Comment 8 bfoman (inactive) 2014-06-13 14:28:28 UTC

Received via email (please always comment via bug):

Hi,

done a test with Aoo (first sample file):

00:00:16 - AOo 3.4.1
00:00:16 - AOo 4.1.0

with new sample file in the bug, VLOOKUP() for text:

00:00:13 - AOo 3.4.1
00:00:13 - AOo 4.1.0

Best regards.
Miguel Ángel.

Comment 9 Pedro 2014-06-16 13:53:27 UTC

I get quite different results with an Atom 330 based nettop running XP SP3 (tested with LO 4.3 Beta2 and LO 3.6.7.2 as well as AOO 4.1.0) when using your VLOOKUP for text sample file

The loading times are around 50 seconds with LO 4.3 and 3.6.7.2 and nearly 2 minutes with AOO 4.1.0 (AOO takes ~1min to load and then 1min to "Adapt row height"...)

So, it doesn't seem to be a regression within LO... Can you (or someone in QA) run the same test on another PC with a different CPU?

Comment 10 m_a_riosv 2014-06-16 17:55:35 UTC

VLOOKUP for text sample file

WinXPsp3, AMD athlon(tm)XP 2800+, 2,07Ghz, 1,43GB Ram
00:00:28 Version: 3.6.7.2 (Build Id.: e183d5b)
00:01:08 Version: 4.2.4.2 Build Id.: 63150712c6d317d27ce2db16eb94c2f3d7b699f8

Comment 11 m_a_riosv 2014-06-20 23:39:26 UTC

VLOOKUP for text sample file

Virtual Box 4.3.12 (Win7x64Ultimate)
OpenSuse64 13.1
00:00:14 Version: 3.6.7.2 (Build ID: e183d5b)
00:00:55 Version: 4.1.6.2 Build ID: 410m0(Build:2)
00:00:54 Version: 4.3.0.1 Build ID: 9ed0c4329cf13f882dab0ee8b9ecd7b05e4aafbb

Comment 12 m_a_riosv 2014-06-23 16:06:45 UTC

(In reply to comment #9)
...
> 
> So, it doesn't seem to be a regression within LO... Can you (or someone in
> QA) run the same test on another PC with a different CPU?

Forgive me Pedro, I'm not reporting about loading times, but time to do a second hard recalc (comment#1)

Comment 13 m_a_riosv 2014-06-27 00:02:29 UTC

VLOOKUP for text sample file

Win7x64 (parallel installations)

00:00:19 Version 3.6.7.2

00:00:54 Version: 4.2.6.0.0+ (clean profile)
00:00:53 Version: 4.2.6.0.0+ (clean profile) OpenCL enable. No difference.
          Build ID: 7a5727ba35ca2cff191208625448994202a94df6
          TinderBox: Win-x86@42, Branch:libreoffice-4-2, Time: 2014-06-23_03:45:23

00:00:57 Version: 4.3.0.1.0+ (clean profile)
00:00:03 Version: 4.3.0.1.0+ (clean profile) OpenCL enable.
          Build ID: 52613b9b632721ebb5167a4d9529c242a0b907da
          TinderBox: Win-x86@42, Branch:libreoffice-4-3, Time: 2014-06-23_07:41:56

00:05:45 Version: 4.4.0.0.alpha0+ (clean profile)
00:00:03 Version: 4.4.0.0.alpha0+ (clean profile) OpenCL enable.
          Build ID: 45f18e2ff2a54c412f296818fd73111cb62c7f3d
          TinderBox: Win-x86@39, Branch:master, Time: 2014-06-24_06:47:18

From results with OpenCL enable works fine for 4.3.0.1.0+/4.4.0.0.alpha0+ with really an awesome time reduction. But not usable for me with more complex files.

Without OpenCL, the slowness in respect with 3.6.7 remains.

Comment 14 Owen Genat (retired) 2014-07-03 06:25:58 UTC

(In reply to comment #13)
> From results with OpenCL enable works fine for 4.3.0.1.0+/4.4.0.0.alpha0+
> with really an awesome time reduction.

I tested these versions under GNU/Linux:

A v3.5.7.2, 3215f89-f603614-ab984f2-7348103-1225a5b
B v4.1.6.2, 40ff705089295be5be0aae9b15123f687c05b0a
C v4.2.5.2, 61cb170a04bb1f12e77c884eab9192be736ec5f5
D v4.3.0.2, 14ed55896fdfcb93ff437b85c4f3e1923d2b1409

Hard recalc, indicates similar results to those reported. 

Attachment 100853 [details]: A ~9 sec; B ~16 sec; C ~16 sec (~16 sec); D ~19 sec (~19 sec)
Attachment 100921 [details]: A <1 sec; B ~50 sec; C <1 sec; D <1 sec
Attachment 100922 [details]: A <9 sec; B ~52 sec; C ~29 sec (<1 sec); D ~46 sec (<1 sec)

Figures in brackets are for OpenCL turned on i.e., original attachment seems unaffected. Not sure this is confirmation.

Comment 15 m_a_riosv 2014-07-06 00:45:58 UTC

Created attachment 102314 [details]
Compilation of times for hard recalc with two function and different OS

I think is more visible the performance regression from 3.6.7 with all 4.x versions with different OS.

In my point of view is a hard regression because same complex files times for hard recalc becomes really long.

Tested functions, SUMIF and VLOOKUP are both ones of the most used in spreadsheets.

Please can someone at least set up as NEW.

Comment 16 Yousuf Philips (jay) (retired) 2014-07-06 05:20:15 UTC

System : Windows 7 64-bit with an Intel Core 2 CPU @ 1.83Ghz and 2.5gb.

Loading Time [attachment 100853 [details]]:

3.3.0   - 0:05
3.6.7   - 0:05
4.0.6   - 1:07
4.2.6   - 0:13
4.3.0.2 - 0:04

Shift+Ctrl+F9 [attachment 100853 [details]]:

3.3.0   - 0:22
3.6.7   - 0:26
4.0.6   - 1:04
4.2.6   - 1:03 (with and without OpenCL)
4.3.0.2 - 1:04 (with and without OpenCL)

Shift+Ctrl+F9 [attachment 100922 [details]]:

3.6.7   - 0:30
4.0.6   - 1:45
4.2.6   - 1:26
4.2.6   - 0:01 (OpenCL)
4.3.0.2 - 1:27
4.3.0.2 - 0:01 (OpenCL)

Comment 17 Michael Meeks 2014-07-07 08:43:56 UTC

Interesting =) If we could get a callgrind trace done on a build with debugging symbols, then the slow bits will immediately jump out of that when you visualise it in kcachegrind.

It would saves some time to have that done by someone; thanks ! =)

Comment 18 Yousuf Philips (jay) (retired) 2014-07-08 00:20:42 UTC

I would be happy to supply the trace but dont know how to do it on windows. :)

Would the linux valgrind log be sufficient, as mentioned at < https://wiki.documentfoundation.org/QA/BugReport/Debug_Information#GNU.2FLinux:_How_to_get_a_Valgrind_log > or do i need to add some additional commandline arguments to it, as the valgrind manual states i need to use --tool=callgrind.

Comment 19 Michael Meeks 2014-07-08 08:16:07 UTC

Cachegrind doesn't work on Windows; so you will need Linux -and- a build with debugging symbols [ without that there is virtually no point ;-]. I added some instructions here:

https://wiki.documentfoundation.org/QA/BugReport/Debug_Information#GNU.2FLinux:_How_to_get_a_cachegrind_trace

Comment 20 Yousuf Philips (jay) (retired) 2014-07-08 19:07:09 UTC

Is strace possible on windows?

I've done the callgrind, but an hour into the Shift+Ctrl+F9 process, the pc shut off for some strange reason. So i pulled whatever had been outputted and attached it. Hope its possible to see something from it, but if not, i can run it again.

https://drive.google.com/file/d/0B6qJrVIa0SAlZHlQOUtqV2cwd1U/edit?usp=sharing

Comment 21 Michael Meeks 2014-07-08 20:12:27 UTC

strace is not possible on windows though there is a MS equivalent, however its not going to show us where big CPU related slowdowns are like cachegrind will.

> I've done the callgrind, but an hour into the Shift+Ctrl+F9 process, the pc
> shut off for some strange reason. So i pulled whatever had been outputted and
> attached it. Hope its possible to see something from it, but if not, i can
> run it again.

Awesome - of course cachegrind -really- hammers your CPU quite hard, then again on a modern machine that shoudl leave some thermal envelope from the other CPUs. Unfortunately the trace you gave me is just of a first start / setup - ie. nothing interesting =)

Any chance you can have another go - and -thanks- so much for getting that setup ?

Comment 22 Yousuf Philips (jay) (retired) 2014-07-08 21:26:54 UTC

(In reply to comment #21)
> strace is not possible on windows though there is a MS equivalent, however
> its not going to show us where big CPU related slowdowns are like cachegrind
> will.

Well if you could add the MS equivalent to the wiki, it would be great for whenever strace is needed on windows. :)

> Awesome - of course cachegrind -really- hammers your CPU quite hard, then
> again on a modern machine that shoudl leave some thermal envelope from the
> other CPUs. Unfortunately the trace you gave me is just of a first start /
> setup - ie. nothing interesting =)

Yep it hammered my 7+ year old dual core CPU at 50% the whole time and i mistakenly cancelled it once trying to copy text from the terminal using ctrl+c. =)

Well about it shutting off, it didnt completely power off, it went into standby mode because the laptop wasnt plugged into the power and when i awake it from standby, all the things there were running had closed.

> Any chance you can have another go - and -thanks- so much for getting that
> setup ?

Second go on the way. Hopefully it will finish within a day. This time around, i'm loading libreoffice with the file directly, rather than it opening the start center and then having to go to open, as that added quite some time to the process.

Comment 23 Yousuf Philips (jay) (retired) 2014-07-09 12:18:21 UTC

All went well this time. :)

https://drive.google.com/file/d/0B6qJrVIa0SAlMFNNV1doZjJ3ZXc/edit?usp=sharing

Comment 24 Michael Meeks 2014-07-09 13:27:40 UTC

Created attachment 102485 [details]
kcachegrind map ...

I suspect the openCL code is ignoring the use-regexp-in-formulas which is perhaps other reason why it is so fast =) I attach a photo of Jay's nice trace: thanks for that Jay !

Comment 25 Michael Meeks 2014-07-09 13:37:31 UTC

Hmm - it is also somewhat amusing that the only debuginfo build we have - also has debugutil turned on adding a ton of different checks that seriously distort the performance picture. ... will try to get Matus' builds up-loaded. Either way - here I guess it is the 4 million calls to ScTable::ValidQuery that are causing the grief =)

Comment 26 Yousuf Philips (jay) (retired) 2014-07-09 14:59:09 UTC

(In reply to comment #24)
> I suspect the openCL code is ignoring the use-regexp-in-formulas which is
> perhaps other reason why it is so fast =) I attach a photo of Jay's nice
> trace: thanks for that Jay !

Thanks for the photo. Reminds me of windirstat
< https://windirstat.info/images/windirstat.jpg >

(In reply to comment #25)
> Hmm - it is also somewhat amusing that the only debuginfo build we have -
> also has debugutil turned on adding a ton of different checks that seriously
> distort the performance picture. ... will try to get Matus' builds
> up-loaded.

So i'm assuming valgrind/callgrind can go faster if that is turned off, right?

Comment 27 Michael Meeks 2014-07-09 17:23:11 UTC

Created attachment 102490 [details]
horrible prototype patch ...

I attach a prototype patch that adds a single item text search cache - to avoid calling ICU endlessly for the same strings; I see a huge number of repeated back-to-back string matching ~and this (I guess) should speed that up. Whether that is in the right place or not is unclear to me - I'd love input from Eike - and the patch is unfinished; need to cleanup the ScQueryEntry struct to make this cleaner. Also - of course it is to some degree dependent on the data as to whether that is at all useful. Still - it saves 10 seconds of the 35 that it costs to calculate this document for me.

Comment 28 Michael Meeks 2014-07-10 08:58:48 UTC

Created attachment 102524 [details]
Profile with the prototype patch enabled ...

A better profile without all the debugutil stuff inside it -and- with the cache - that drops the ICU calls from 5m -> 2m over a load/hard-re-calc/exit cycle - we are still completely dominated by ICU regexp matching time.

Comment 29 Michael Meeks 2014-07-11 11:30:03 UTC

Bit of a mystery this really, many of these matches should either immediately fail, or immediately succeed (it seems to me). Why we get 61m calls to 'RegexMatcher::MatchChunkAt' from 2m calls to TextSearch::RESearchForward is a mystery to me - the matches are all ~trivial mostly comparing for an initial series of digits as a string, and then a .* on the end ;-)

Would bear digging into those MatchChunkAt's manually I suspect.

Comment 30 Michael Meeks 2014-07-11 12:31:19 UTC

Interesting; we always call 'find':

static bool lcl_findRegex( RegexMatcher * pRegexMatcher, sal_Int32 nStartPos, UErrorCode & rIcuErr )
{
    if (!pRegexMatcher->find( nStartPos, rIcuErr))

which searches all along the string for a match I believe:

http://icu-project.org/apiref/icu4c/classicu_1_1RegexMatcher.html#a50cce379741d692b8a4edc95e3145305

Which seems rather a shame for things that are starts_with regexps - but then again; quite possibly this is not a 'starts with' regexp. It is entirely possible that adding a '^' at the beginning of the match in the queries would save a ton of time here.

Comment 31 m_a_riosv 2014-09-11 21:03:15 UTC

https://bugs.freedesktop.org/attachment.cgi?id=100853

4.3.1.2 00:00:59 :'-(

Comment 32 Michael Meeks 2014-09-12 10:30:09 UTC

So - wait ... ICU got slower - but - do you want a faster sheet yourself for your own reasons ?

If so, and you want to do a regexp match on a string - you would really win by adding a '^' to match the beginning of the string if you can.

Otherwise - the matching has to be repeated all through the length of the string - and ... well that's not incredibly quick =) can you tweak the matches to be more clever there ? =)

Beyond that, I don't see us being able to make ICU regexps a ton faster anytime soon; sorry.

Comment 33 Michael Meeks 2014-09-12 20:42:18 UTC

m.a.riosv points out that we use:

 b73baa6fbdfef197608d1f69300919b9-icu4c-53_1-src.tgz

and AOO uses:

2f6ecca935948f7db92d925d88d0d078-icu4c-4_0_1-src.tgz

Its unclear to me how these versions really compare; perhaps something lies within ? It'd be interesting to see if we can bibisect between 3.6.0 and 4.0.0 to see where the slowdown struck =)

The profile shows it all in the ICU regexp for the sheet I profiled; and good of course not to confuse the issue with other different sheets / performance issues =)

Comment 34 m_a_riosv 2014-09-22 00:06:29 UTC

Created attachment 106629 [details]
Sample file for test calculation times.

Last calculation times
(Win7x64Ultimate i3-4Gb):

ApacheOO    4.1.1   calculation time:  16 seconds,
  beginning search criterion with "^"  25 seconds.
 beginning search criterion with ".*"  31 seconds.

LibreOffice 4.3.2.1 calculation time:  59 seconds,
  beginning search criterion with "^"  33 seconds,
 beginning search criterion with ".*" 201 seconds.

Replacing LibreOffice user profile with Aoo user profile doesn't change times.

Comment 35 Owen Genat (retired) 2014-09-28 02:24:31 UTC

Testing attachment 106629 [details] for hard re-calc with these options:

A - Enable regex in formula. OpenCL off (default settings).
B - Disable regex in formula. OpenCL off.
C - Enable regex in formula. OpenCL on (Internal, software interpreter).
D - Disable regex in formula. OpenCL on (Internal, software interpreter).

Under Crunchbang 11 x86_64 using v4.3.2.2 on Core i7 Q720 @ 1.60Ghz:

A ~27 sec
B ~27 sec 
C ~1 sec  
D ~1 sec

Under Crunchbang 11 x86_64 using v4.4.0.0 2014-09-25 on Core i7 Q720 @ 1.60Ghz:

A ~25 sec
B ~25 sec 
C ~1 sec  
D ~1 sec

(In reply to comment #7)
> Doesn't matter if "Enable regular expressions in formulas" is activated or
> not.

Confirmed (mainly for my own curiosity). The performance under GNU/Linux (25-29 sec) appears to be better than under Windows (59 sec), so maybe there is something different in the Windows build with ICU?

(In reply to comment #33)
> m.a.riosv points out that we use:
> 
>  b73baa6fbdfef197608d1f69300919b9-icu4c-53_1-src.tgz
> 
> and AOO uses:
> 
> 2f6ecca935948f7db92d925d88d0d078-icu4c-4_0_1-src.tgz
> 
> Its unclear to me how these versions really compare; perhaps something lies
> within ? 

Michael, is there anything someone, such as myself (moderate CLI knowledge), can assist with in terms of trying to determine differences between these two ICU versions. One is twice the size of the other, no doubt due to the expansion of Unicode from 2009-2014. Even icu/source/test/testdata/regextst.txt highlights numerous differences. Would raising the matter on the Unicode mailing list help pinpoint an area to look at?
Best wishes, Owen.

Comment 36 Owen Genat (retired) 2014-09-28 03:53:42 UTC

Other attachments tested for hard re-calc under Crunchbang 11 x86_64 on Core i7 Q720 @ 1.60Ghz with these options:

A - OpenCL off (default setting).
B - OpenCL on.

Using v4.3.2.2:

100853  100921    100922
------  ------    ------
A ~16s  ~1s       ~27s  
B ~16s  crash![1] ~1s

Using v4.4.0.0 2014-09-25:

100853  100921    100922
------  ------    ------
A ~15s  ~2s       ~25s
B ~15s  crash![1] ~1s

Given attachment 100922 [details] and attachment 106629 [details] are similar, the match in results (refer comment 35) seem reasonable.

[1] I will report this as a separate bug.

Comment 37 raal 2014-10-31 09:27:42 UTC

*** Bug 77491 has been marked as a duplicate of this bug. ***

Comment 38 m_a_riosv 2014-11-09 17:30:24 UTC

*** Bug 86059 has been marked as a duplicate of this bug. ***

Comment 39 m_a_riosv 2014-11-23 01:05:40 UTC

Could be possible as workaround or temporal solution, downgrade de icu version to the same that works with AOo, if there aren't other areas having mandatory dependencies from the actual icu version?

Comment 40 Michael Meeks 2014-11-24 11:07:10 UTC

I guess improving the spreadsheet to match at the beginning of the string is not feasible ? =) it is doing a -ton- of un-necessary work left & right in there.

I can't see us downgrading ICU really. Beyond that if someone wants to investigate that - they need to generate a matched pair of callgrind traces for the same workload (with symbols) for the older (faster) version, and the newer (slower) version - and see if indeed the problem is with ICU. If so, we then would want to write a micro-benchmark of that, and report to the ICU people and/or look at exactly what is wrong here.

Failing that, there are of course, lots of potential opportunities for exotic CPU threading in the formula interpreter; we should be able to parallelize things rather nicely - for FormulaGroups with over-complicated regexp matching in-use - but that of course is an even bigger chunk of engineering work: if someone wants to do that work they can poke me for code pointers =)

Comment 41 Yousuf Philips (jay) (retired) 2015-01-02 12:16:36 UTC

Well i went and tested 4.0.0.3 and 4.0.0 beta1 and the refresh of calculation of attachment 100853 [details] is 1 minute, so hopefully that shrinks down the search area.

Added bibisect request as Michael suggested it in comment 33.

(In reply to Michael Meeks from comment #40)
> I can't see us downgrading ICU really. Beyond that if someone wants to
> investigate that - they need to generate a matched pair of callgrind traces
> for the same workload (with symbols) for the older (faster) version, and the
> newer (slower) version - and see if indeed the problem is with ICU. If so,
> we then would want to write a micro-benchmark of that, and report to the ICU
> people and/or look at exactly what is wrong here.

As i supplied the callgrind for the newer version already, do you want me to do the same for 3.6.2 and if so, where can i find a 3.6.2 version with symbols? Would grabbing the LO 3.6 PPA ( https://launchpad.net/~libreoffice/+archive/ubuntu/libreoffice-3-6 ) and installing the dbg package be enough?

Comment 42 Yousuf Philips (jay) (retired) 2015-01-04 14:52:43 UTC

I ran the refresh tests on the same Windows 7 machine mentioned in comment 16 and here are results for LO 4.5 (2014-12-23).

attachment 100853 [details] : 0:36
attachment 100922 [details] : 1:23

So from these results, it seems that sumif() has improved as its half what it used to be but hasnt quite gotten as fast as the 3.6, while vlookup() hasnt changed at all.

@Meeks: Currently running a callgrind for the vlookup file and will give a link to the result once its finished.

Comment 43 Yousuf Philips (jay) (retired) 2015-01-05 22:24:25 UTC

I guess vlookup() is a beast as i've had my laptop running for 2 days to get this callgrind.

https://drive.google.com/file/d/0B6qJrVIa0SAlX3Jxd0ZtbUgtZXc/view?usp=sharing

Comment 44 Michael Meeks 2015-01-06 11:00:55 UTC

Thanks Jay for building that profile ! Unfortunately you have used a dbgutil build - which includes a -ton- of extremely performance sensitive code that makes the profile nearly unusable. eg.

            SAL_INFO( "i18nlangtag", "LanguageTag::registerImpl: " << nCallsSystem << " system calls");

Takes 77bn pcycles of the ~600bn pcycles for the DoHardRecalc.

You would want to have a clean build configured with --enable-symbols and no dbgutil in order to do any performance profiling. On the bright side the trace should be rather quicker to run.

Eike - it is somewhat interesting to see 18m calls to LanguageTag::registerImpl there though - is that normal ? - though I imagine they are fast without the SAL_INFO compiled into them.

I guess that comes from 18.9m calls to utl::TransliterationWrapper::loadModuleImpl - which has a good 100bn cycles that will be independent of any dbgutil (I think).

Comment 45 Michael Meeks 2015-01-06 11:10:30 UTC

I guess there is something really silly going on here - the 18.9m calls to the QueryEvaluator -should- all get directed to the shared mpTransliteration:

        if (rParam.bCaseSens)
        {
            mpTransliteration = ScGlobal::GetCaseTransliteration();
            mpCollator = ScGlobal::GetCaseCollator();
        }
        else
        {
            mpTransliteration = ScGlobal::GetpTransliteration();
            mpCollator = ScGlobal::GetCollator();
        }

which should be initialized once per process - (I guess) - as a global thing should. Instead we see 18.9m calls dropping through utl::TransliterationWrapper::loadModuleIfNeeded and into utl::TransliterationWrapper::loadModuleImpl - which is odd to say the least.

Comment 46 Michael Meeks 2015-01-06 11:37:16 UTC

Oddly, I don't see the same behavior when I debug this under 4.4 and/or master as of recently - the behavior doesn't match the profile and it takes ~20secs to calculate for me => abandoning interest. It would be good to know exactly what version & git hash is being profiled too FWIW.

Comment 47 Yousuf Philips (jay) (retired) 2015-01-09 11:35:24 UTC

Did a calgrind for the vlookup file on a symbol enabled build and the results are in the link.

https://drive.google.com/file/d/0B6qJrVIa0SAlYlJMcHhEeEJIXzA/view?usp=sharing

Comment 48 Yousuf Philips (jay) (retired) 2015-01-27 17:41:33 UTC

Created attachment 112832 [details]
kcachegrind images

Meeks was kind enough to look at it and provide the attached kcachegrind attached images and stated that "it is actually not terribly hard to fix a performance bug" as "virtually the entire cost of vlookup is waste =) and it is (as suspected) in the loadModuleIfNeeded method."

He also stated, "My hope would be that we can re-use the same  TransliterationWrapper endlessly - and do that load only one - and => save 80% of the time there. On the other hand - it is interesting that the LookupQueryWithCache method looks like the cache is not helping either."

Comment 49 m_a_riosv 2015-01-27 22:37:07 UTC

With file Vlookup_for_test there is a great difference if the option:
Menu/Tools/LibreOffice cal/Calculate/General calculation - Search criteria = and <> mus apply to whole cells.
Win7x64Ultimate.
Version: 4.4.0.3 Build ID: de093506bcdc5fafd9023ee680b8c60e3e0645d7

Using Vlookup
With option enable: 8 seconds
With option disable: 55 seconds

Using Sumif

With option enable: 33 seconds
With option disable: 2 min 24 seconds

Both functions are not included in OpenCL functions subset and disabling subset in Formula/OpenCL also doesn't improve.

Strange doesn't matter how the option is, if you fill column B with values, formulas are updated immediately.

Comment 50 Robinson Tryon (qubit) 2015-12-10 11:09:28 UTC Comment hidden (obsolete)

Migrating Whiteboard tags to Keywords: (bibisectRequest, perf)

Comment 51 Xisco Faulí 2016-09-13 14:30:30 UTC

As per today, this regression can't be bibisected as it was introduced before 4.4 branch and there's no bibisect repository for the affected branch for windows, thus change 'bibisectRequest' to 'preBibisect'.

Comment 52 Aron Budea 2016-10-14 04:55:54 UTC

http://opengrok.libreoffice.org/xref/core/unotools/source/i18n/transliterationwrapper.cxx#127

Here an additional condition should get rid of the constant release/recreation of aLanguageTag internals:
&& ((LANGUAGE_SYSTEM != nLang) || (!aLanguageTag.isSystemLocale()))

Since if nLang is LANGUAGE_SYSTEM, and aLanguageTag is the system locale (which is set like this: "mbSystemLocale      = nLanguage == LANGUAGE_SYSTEM;" ; nLanguage is updated to system language later inside language tag), the resulting language tag should be the same in the end.

I'll commit a patch with this, and a few related performance changes.

Comment 53 Commit Notification 2016-10-24 18:05:36 UTC

Aron Budea committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=2ed5057993fa8d0f57531303e6c9fafc37138737

tdf#79892: Improve performance of transliteration

It will be available in 5.3.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.

Comment 54 Commit Notification 2016-10-25 19:30:23 UTC

Aron Budea committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=8175e30b732e3f6f4f1058934e7fe8a1189f40cf

tdf#79892: Pass resolved locale to transliterate calls in Calc

It will be available in 5.3.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.

Comment 55 Aron Budea 2016-10-25 23:39:48 UTC

Based on Eike's feedback the performance fix turned out to be a bit different to what I described in comment 52, but has the same effect: the loadModuleIfNeeded calls should be gone.

Miguel, could you measure again with a daily build >= 10.26?

Comment 56 m_a_riosv 2016-10-26 08:03:18 UTC

I'll do it.

Comment 57 m_a_riosv 2016-10-27 22:20:49 UTC

Testing with:
Sample file for test calculation times. (1.16 MB, application/vnd.oasis.opendocument.text)
2014-09-22 00:06 UTC


Version: 5.2.3.2
Build ID: c019706a50de7fcb07d0d528b09b46a887562beb
CPU Threads: 4; OS Version: Windows 6.2; UI Render: default; 
Locale: es-ES (es_ES); Calc: CL
Search whole cells option, on: 7 s, off: 44

master~2016-10-27_06.10.21_LibreOfficeDev_5.3.0.0.alpha1_Win_x86
crash calling Menu/Help/About bug#103545
Search whole cells option, on: 10 s, off: 44

master~2016-10-27_06.10.21_LibreOfficeDev_5.3.0.0.alpha1_Win_x86 @39
crash calling Menu/Help/About bug#103545
Search whole cells option, on: 10 s, off: 44

Maybe the patch it's not for now.

Comment 58 m_a_riosv 2016-10-30 13:18:27 UTC

Tested with
Version: 5.3.0.0.alpha1+
Build ID: 0a4e0dfffd2038c5bcaef0bc20884e60dfc2080a
CPU Threads: 4; OS Version: Windows 6.19; UI Render: GL; Layout Engine: old; 
TinderBox: Win-x86@39, Branch:master, Time: 2016-10-30_00:16:37
Locale: es-ES (es_ES); Calc: CL
Search whole cells option, on: 10 s, off: 42

A couple of seconds better with option off, 5% what it's always a good improvement, but don't seems to solve the issue.

Adding a regex =VLOOKUP("^"&D1;$A$1:$B$10000;2;0)
Search whole cells option, on: 2m15s, off: 2m15s

With wildcards =VLOOKUP("*"&D1;$A$1:$B$10000;2;0)
Search whole cells option, on: 58 s, off: 58s


Version: 5.2.3.2
Build ID: c019706a50de7fcb07d0d528b09b46a887562beb
CPU Threads: 4; OS Version: Windows 6.2; UI Render: default; 
Locale: es-ES (es_ES); Calc: CL
Search whole cells option, on: 7s, off: 44s

Adding a regex =VLOOKUP("^"&D1;$A$1:$B$10000;2;0)
Search whole cells option, on: 30s, off: 30s

With wildcards =VLOOKUP("*"&D1;$A$1:$B$10000;2;0)
Search whole cells option, on: 20s, off: 20s

Comment 59 m_a_riosv 2016-11-01 14:42:39 UTC

New test with:
Version: 5.3.0.0.alpha1+
Build ID: 33090865c494618f4e528bf5a10aae8c4fc443d1
CPU Threads: 4; OS Version: Windows 6.2; UI Render: GL; Layout Engine: old; 
TinderBox: Win-x86@62-merge-TDF, Branch:MASTER, Time: 2016-10-31_22:55:53
Locale: es-ES (es_ES); Calc: CL

With regex enable without regex expression.
Search whole cells option, on: 2s, off: 12s

With regex enable and regex expression =VLOOKUP("^"&D1;$A$1:$B$10000;2;0)
Search whole cells option, on: 24s, off: 24s

With wildcards enable and wildcard expression =VLOOKUP("*"&D1;$A$1:$B$10000;2;0)
Search whole cells option, on: 16s, off: 16s

With no wildcard or regex enable.
Search whole cells option, on: 2s, off: 12s

Retesting with 3.6
With regex enable without regex expression.
Search whole cells option, on: 15s, off: 19s

With regex enable and regex expression =VLOOKUP("^"&D1;$A$1:$B$10000;2;0)
Search whole cells option, on: 27s, off: 27s

----------------------------------------------------

I guess we have now the Aron's patch merged.

Amazing time reduction without regex/wildcards about 70% less than 5.2.

Near to 20% of time reduction with regex/wildcards.

On other file with a lot of sumif() and regex (whole on) there is a time reduction from 45s to 39s about 13% less than 5.2

I don't know if can be considered right the difference with/without regex or wildcards, but in any case the regression since 3.6 it's not there.

Thanks Aron and to all, solved for me.

Could it be backported to 5.2?

Comment 60 Aron Budea 2016-11-04 09:53:21 UTC

Thank you for the testing! It's good to see the improvement is finally there (at least where this particular change had influence), I don't know what was wrong with the earlier daily builds, the patch got merged on the 24th.
Patches for 5.2 are now in gerrit.

I can at least take a look at the remaining performance issues as well. 
During my previous fix, I checked with attachment 106629 [details] and a simple hard recalc.

Do the different tests involve searching in different files?
For example, I tried searching in the same file, based on the description of the second entry:
-opened Find&Replace,
-entered "=VLOOKUP("^"&D1;$A$1:$B$10000;2;0)" in the Find field,
-checked "Regular expressions" ("Entire cells" then became unselectable),
-and did a Find All.

It completed in 1-2s, which is too quick to be correct (and found no results). What am I doing wrong?

Comment 61 Commit Notification 2016-11-04 11:38:40 UTC

Aron Budea committed a patch related to this issue.
It has been pushed to "libreoffice-5-2":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=a29d53509b48657d53f5496d2295d30343f25ab6&h=libreoffice-5-2

tdf#79892: Pass resolved locale to transliterate calls in Calc

It will be available in 5.2.4.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.

Comment 62 Commit Notification 2016-11-04 11:40:37 UTC

Aron Budea committed a patch related to this issue.
It has been pushed to "libreoffice-5-2":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=610be3d128fe19b2d8458a479005314045eab830&h=libreoffice-5-2

tdf#79892: Improve performance of transliteration

It will be available in 5.2.4.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.

Comment 63 m_a_riosv 2016-11-04 13:27:08 UTC

(In reply to Aron Budea from comment #60)
> Thank you for the testing! .....
Thanks you for take it.
 
> Do the different tests involve searching in different files?
> For example, I tried searching in the same file, based on the description of
> the second entry:
> -opened Find&Replace,
> -entered "=VLOOKUP("^"&D1;$A$1:$B$10000;2;0)" in the Find field,
> -checked "Regular expressions" ("Entire cells" then became unselectable),
> -and did a Find All.
> 
> It completed in 1-2s, which is too quick to be correct (and found no
> results). What am I doing wrong?

Searching a string with regex, and regex enable, usually doesn't work if regular expressions are not escaped.
'\=VLOOKUP\("\^"\&D1;\$A\$1:\$B\$10000;2;0\)'
Works for me searching.

IMO this can be set up as RESOLVED FIXED, I can test on 5.2 when available to change to verified.

A better performance with regex/wildcards if it's possible, I think better on a new bug for enhancement, seems to me it's a different matter.

Comment 64 Aron Budea 2016-11-05 19:13:15 UTC

(In reply to m.a.riosv from comment #63)
> Searching a string with regex, and regex enable, usually doesn't work if
> regular expressions are not escaped.
> '\=VLOOKUP\("\^"\&D1;\$A\$1:\$B\$10000;2;0\)'
> Works for me searching.

Is that supposed to be done on the same file as well? (Vlookup_performance_SearchForText_20140912.ods)

All right, I'm closing this report as FIXED. Please open a new one for the remaining cases where performance improvements would be useful, and CC me on it.

Comment 65 m_a_riosv 2016-12-11 12:06:28 UTC

Tested with
Version: 5.2.4.1 (x64)
Build ID: 9b50003582f07ac674d6451e411e9b77cccd2b22
CPU Threads: 4; OS Version: Windows 6.19; UI Render: default; 
Locale: es-ES (es_ES); Calc: group

similar result than 5.3 (comment #59), so set up as Verified.