Bug 108347 - Searching a large dataset is slower as it has been before
Summary: Searching a large dataset is slower as it has been before
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version: 4.2.0.4 release (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Luboš Luňák
URL:
Whiteboard: target:6.3.0
Keywords: bibisected, perf, regression
Depends on:
Blocks: Find-Search multi_type_vector-regressions
Reported: 2017-06-05 19:54 UTC by Telesto
Modified: 2020-04-15 14:35 UTC (History)

See Also:
Crash report or crash signature:
Regression By:


Attachments
bibisect in 42max (4.96 KB, text/plain)
2017-06-17 01:02 UTC, Terrence Enger
Another sample might be related to this bug, search "1634005027" takes ~1min while MS Excel completes almost instantly (1.29 MB, application/vnd.oasis.opendocument.spreadsheet)
2017-08-11 09:37 UTC, V字龍(Vdragon)
The same test file as attached by Vdragon but in xls format (4.69 MB, application/vnd.ms-excel)
2017-08-12 13:19 UTC, Franklin Weng

Description Telesto 2017-06-05 19:54:43 UTC
Description:
Searching a large dataset is slower than it used to be. I expect it to have the same root cause as bug 108298 (but I am not 100% sure).

Steps to Reproduce:
1. Open attachment 132016 [details] (bug 106646)
2. Press CTRL+F and search for a random term (for example: zzzz). Take notice of the time required

Actual Results:  
Searching the data set takes more than 100 seconds.

Expected Results:
The search should complete quickly, in about 6 seconds (as it does with LibreOffice 4.1.0.4).


Reproducible: Always

User Profile Reset: No

Additional Info:
Found in:
>100 seconds with Version: 5.5.0.0.alpha0+
Build ID: ec79f3453471ee9b6ae32e71ff16ea99d9b7751c
CPU threads: 4; OS: Windows 6.19; UI render: default; 
TinderBox: Win-x86@42, Branch:master, Time: 2017-05-28_23:21:44
Locale: nl-NL (nl_NL); Calc: CL

55 seconds with Version: 4.4.6.3
Build ID: e8938fd3328e95dcf59dd64e7facd2c7d67c704d
Locale: nl_NL

32 seconds with Version: 4.2.0.4
Build ID: 05dceb5d363845f2cf968344d7adab8dcfb2ba71

but not in:
Version: 4.1.0.4 (6 seconds)
Build ID: 89ea49ddacd9aa532507cbf852f2bb22b1ace28
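
To put the timings above in proportion, the following sketch computes how much slower each build is than 4.1.0.4. The numbers are the approximate figures reported above; ">100 seconds" is entered as 100, so that ratio is only a lower bound. The names `timings` and `slowdown_factors` are illustrative.

```python
# Approximate search times in seconds, as reported above.
# ">100 seconds" is recorded as 100, so its ratio is a lower bound.
timings = {
    "4.1.0.4": 6,
    "4.2.0.4": 32,
    "4.4.6.3": 55,
    "5.5.0.0.alpha0+": 100,
}


def slowdown_factors(times, baseline):
    """Return each version's time divided by the baseline version's time."""
    base = times[baseline]
    return {version: seconds / base for version, seconds in times.items()}


factors = slowdown_factors(timings, "4.1.0.4")
for version, factor in factors.items():
    print(f"{version}: {factor:.1f}x")
```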


User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0
Comment 1 Xavier Van Wijmeersch 2017-06-06 15:48:09 UTC
Confirmed.
I used the attachment for the test.
Finding a single occurrence of a word was fast, but finding all occurrences of the same word took 20 minutes.


Version: 5.5.0.0.alpha0+
Build ID: ddf8539d97ce044b7df8d51d6ec72ec864b40fb8
CPU threads: 2; OS: Linux 4.9; UI render: default; VCL: kde4; 
TinderBox: Linux-rpm_deb-x86@71-TDF, Branch:master, Time: 2017-06-04_22:00:43
Locale: nl-BE (en_US.UTF-8); Calc: group

The same happens with a 5.4.0 build from 2017-06-04.
OS: Slackware 14.2 current, x86
Comment 2 Terrence Enger 2017-06-17 01:02:08 UTC
Created attachment 134074 [details]
bibisect in 42max

Working on debian-stretch in the bibisect-42max repository, I localized
the introduction of the slowness to commit 8e7bade4 (source-hash-4c99a427),
which represents 42 commits to master.  Fast searches took 5 to 7
seconds; one slow search took 13 minutes.

This is the same range which introduced bug 106646.  I am not
concluding that they are duplicates, but a fix to either one is a good
reason to test the other again.

I am removing keyword bibisectRequest and adding bibisected, and
setting see-also 106646.
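
For readers unfamiliar with bisecting, here is a minimal sketch of the idea: given builds in chronological order, a known-fast oldest build and a known-slow newest build, a binary search finds the first slow build in a logarithmic number of timed searches. This is not the bibisect tooling itself; `first_slow_build`, `is_slow`, and the build numbers are hypothetical, and the sketch assumes the slowness never goes away once introduced.

```python
def first_slow_build(builds, is_slow):
    """Return the index of the first slow build.

    Assumes builds are in chronological order, builds[0] is fast,
    builds[-1] is slow, and the slowness persists once introduced.
    """
    good, bad = 0, len(builds) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_slow(builds[mid]):
            bad = mid   # regression is at mid or earlier
        else:
            good = mid  # regression is after mid
    return bad


# Hypothetical example: builds numbered 0..99, slowness introduced at build 57.
builds = list(range(100))
tested = []


def is_slow(build):
    tested.append(build)  # record every build that had to be timed
    return build >= 57


index = first_slow_build(builds, is_slow)
print(builds[index], "found after", len(tested), "timed searches")
```

With a slow search taking around 13 minutes, keeping the number of timed runs small is what makes this practical.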
Comment 3 V字龍(Vdragon) 2017-08-11 09:37:30 UTC
Created attachment 135449 [details]
Another sample might be related to this bug, search "1634005027"  takes ~1min while MS Excel completes almost instantly
Comment 4 Franklin Weng 2017-08-12 13:19:22 UTC
Created attachment 135486 [details]
The same test file as attached by Vdragon but in xls format

I tested this file in Kubuntu 16.04 running on VirtualBox.  I downloaded different versions, clicked on row 1, and used Ctrl+F to search for the string "2531680038", which is in the last row.

In version 4.1.6.2 the search took 1 second, and in version 4.2.0.0beta1 it took 1'37".

This should be a regression introduced in 4.2.0.0.
Comment 5 Telesto 2017-08-15 13:29:43 UTC
@Xisco
Is a bisect possible?
Comment 7 Franklin Weng 2018-01-03 07:01:16 UTC
Using the attached test files,


In LibreOffice 5.4.4 it took 1'09" to find the term.

Version: 5.4.4.2
Build ID: 1:5.4.4~rc2-0ubuntu0.16.04.1~lo1
CPU threads: 4; OS: Linux 4.10; UI render: default; VCL: gtk2; 
Locale: zh-TW (zh_TW.UTF-8); Calc: group


In LibreOffice 6.1alpha (git master) it took 9" to find the term.

Version: 6.1.0.0.alpha0+
Build ID: a0e136d2cbb3784ddfcbddcfed5d784c8e4c9a64
CPU threads: 4; OS: Linux 4.10; UI render: default; VCL: gtk3; 
Locale: zh-TW (zh_TW.UTF-8); Calc: group threaded
Comment 8 QA Administrators 2019-01-04 03:40:00 UTC Comment hidden (obsolete)
Comment 9 Commit Notification 2019-05-14 20:07:34 UTC
Luboš Luňák committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/+/fce7c123203c91f62b45447f45e1d1f1b45d5b48%5E%21

cache cell positions when searching in calc (tdf#108347)

It will be available in 6.3.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
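
The commit title only says that cell positions are cached; the patch itself is not reproduced here. As a rough illustration of why a cached position can matter in a block-based column container (this bug blocks the multi_type_vector-regressions meta bug), the sketch below counts how many blocks are examined when every lookup restarts from the first block versus when it resumes from the previously found block. All names (`BlockColumn`, `find_block`, `scan`) are hypothetical and are not LibreOffice or mdds APIs.

```python
class BlockColumn:
    """Toy column stored as consecutive blocks of rows (illustrative only)."""

    def __init__(self, block_sizes):
        # starts[i] is the first row of block i; ends[i] is one past its last row.
        self.starts, self.ends = [], []
        row = 0
        for size in block_sizes:
            self.starts.append(row)
            row += size
            self.ends.append(row)
        self.steps = 0  # number of blocks examined, to compare strategies

    def find_block(self, row, hint=0):
        """Return the index of the block containing row, scanning from hint.

        The row must lie inside the column.
        """
        # Fall back to the first block if the hint lies past the requested row.
        i = hint if self.starts[hint] <= row else 0
        while row >= self.ends[i]:
            i += 1
            self.steps += 1
        self.steps += 1
        return i


def scan(column, rows, use_hint):
    """Look up every row in order and return the number of blocks examined."""
    column.steps = 0
    hint = 0
    for row in rows:
        block = column.find_block(row, hint if use_hint else 0)
        if use_hint:
            hint = block  # cache the position for the next lookup
    return column.steps


column = BlockColumn([1] * 2000)  # worst case: every row is its own block
without_cache = scan(column, range(2000), use_hint=False)
with_cache = scan(column, range(2000), use_hint=True)
print("without cached position:", without_cache)
print("with cached position:   ", with_cache)
```

Without the cached position the total work grows quadratically with the number of blocks, while with it the work grows linearly, which matches the kind of slowdown reported on large, fragmented data sets.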
Comment 10 Xisco Faulí 2019-05-15 14:24:18 UTC
The search now takes 23 seconds in

Version: 6.3.0.0.alpha1+
Build ID: a3e649c3384d19a5ad540c3d65d5f79b66fd9090
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3; 
Locale: ca-ES (ca_ES.UTF-8); UI-Language: en-US
Calc: threaded

whereas it took 295 seconds before the commit.

@Luboš Luňák, thanks for fixing this issue!!