Bug 122419 - AutoFilter Search Freezes Calc if the column contains many unique values
Summary: AutoFilter Search Freezes Calc if the column contains many unique values
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
4.4.7.2 release
Hardware: All All
: high major
Assignee: Caolán McNamara
URL:
Whiteboard: target:7.1.0
Keywords: perf
: 102927 105406 124080 128072 (view as bug list)
Depends on:
Blocks: AutoFilter
  Show dependency treegraph
 
Reported: 2019-01-02 11:58 UTC by Kevin Suo
Modified: 2020-11-24 13:34 UTC (History)
13 users (show)

See Also:
Crash report or crash signature:


Attachments
Test ODS file (145.00 KB, application/vnd.oasis.opendocument.spreadsheet)
2019-01-02 11:58 UTC, Kevin Suo
Details
Another simple test document (69.22 KB, application/vnd.oasis.opendocument.spreadsheet)
2020-11-08 07:34 UTC, Kevin Suo
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Kevin Suo 2019-01-02 11:58:30 UTC
Created attachment 147932 [details]
Test ODS file

There are 41670 records in the attached spreadsheet. Searching in the auto filter search box makes Calc freezes.

Steps to Reproduce:
1. Click the autofilter drop down list in column A.
2. Type in some text (e.g., "2") in the autofilter search box to search.

Current Result:
Calc not responding.
You may wait for many hours to see whether Calc will become alive again.

Expected Result:
Calc shows the search results within a reasonable time.

This kind of job works perfect in Excel. The data comes from real world daily work.


版本:6.1.4.2
Build ID:9d0f32d1f0b509096fd65e0d4bec26ddd1938fd3
CPU 线程:4; 操作系统:Linux 4.15; UI 渲染:默认; VCL: gtk2; 
区域语言:zh-CN (zh_CN.UTF-8); Calc: group threaded
Ubuntu 18.04.1 LTS x64.
Comment 1 Oliver Brinzing 2019-01-02 12:30:28 UTC
reproducible with

Version: 6.1.4.2 (x64)
Build ID: 9d0f32d1f0b509096fd65e0d4bec26ddd1938fd3
CPU threads: 4; OS: Windows 10.0; UI render: default; 
Locale: de-DE (de_DE); Calc: group

and with

Version: 4.4.7.2
Build-ID: f3153a8b245191196a4b6b9abd1d0da16eead600
Gebietsschema: de_DE

but *not* reproducible with:
Version 3.6.7.2 (Build ID: e183d5b)
Comment 2 Oliver Brinzing 2019-10-10 17:13:52 UTC
*** Bug 128072 has been marked as a duplicate of this bug. ***
Comment 3 Telesto 2020-05-23 18:54:48 UTC
The autofilter drop down has no search option in 4.4.7.2; whats the alternative?
Comment 4 Buovjaga 2020-06-14 15:47:37 UTC
It only takes some seconds now. I think Noel fixed this in bug 76481. Plz re-test

Version: 7.1.0.0.alpha0+ (x64)
Build ID: 7f6d7a0eb624d67421cd5af6462ee2a662fdff3a
CPU threads: 4; OS: Windows 10.0 Build 18362; UI render: default; VCL: win
Locale: fi-FI (fi_FI); UI: en-US
Calc: threaded
Comment 5 Kevin Suo 2020-06-16 11:43:05 UTC
(In reply to Buovjaga from comment #4)

No, none of those two commits fixed this issue, it still takes a long time to filter using the test document in this bug on master as of today.


In addition, when testing this bug, I find the following commit seems to have caused a regression: before this commit I am able to open the dropdown of the test document, but with this commit when I click on the drop list Calc simply hangs:

2020-06-12 20:03:54 +0200 a16e6122dc62f545df90b9ea4d1f4723c46336b6
author	Caolán McNamara
weld checklistmenu

I will report a new bug for this issue instead.
Comment 6 Kevin Suo 2020-11-08 07:22:07 UTC
*** Bug 102927 has been marked as a duplicate of this bug. ***
Comment 7 Kevin Suo 2020-11-08 07:23:54 UTC
*** Bug 105406 has been marked as a duplicate of this bug. ***
Comment 8 Kevin Suo 2020-11-08 07:25:35 UTC
*** Bug 124080 has been marked as a duplicate of this bug. ***
Comment 9 Kevin Suo 2020-11-08 07:34:21 UTC
Created attachment 167089 [details]
Another simple test document

To reproduce, type "1" in the autofilter search box in the 1st column.
Comment 10 Kevin Suo 2020-11-08 07:40:23 UTC
I don't think this is a regression or bibisectable - As Telesto said in comment 3, there was no autofilter search box at least in version 4.4.7.2. 

Please correct me if I am wrong. If it is really a regression, then it's worth bibisecting.

In the meantime I am raising the IMPORTANCE to HIGH MAJOR as this impacts core functioning of Calc which affects many people.
Comment 11 Kevin Suo 2020-11-08 07:44:03 UTC
The freeze only happen of the column contains a lot of unique values. It does not freeze if the column only contains a few unique values, no matter how many rows are there in the column.

For instance, the 2nd test document (attachment 167089 [details]) contains 10000 unique values in the 1st column which do cause the freeze when search. In comparison, the 2nd column only contains 2 unique values, and no freeze in this column.
Comment 12 Giuliano 2020-11-12 13:25:41 UTC
OS: Ubuntu 20.04.1 LTS - 5.4.0-53-generic
LO version: LibreOffice 7.0.3.1 00(Build:1) – VLC – GTK3

Using a pricelist of 40k rows, I’m having the same problems of Evin Suo.

May I found the ROOT of the PROBLEM.


Take for example the test.ods file  (40k rows)

IF you try to type into the autofilter with your keyboard the string “AJVR-5.6/5UV”, you will wait for minutes… :-((

BUT if you copy and PASTE it into the autofilter the result i ISTANTANEOUS !!

This seems to show that the problem relies on how the routine listen to text entered into the filed, and not the sorting routine itself.


Hope that this will help in finding how to fix the problem !!

In the meanwhile it could be used as a workaround for the bug,


BR
Comment 13 Xisco Faulí 2020-11-12 13:56:41 UTC
Not a regression. The search option was added in https://cgit.freedesktop.org/libreoffice/core/commit/?id=f340278df616921c663075be692264ca37d86d84.
Reproducible since then
Comment 14 Xisco Faulí 2020-11-12 14:23:06 UTC
(In reply to Xisco Faulí from comment #13)
> Not a regression. The search option was added in
> https://cgit.freedesktop.org/libreoffice/core/commit/
> ?id=f340278df616921c663075be692264ca37d86d84.
> Reproducible since then

@Dennis Francis, I thought you might be interested in this issue
Comment 15 Kevin Suo 2020-11-12 14:32:03 UTC
(In reply to Giuliano from comment #12)
> BUT if you copy and PASTE it into the autofilter the result i ISTANTANEOUS !!
That's because there match result set has only one unique value.

> This seems to show that the problem relies on how the routine listen to text entered into the filed, and not the sorting routine itself.
Whatever, when we type into the search box, each time we type a char the search result should be updated. The update may be be instant, maybe 1s delay would be good so that the search actually starts when we finish typing, not wasting time with each key input.
Comment 16 Xisco Faulí 2020-11-12 14:50:31 UTC
this seems more like a UI perf problem.
@Caolán, I thought you might be interested
Comment 17 Kevin Suo 2020-11-12 16:09:48 UTC
My feeling is that there are 2 areas which makes this slow:

1. The search/sort logic may be slow and thus return the result with a delay. This can be seen if we type in one single char in the search box.

2. The type-and-see logic, without setting a wait time before the input is accepted as search string. For instance, if I want to search for records containing string "abc", I would type these three chars one by one, so if the program does not wait for (1s?), then it will try to search each time the key is pressed, which is of cause a waste of time and CPU. Another choice is to add a "OK" button and only start search if the button or ENTER is pressed.
Comment 18 Xisco Faulí 2020-11-12 16:11:51 UTC
yep, only searching after pressing ENTER could be an option
Comment 19 Caolán McNamara 2020-11-12 16:34:23 UTC
https://gerrit.libreoffice.org/c/core/+/105765 make a big difference for me for this non-hierarchical case
Comment 20 Xisco Faulí 2020-11-12 20:23:45 UTC
(In reply to Caolán McNamara from comment #19)
> https://gerrit.libreoffice.org/c/core/+/105765 make a big difference for me
> for this non-hierarchical case

with the patch included, it's intant
Comment 21 Kevin Suo 2020-11-13 06:09:38 UTC
Caolán McNamara: Please wait, do not commit that patch. It seems to have caused a regression.

Steps to Reproduce:

1. Open attachment 167089 [details].
2. Type "111" in column A search box, click OK.
3. Then type "222" in column A search box.
--> The searched result in step 3 is grey.

I guess it's grey because in the first searched result there is no records containing string "222".
Comment 22 Kevin Suo 2020-11-13 06:14:50 UTC
(In reply to Kevin Suo from comment #21)

> I guess it's grey because in the first searched result there is no records containing string "222".

The expected result is that, when we type in the search box, it starts to search from the initial state, not from the result we filtered in the next step.

For example,
a
1
2
3
4

If we first type in "1" and click OK, then the filtered result contains 1. Then, if we type in "2" (without clear the filter), we should get "2" in the list, and if then click "OK", the row "2" should be shown.
Comment 23 Caolán McNamara 2020-11-14 14:39:40 UTC
updated https://gerrit.libreoffice.org/c/core/+/105765
Comment 24 Kevin Suo 2020-11-15 01:20:02 UTC
(In reply to Caolán McNamara from comment #23)

Still a small regression.

When a simple column like this:
a
1
2
3

Without this patch:
1. Type "1" in search box, then click OK, I get "1" in rows. This is good.
2. Then type "2" in search box, then click OK, I get "2" rows. This is good and expected.

However, with this patch:
1. Type "1" in search box, then click OK, I get "1" in rows. This is good.
2. Then type "2" in search box. 
At this moment the "2" in search result is not selected which is bad. I Have manually select "2" before I can correctly get "2" in the rows. This is easy for a few records, but is too difficult if there are  a lot of search result.
Comment 25 Kevin Suo 2020-11-15 11:56:55 UTC
By the way, @Caolán: 

Your patch seems to only fix the slowness when type a char in the search box. It does not fix the slowness when type a char in the search box *and then click "OK"* (i.e., the actual filter process).

Steps to reproduce:
1. Type "1" in column A of attachment 167089 [details].
2. Click OK.

Expected Result: 
It should be fast for the filtered results to show up.

Current Result:
Very slow. For me, type "1" then click OK would make the system freeze.

Do I need to report separate bug for this issue?
Comment 26 Caolán McNamara 2020-11-15 20:04:25 UTC
updated https://gerrit.libreoffice.org/c/core/+/105765 for comment #24

wrt comment #25 that then seems to be the filtering calc itself does, i.e. not part of the .ui code so is out of my area of knowledge. So, yeah a separate bug for that because it doesn't seem related to the ui of the dropdown filter itself.

It looks to me that it loops over every cell in the column and compares each cell to every selected option in the ui list by looping over every one of those. My guess is that for this specific case where we want to see if each cell has a match in the list that we should do something like special case the search by lowercasing all the entries in the list and putting into a sorted_vector so we can look up directly if the lowercase text in each cell has a matching entry in that list without looping over every entry.
Comment 27 Kevin Suo 2020-11-16 00:36:45 UTC
Separate bug is reported in bug 138248.
Comment 28 Commit Notification 2020-11-16 08:50:46 UTC
Caolán McNamara committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/f57b158701fb082ad9016819a2af7cdb13e6fa30

tdf#122419 optimize autofilter search when there are no dates

It will be available in 7.1.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 29 Kevin Suo 2020-11-16 09:13:59 UTC
Caolán: Thanks for the fix. Any plan to backport this to 7.0 branch, as this is an important performance improvement which makes Calc autofilter useful again?
Comment 30 Caolán McNamara 2020-11-16 09:28:52 UTC
Its not a simple backport unfortunately, there has been a lot of changes between 7.0 and 7.1 for this area
Comment 31 Giuliano 2020-11-24 13:34:18 UTC
(In reply to Kevin Suo from comment #25)
> By the way, @Caolán: 
> 
> Your patch seems to only fix the slowness when type a char in the search
> box. It does not fix the slowness when type a char in the search box *and
> then click "OK"* (i.e., the actual filter process).
> 
> Steps to reproduce:
> 1. Type "1" in column A of attachment 167089 [details].
> 2. Click OK.
> 
> Expected Result: 
> It should be fast for the filtered results to show up.
> 
> Current Result:
> Very slow. For me, type "1" then click OK would make the system freeze.
> 
> Do I need to report separate bug for this issue?

Well,  when you type "1" in the filter, the drop-down dialog of the filter instantaneously shows the correct result list. So it seems to me that we could assume that a filtering process has been performed, because a (Correct) result it's shown.

It's is difficult to understand WHY when we click ok, the update of the rows of the spreadsheet takes so long, considering that the filtering result has been already obtained, and the programs "knows" which rows to show and to hide...