Bug 140365 - VLOOKUP slow on accessing search matrix in other files
Summary: VLOOKUP slow on accessing search matrix in other files
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
7.0.1.2 release
Hardware: All All
: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: perf
Depends on:
Blocks: Function-Vlookup
  Show dependency treegraph
 
Reported: 2021-02-12 07:47 UTC by Hagarius1988
Modified: 2022-02-16 14:41 UTC (History)
5 users (show)

See Also:
Crash report or crash signature:


Attachments
File for Case1 (1.26 MB, application/vnd.ms-excel)
2021-02-12 09:17 UTC, Hagarius1988
Details
File for Case2 - File 1 (1.02 MB, application/vnd.ms-excel)
2021-02-12 09:18 UTC, Hagarius1988
Details
File for Case2 - File 2 (275.71 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2021-02-12 09:19 UTC, Hagarius1988
Details
Perf flamegraph (328.84 KB, image/svg+xml)
2022-02-16 14:38 UTC, Buovjaga
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Hagarius1988 2021-02-12 07:47:33 UTC
Description:
VLOOKUP slow on accessing matrix table in other files

We use Libreoffice here in our company on about 300 machines. First of all: A big thanks to the community and the developers.

Some users have reported to us internally that the CALC is very slow in the function VLOOKUP. We have found out internally that VLOOKUP reacts fast within the spreadsheets, but not when VLOOKUP is executed over files.

Example:
Case 1:
1500 rows, VLOOKUP in 4 columns over 900 rows.
VLOOKUP is executed within File 1 and only accesses another spreadsheet.
Duration ~1 second

Case 2:
File 1: 1500 lines, VLOOKUP in 4 columns over 900 lines
VLOOKUP is executed over 2 files. Search matrix is located in File 2.
File 2: corresponds exactly to the 2nd spreadsheet from File 1.
Duration ~30 seconds

We assume that with VLOOKUP a file hook is created for each row.
Destination File - open-call -> VLOOKUP -> Destination File close-call
In the above Case 2, this would thus generate 3600 hooks. This could lead to this delay.

Is it possible for you to optimize the VLOOKUP function for cross-file searches? Maybe only 1 file hook and close the hook first after finishing the whole calculation.

Best thanks to you all of you.

Steps to Reproduce:
1. See description

Actual Results:
Case 1: Duration ~1 second
Case 2: Duration ~30 seconds

Expected Results:
Case 2: Duration enhancement nearer to case 1


Reproducible: Always


User Profile Reset: Yes



Additional Info:
Can be reproduced in other Versions too.
Release 6.4.7.2
Release 6.4.3.2
Comment 1 Xisco Faulí 2021-02-12 08:49:09 UTC Comment hidden (obsolete)
Comment 2 Hagarius1988 2021-02-12 09:17:46 UTC
Created attachment 169702 [details]
File for Case1

File for Case1
VLOOKUP from main sheet with search matrix in sheet "deutsch".
Comment 3 Hagarius1988 2021-02-12 09:18:36 UTC
Created attachment 169703 [details]
File for Case2 - File 1

File for Case2 - File 1
Main spreadsheet with VLOOKUP.
Comment 4 Hagarius1988 2021-02-12 09:19:38 UTC
Created attachment 169704 [details]
File for Case2 - File 2

File for Case2 - File 2
Spreadshet with data search matrix for VLOOKUP from Case2-File 1
Comment 5 Hagarius1988 2021-02-12 09:22:45 UTC
Files added.
If you make any change in the formula in column 'N', 'O', 'P' and 'Q' and then copy down the formula for the entire column you can reproduce the duration differences between case 1 and case 2.
Comment 6 m_a_riosv 2021-02-12 12:31:17 UTC
If I remember there is a difference with the option:
Menu/Tools/Options/LibreOffice calc/Calculate - Search criteria= and <> must apply to whole cells.
With it unmarked vlookup should be slower.
Comment 7 Hagarius1988 2021-02-12 13:59:41 UTC
(In reply to m.a.riosv from comment #6)
> If I remember there is a difference with the option:
> Menu/Tools/Options/LibreOffice calc/Calculate - Search criteria= and <> must
> apply to whole cells.
> With it unmarked vlookup should be slower.

"Search criteria= and <> must apply to whole cells" is checked.
Comment 8 QA Administrators 2021-02-13 04:05:20 UTC Comment hidden (obsolete)
Comment 9 Buovjaga 2022-02-16 14:38:30 UTC
Created attachment 178309 [details]
Perf flamegraph

Tested by opening case2-file1.xls, editing links and pointing to case2-file2.xlsx. In N171, I modified the formula, so the first argument was H170. I filled down to N1223.

It took about 9 secs stopwatch time.

Version: 7.4.0.0.alpha0+ / LibreOffice Community
Build ID: 224369b351884e8315368cfc99620c42eb97ee15
CPU threads: 8; OS: Linux 5.16; UI render: default; VCL: kf5 (cairo+xcb)
Locale: fi-FI (fi_FI.UTF-8); UI: en-US
Calc: threaded