Bug 139444 - VLOOKUP function very slow on Large Tables in Calc
Summary: VLOOKUP function very slow on Large Tables in Calc
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
5.2 all versions
Hardware: All All
: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:7.3.0 inReleaseNotes:7.3
Keywords: perf
: 92456 133704 144931 (view as bug list)
Depends on:
Blocks: Paste Function-Vlookup
  Show dependency treegraph
 
Reported: 2021-01-06 09:14 UTC by Baland Rabayah
Modified: 2022-05-11 10:08 UTC (History)
11 users (show)

See Also:
Crash report or crash signature:
Regression By:


Attachments
File which LibreOffice is struggling to calculate the VLOOKUP formula (13.28 MB, application/vnd.oasis.opendocument.spreadsheet)
2021-01-06 09:20 UTC, Baland Rabayah
Details
Report from macOS after LO killing (33.78 KB, text/plain)
2021-11-09 17:11 UTC, Roman Kuznetsov
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Baland Rabayah 2021-01-06 09:14:07 UTC
Description:
Hello all,

Hope you are all doing well

I would like to file a bug, or rather some performance issue with Libreoffice Calc which happens on both 6.4.7 and 7.01.1. I have a rather large table which I am using a VLOOKUP formula to find data, which the VLOOKUP formula also has a CONCATENATE formula as well. With the formula, when pasting across 1024 columns and 90 rows, Libreoffice simply does not calculate the data and hangs up. 

I've tested this on three machines, one Windows and two using KDE Neon. The windows Machine used LibreOffice 7.01.01, while the KDE Neon machines used 6.4.7. I tested the same sheet (in ODS format) on Excel where Excel was able to calculate the formulas near instantaneously, and I also tested on gnumeric, where gnumeric took 5 to 8 seconds to calculate the full sheet.

I also tried this in safe mode to ensure it was not a user profile issue. Other further notes which I should mention:

 I've attempted to use openCL to try and speed up the calculations.
 I also ran libreoffice on terminal to see if it was throwing any error messages, none appeared.
 File size is approximately 15MB

I am a big fan of LibreOffice and was somewhat surprised by the performance issue stated above, as usually Calc is much quicker than Excel! It would be great if we can get to the bottom of this to see what may be the issue, unfortunately I do not have any strong form of Software diagnostic skills, and realistically can only contribute by filing bugs and financially supporting the project.

Steps to Reproduce:
1.open a large file which has a table of data that requires to use a CONCATENATE formula to combine two columns of data (Date and Stock ticker).

2. Use a VLOOKUP function in another sheet with stock tickers across in columns, and dates in rows where the VLOOKUP formula will have a CONCATENATE formula as well (example =VLOOKUP(CONCATENATE(date,stock ticker),table range,false) to search up the CONCATENATED variable in the table in another sheet

3. Copy the formula from one column towards the last possible column.

Actual Results:
Results are that LibreOffice freeezes and does not calculate the results of the VLOOKUP.

Expected Results:
Expected Libreoffice to take at most, 1 minute to calculate given the large amount of data and slight complexity of the formula.


Reproducible: Always


User Profile Reset: Yes



Additional Info:
Software should be quick at calculating the VLOOKUP formula, as other software was able to do it fine. It may be by chance due to specific hardware I'm using, but I prefer to file this as a bug/performance issue
Comment 1 Baland Rabayah 2021-01-06 09:20:13 UTC
Created attachment 168721 [details]
File which LibreOffice is struggling to calculate the VLOOKUP formula

Hi All, I've attached the file where Libreoffice is struggling to calculate the VLOOKUP formula. To test, in the 2nd sheet, please paste the first column of formulas across the sheet below the tickers which are on the first row. Hope this helps
Comment 2 Roman Kuznetsov 2021-01-06 19:09:44 UTC
confirm in

Version: 7.2.0.0.alpha0+ (x64)
Build ID: 8e691505d4675b878b30bd00cd2e4fb4f794f0ef
CPU threads: 4; OS: Windows 10.0 Build 18363; UI render: Skia/Raster; VCL: win
Locale: ru-RU (ru_RU); UI: ru-RU
Calc: threaded

and in 5.2.7.2
Comment 3 Roman Kuznetsov 2021-01-06 19:17:05 UTC
OMG, MS Excel does the same copy-paste just momently >_<
Comment 4 m.a.riosv 2021-01-06 23:35:58 UTC
For me about a minute with O365 on the earth.

There are older bugs about vlookup performance

https://bugs.documentfoundation.org/buglist.cgi?quicksearch=vlookup&list_id=1247981
Comment 5 Roman Kuznetsov 2021-03-03 18:48:25 UTC
(In reply to Roman Kuznetsov from comment #3)
> OMG, MS Excel does the same copy-paste just momently >_<

and Gnumeric too!
Comment 6 Luboš Luňák 2021-11-08 07:46:20 UTC
Either the instructions in comment #1 are unclear, or I cannot reproduce with current master.
Comment 7 Roman Kuznetsov 2021-11-08 08:22:39 UTC
(In reply to Luboš Luňák from comment #6)
> Either the instructions in comment #1 are unclear, or I cannot reproduce
> with current master.

1. Open attached file in Calc
2. Select Lookup sheet
3. Select D2:B90 cell range and copy it
4. Select C2:AMJ2 cell range and press Ctrl+V => LO hangs
Comment 8 Commit Notification 2021-11-08 19:34:54 UTC
Luboš Luňák committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/ec0edb0969c23b25576f4d1b3b2ee5d3f21990ad

optimize VLOOKUP by returning SharedString if possible (tdf#139444)

It will be available in 7.3.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 9 Luboš Luňák 2021-11-08 19:44:50 UTC
*** Bug 92456 has been marked as a duplicate of this bug. ***
Comment 10 Commit Notification 2021-11-08 20:39:37 UTC
Luboš Luňák committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/5e9c2677e8fcd19b289d947b94ceba52b138728b

improve performance of cell equality comparisons (tdf#139444)

It will be available in 7.3.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 11 Elmar 2021-11-09 10:59:03 UTC
Tested 06:00 CAT 9Nov
Version: 7.3.0.0.alpha1+ / LibreOffice Community
Build ID: c7500945fc5d5bd2130a2d38be0bd4b15445cd90
CPU threads: 8; OS: Linux 5.4; UI render: default; VCL: gtk3
Locale: en-GB (en_GB.UTF-8); UI: en-US
Calc: threaded

██ 05:38:49 $ copy, LO "freezes"
██ 05:51:59 $ LO unfreezes
>13 minutes

During this time was able to use
Version: 6.4.7.2
Build ID: 1:6.4.7-0ubuntu0.20.04.1
CPU threads: 8; OS: Linux 5.4; UI render: default; VCL: gtk3; 
Locale: en-GB (en_GB.UTF-8); UI-Language: en-GB
Calc: threaded

Environment:
Dell E5590 32G memory
CPU:       Topology: Quad Core model: Intel Core i5-8350U bits: 64 type: MT MCP arch: Kaby Lake 
           rev: A L2 cache: 6144 KiB 
System:    Kernel: 5.4.0-89-generic x86_64 bits: 64 compiler: gcc v: 9.3.0 Desktop: Cinnamon 5.0.6 
           wm: muffin dm: LightDM Distro: Linux Mint 20.2 Uma base: Ubuntu 20.04 focal 

Downloaded LO
Tested:   CET 10Nov

Will test again after downloading the latest update (tomorrow?)
Comment 12 Roman Kuznetsov 2021-11-09 17:10:14 UTC
I killed the LO's process after 10 minutes 

Version: 7.3.0.0.alpha1+ / LibreOffice Community
Build ID: ccf3d9140a24d767003c08cd0247e6ee6fd12fe0
CPU threads: 4; OS: Mac OS X 10.16; UI render: default; VCL: osx
Locale: ru-RU (ru_RU.UTF-8); UI: en-US
Calc: threaded
Comment 13 Roman Kuznetsov 2021-11-09 17:11:17 UTC
Created attachment 176168 [details]
Report from macOS after LO killing
Comment 14 Elmar 2021-11-10 05:52:41 UTC
Took 6 minutes after LO update (50% improvement)

07:14:53 $ 
07:20:43 $ 

07:24:34 $ 
07:30:51 $

Is it possible to change the formula to use a another function that might be quicker?
e.g. something like
https://www.linkedin.com/pulse/ten-alternative-excel-methods-vlookup-anders-liu-lindberg

Version: 7.3.0.0.alpha1+ / LibreOffice Community
Build ID: d4c56284813c1e38a2fb46e9a52c5350d1ba3cef
CPU threads: 8; OS: Linux 5.4; UI render: default; VCL: gtk3
Locale: en-GB (en_GB.UTF-8); UI: en-US
Calc: threaded
Comment 15 Kevin Suo 2021-11-10 06:02:07 UTC
Set back to NEW. @Elmar you should not change the Status as this was already CONFIRMED.
Comment 16 Luboš Luňák 2021-11-22 20:22:01 UTC
Note that the document has Tools->Options->LibreOffice Calc->"Seach criteria = and <> must apply to whole cells" disabled, which means the lookup has to use slower substring matching.
Comment 17 Stéphane Guillou (stragu) 2022-01-23 12:24:00 UTC
Still extremely slow with LO 7.3.0.2 on Ubuntu 18.04.6:it took 5:55 minutes to paste the formulas of column B into the blank range C:AMJ

Version: 7.3.0.2 / LibreOffice Community
Build ID: f1c9017ac60ecca268da7b1cf147b10e244b9b21
CPU threads: 8; OS: Linux 5.4; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Calc: threaded
Comment 18 Luboš Luňák 2022-05-11 10:05:31 UTC
https://git.libreoffice.org/core/+/122e676ce35b34c289cc4c91bb72e25398dc9e12%5E%21 and surrounding commits (and comment #16).
Comment 19 Luboš Luňák 2022-05-11 10:06:38 UTC
*** Bug 144931 has been marked as a duplicate of this bug. ***
Comment 20 Luboš Luňák 2022-05-11 10:08:03 UTC
*** Bug 133704 has been marked as a duplicate of this bug. ***