Bug 155551 - Trim function making index/match or lookup functions calculate all spreadsheet rows instead of data interval
Summary: Trim function making index/match or lookup functions calculate all spreadshee...
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
Inherited From OOo
Hardware: All All
: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Calc-Function Performance
  Show dependency treegraph
 
Reported: 2023-05-29 21:09 UTC by Gisseh
Modified: 2023-06-03 19:02 UTC (History)
3 users (show)

See Also:
Crash report or crash signature:


Attachments
example file (22.20 KB, application/vnd.oasis.opendocument.spreadsheet)
2023-05-29 21:11 UTC, Gisseh
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Gisseh 2023-05-29 21:09:41 UTC
Description:
Dear all,

I’d like to report a particular “buggy” behavior from LO Calc that I do not see on other spreadsheet programs, such MS Excel, WPS, Google sheets.

When we use a text funcion such a match/index or even vlookup with an “open interval” (e.g. B:B instead of $B$2:$B$129$), and I copy it to multiple lines, LO Calc seems to look through the “infinite” lines and therefore becoming unresponsive, and the calculation really takes very long to be completed.

In my daily basis I use the match/index function to make stock / sales / product information sheets to speak to each other, and this LO behavior is quite annoying as to achieve a good performance I have to fixate all the intervals, where I’m used to just inform the column label on other spreadsheets software.

I am using the attached spreadsheet as an example (this is a real example from my job).

I prepared two example columns, where we can copy the formula from D2 to D129 on the ledger tab, and this would just work with good performance (it calculates within a second).

If we do the same from B2 to B129, it will also calculate correctly, however it takes around a minute on my laptop, and the Operationa System thinks the software has crashed.

Am I missing something here, or this is something we could report as a bug / improvement to be done?

Thank you all for your time.

The best,

PS:
Version: 7.5.3.2 (X86_64) / LibreOffice Community
Build ID: 9f56dff12ba03b9acd7730a5a481eea045e468f3
CPU threads: 8; OS: Linux 6.2; UI render: default; VCL: gtk3
Locale: pt-BR (pt_BR.UTF-8); UI: pt-BR
Flatpak
Calc: threaded

I uploaded an example file at: https://ask.libreoffice.org/t/open-interval-vs-fixed-interval-for-index-match-or-vlookup/91992

Steps to Reproduce:
1. Create a with single entries containing spaces after it and one attribute to it, e.g. Reference "A   ", description X
2. In another tab, spread some different references, A, B, C etc
3. Use a match/index to search descriptions such X as a Match of trimmed A etc

Actual Results:
If we use a trim function and do not fix the lookup interval, CALC seems to search on the entire number of lines, taking too much time and sometimes crashing

Expected Results:
Work with equivalent performance as with the "trim" function


Reproducible: Always


User Profile Reset: No

Additional Info:
I uploaded an example file at: https://ask.libreoffice.org/t/open-interval-vs-fixed-interval-for-index-match-or-vlookup/91992
Comment 1 Gisseh 2023-05-29 21:11:12 UTC
Created attachment 187596 [details]
example file
Comment 2 m_a_riosv 2023-05-30 00:14:54 UTC
Looks that TRIM() doesn't shortcut for reference to the whole column A:A.
I think many text functions e.g., SUBSTITUTE has the same issue.

Version: 7.6.0.0.alpha1+ (X86_64) / LibreOffice Community
Build ID: 20873f073ae4a9478f0a84355f779a2176bd2ec8
CPU threads: 16; OS: Windows 10.0 Build 22621; UI render: Skia/Raster; VCL: win
Locale: es-ES (es_ES); UI: en-US
Calc: CL threaded
Comment 3 m_a_riosv 2023-05-30 00:47:20 UTC
Please change the title accordantly.
Comment 4 ady 2023-05-30 00:55:16 UTC
(In reply to m.a.riosv from comment #2)
> Looks that TRIM() doesn't shortcut for reference to the whole column A:A.
> I think many text functions e.g., SUBSTITUTE has the same issue.

Having TRIM() operating at least 123*128 = 15744 times is not the same as operating 123 times only.

Here is what I mean:
1. In attachment 187596 [details] from comment 1, worksheet "Pivot_stock_1", cell C2: =TRIM(A2) and copy down until C124.

2. In worksheet ledger, cell B2: =INDEX($Pivot_stock_1.B:B;MATCH(A2;$Pivot_stock_1.C:C;0))

3. In worksheet ledger, cell D2: =INDEX(Pivot_stock_1.$B$2:$B$124;MATCH(A2;Pivot_stock_1.$C$2:$C$124;0))

4. Copy B2:E2 down until B129:E129.


With that, TRIM() will be operating 123 times (only), while the INDEX() and MATCH() functions will operate the same amount of times as before.

In my system, this change generates a much faster response, which might suggest that the problem might not be "Open interval vs “fixed” interval for index/match or lookup function" (as the current title/subject suggests), but rather that the TRIM() function (operating on strings) is taking most of the resources.

If that is correct (which needs to be corroborated by developers), then the subject of this report (at the time I am writing this) is not adequate/relevant.
Comment 5 Eike Rathke 2023-05-30 08:40:27 UTC
The problem here is that functions like MATCH() and [HV]LOOKUP() (have to) force its range parameters into array mode, which for functions like TRIM() that do not handle cell ranges (i.e. expect a single scalar value as argument) means having to iterate over its given cell range to produce an array that will be passed to MATCH(). For a cell range like in TRIM(B:B) that is iterating over a million rows to generate a million elements array..
Comment 6 Gisseh 2023-05-30 11:11:35 UTC
I agree, and have changed the title from "Open interval vs “fixed” interval for index/match or lookup function" to "Trim function making index/match or lookup functions calculate all spreadsheet rows instead of data interval"