Bug 166576 - Array Formulas with column references like B:B or named columns are very slow even for short columns
Summary: Array Formulas with column references like B:B or named columns are very slow...
Status: UNCONFIRMED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
25.2.3.2 release
Hardware: All All
: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: perf
Depends on:
Blocks:
 
Reported: 2025-05-14 11:09 UTC by Klaus
Modified: 2025-12-29 21:27 UTC (History)
2 users (show)

See Also:
Crash report or crash signature:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Klaus 2025-05-14 11:09:29 UTC
Description:
Array formulas are one of the ways, consistency of the formula across
a range can be enforced.

When using them that way however, combined with column references,
severe performance issue occur, as the calculation will seemingly be
performed for all rows of the sheet, not just for the subset of the
sheet containing data.


Steps to Reproduce:
1. Prepare a reference data column.
1.1. Enter  =RAND()  into A1.
1.2. Select A1:A20000
1.3. Sheet > Fill Cells > Fill Down
1.4. Ctrl+C, Ctrl+Shift+V > Values Only to convert the formulas
     to fixed numbers.
1.5. Data > Define Range: Define the range "A:A" (the whole column)
     as range named "input".

2. Test the performance of regular cell formulas.
2.1. Enter  =input*input/input  into cell B1.
2.2. Select column B (the range B:B).
2.3. Ctrl+D to "Fill Down"
This should happen near instantaneously.

3. Test the performance of an array formula.
3.1. Select cell C1
3.2. Ctrl+F2
3.3. Enable the “Array” checkbox.
3.4. Type out  =input*input/input


Actual Results:
You should observe small hiccups as the “Result” field gets updated.
When typing something slightly more complicated like

    =IF(ISNUMBER(input), input, "-")

the hiccups should grow significantly more severe. 

Expected Results:
While it makes sense, that an array formula would calculate the whole
array, it would be useful to have some option for limiting the array
formula to consider only ranges, where input data exists.

This is what the ISNUMBER(input) tries to do, but only in the sense
that it doesn't produce an output number, but produces a placeholder
non-numeric string.



Reproducible: Always


User Profile Reset: Yes

Additional Info:
What I was trying to achieve was essentially:

  1. Define a whole column as named range, in order to easily check
     the name by selecting the column.

  2. Use it in a formula, that is guaranteed to be consistent for
     the whole column.

In order to improve the performance of the “array formula and column
ranges”-solution, I was asking on reddit to see if there is a way to
limit the number of rows in a sheet, since limiting the sheet to, say,
4000 rows would vastly improve the performance of the resulting
operations. See https://www.reddit.com/r/libreoffice/comments/1kf82c8


__________________________________________________
Version: 25.2.3.2 (X86_64) / LibreOffice Community
Build ID: bbb074479178df812d175f709636b368952c2ce3
CPU threads: 12; OS: Linux 6.4; UI render: default; VCL: gtk3
Locale: en-US (en_US.UTF8); UI: en-US
Flatpak
Calc: threaded
Comment 1 Eike Rathke (retired, only occasionally showing up) 2025-05-14 12:00:53 UTC
An array formula using a whole-column reference like B:B internally creates an array of the size of an entire column for that reference, i.e. 1048576 elements. This is expected to be somewhat slower..
Comment 2 Buovjaga 2025-12-25 20:54:26 UTC
(In reply to Klaus from comment #0)
> You should observe small hiccups as the “Result” field gets updated.
> When typing something slightly more complicated like
> 
>     =IF(ISNUMBER(input), input, "-")
> 
> the hiccups should grow significantly more severe. 

More universal notation:
IF(ISNUMBER(input); input; "-")

How severe is the lag for you? For me, Result is updated in less than 3 secs stopwatch time. 25.2 seemed to be slightly more laggy, but still only a bit over 3 secs. Intel® Core™ i7-6700K CPU from 2016 with 32 GB RAM.

Arch Linux 64-bit
Version: 26.8.0.0.alpha0+ (X86_64)
Build ID: 640a35801612123fa503fa72e8732cfb7e75caca
CPU threads: 8; OS: Linux 6.18; UI render: default; VCL: gtk3
Locale: fi-FI (fi_FI.UTF-8); UI: en-US
Calc: CL threaded
Built on 23 December 2025
Comment 3 Klaus 2025-12-29 21:22:31 UTC
(In reply to Buovjaga from comment #2)
> How severe is the lag for you? For me, Result is updated in less than 3 secs
> stopwatch time. 25.2 seemed to be slightly more laggy, but still only a bit
> over 3 secs. Intel® Core™ i7-6700K CPU from 2016 with 32 GB RAM.
> 
> Arch Linux 64-bit
> Version: 26.8.0.0.alpha0+ (X86_64)
> Build ID: 640a35801612123fa503fa72e8732cfb7e75caca
> CPU threads: 8; OS: Linux 6.18; UI render: default; VCL: gtk3
> Locale: fi-FI (fi_FI.UTF-8); UI: en-US
> Calc: CL threaded
> Built on 23 December 2025

The main issue (the "3.x" part) is when using the formula editor (Ctrl+F2). Three seconds for the computation are a non-issue; Three seconds after every key-stroke make the formula editor near unusable.

The issue is likely that for array formulas the whole array is evaluated to update the "Results" text field. 

In my case more like 0.5-1.5 seconds, but even then it creates near unusable input lag.

Intel Xeon E-2136 CPU (6 cores @3.3 GHz, 4.5 GHz boost) with 32 GB RAM (2018).
Comment 4 Klaus 2025-12-29 21:27:22 UTC
(In reply to Buovjaga from comment #2)
> How severe is the lag for you? For me, Result is updated in less than 3 secs
> stopwatch time. 25.2 seemed to be slightly more laggy, but still only a bit
> over 3 secs. Intel® Core™ i7-6700K CPU from 2016 with 32 GB RAM.

(In reply to Klaus from comment #3)
> The main issue (the "3.x" part) is when using the formula editor (Ctrl+F2).
> Three seconds for the computation are a non-issue; Three seconds after every
> key-stroke make the formula editor near unusable.

The other performance issue is, that there is no good way to define a name for a column, but use only the actually non-empty parts of the column in the array formula for evaluating an array formula that defines the contents of another column.