Bug 126573 - Add array functions in Calc: FILTER, SORT, SORTBY, UNIQUE, SEQUENCE, RANDARRAY
Summary: Add array functions in Calc: FILTER, SORT, SORTBY, UNIQUE, SEQUENCE, RANDARRAY
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
unspecified
Hardware: All All
: high enhancement
Assignee: Balázs Varga (allotropia)
URL: https://issues.oasis-open.org/browse/...
Whiteboard: Interoperability target:24.8.0 inRele...
Keywords:
: 127809 147224 156698 159105 (view as bug list)
Depends on: 159872
Blocks: ODF-spec Excel-Functions
  Show dependency treegraph
 
Reported: 2019-07-26 22:10 UTC by Gerry
Modified: 2024-07-10 12:44 UTC (History)
36 users (show)

See Also:
Crash report or crash signature:


Attachments
MandelBrot Set test spreadsheet (18.06 KB, application/vnd.oasis.opendocument.spreadsheet)
2024-07-01 05:27 UTC, Óvári
Details
separate sequence functions (28.75 KB, application/vnd.oasis.opendocument.spreadsheet)
2024-07-01 08:58 UTC, Balázs Varga (allotropia)
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Gerry 2019-07-26 22:10:04 UTC
Description:
There are some array features in spreadsheets which recently have been added to Excel. These functions make lots of sense and they remind me of statistical software (e.g. Stata) which have very similar functions:

FILTER, SORT, SORTBY, UNIQUE, SEQUENCE, RANDARRAY

https://support.office.com/en-us/article/new-array-functions-003df6c7-1dcb-4388-8e2e-0fe77a0887bc?ui=en-US&rs=en-US&ad=US

Steps to Reproduce:
.

Actual Results:
.

Expected Results:
.


Reproducible: Always


User Profile Reset: No



Additional Info:
Comment 1 Eike Rathke 2019-08-05 10:45:35 UTC
Note that those functions aren't fully specified yet, for example https://support.office.com/en-us/article/RANDARRAY-function-21261e55-3bec-4885-86a6-8b0a47fd4d33 says:

Note: January 14, 2019: The RANDARRAY function is one of several beta features, and currently only available to a portion of Office Insiders at this time. We'll continue to optimize these features over the next several months. When they're ready, we'll release them to all Office Insiders, and Office 365 subscribers.


Also, it needs to be investigated how the spilling situations are actually handled and what happens if due to recalculation the return array size varies. This is not just new spreadsheet functions but a new behaviour that calculating a function can change spreadsheet content.

Also restrictions like such formula can be edited only at the top left of its array need to be implemented, though we maybe could handle that differently as long as the edited formula affects the range as if it was edited at the top left. Our current array formula display/edit behaviour already is somewhat similar.
Comment 2 raal 2019-10-02 18:46:32 UTC
*** Bug 127809 has been marked as a duplicate of this bug. ***
Comment 3 Jürgen Kirsten 2020-06-14 11:51:18 UTC
I would like to support Gerry's cause.

In my opinion also the XLOOKUP function would be very helpful. It returns an array if needed.

XLOOKUP, FILTER, SORT, UNIQUE 

are my wishes.

Juergen
Comment 4 flywire 2021-06-13 10:10:04 UTC Comment hidden (me-too)
Comment 5 Philip Sch 2021-07-04 10:06:17 UTC
These are not just nice to have anymore, some or all of those functions are already used in Google Sheets for a long time and needed for intercompatability, and are not Office Insiders exclusive anymore.
Comment 7 PremPa 2021-07-28 09:59:07 UTC Comment hidden (me-too)
Comment 9 Jacob H 2022-05-11 16:59:50 UTC
I add my support for these functions.

Over the years I've come across the limitations of array handling a number of times. Most often I've found wacky workarounds with compromises on functionality and likely performance. However, there are instances where the limitations are simply too constricting making the ideal handling of arrays impossible. I'll share the example I'm currently facing later, but provide a general use case in favor of the need of these functions.

I'll start by saying that dynamic Array manipulation is critical in handling data and has so many use cases. In my opinion the dynamic handling of arrays is the only remaining missing link in both function and form, at least as far as data handling in Calc.

One general limitation is that you can't take 2 arrays and interweave the data filtered on criteria, and then sorted to be in a proper order. Libreoffice Calc has the tools to do this manually, but if the document has ongoing changes, that manual effort becomes extraordinarily time consuming and tedious.

-----------
The rest of this post will detail real world limitations
-----------

So my current example, which could be solved by these functions being implemented, goes as follows:

I have a budget/finance system which has separate sheets as ledgers for income, expenses, transfers between accounts. There are then other sheets which extrapolate as much data as possible from those ledgers allowing you to see things like totals from different income types, various expenses broken down, averages per month, detailed account information, stock information, and on and and and on. (This requires some not as wacky array workarounds, but it works.)

Now stocks are where some wacky array workarounds start. If I want to pluck the purchase of a stock out of the Expenses Sheet (and all related information), I need a named range/formula expression called, "ExpensesToStock" with: IF(ExpensesType="Stock">0,(ExpensesType="Stock")*(ROW(ExpensesType)),"") Where "ExpensesType" is the column titled Type in the Expenses Sheet. "ExpensesToStock" is now an array of row numbers where that data is in the Expenses Sheet, and blank spaces where the data isn't. Then I remove the blank spaces on a separate configuration sheet by grabbing the "Small" of those row numbers out of "ExpensesToStock" by putting this function in consecutive rows: {=IFERROR(SMALL(ExpensesToStock,ROW($A1)),"")} Each consecutive row then increments as a variable for the small. (ROW($A1),ROW($A2),...) Now that I have a list of rows where the data is, I can run an "Indirect" function to display all of the relevant data on a separate sheet called "Stocks." That's a filtered ledger of only stocks that I can then do additional calculations on to provide even more stock relevant data.

^ TL;DR, Functions automatically copy each Stock line out of the "Expenses" sheet into the "Stocks" sheet and calculate additional important information related to the Stocks.

However, here is where the limitations of Calc array handling arise. Stocks aren't just an expense. Stocks can be an expense, income, or a merging between companies. If I wanted to make a dynamic ledger to determine proper data (total stocks, total basis, etc.) I need to evaluate both the purchase, sale, and transfer from the income and expenses sheet for all stock related information. You can imagine that the stock purchase and sale date are pretty important for the order of calculating. 

Combining both the purchase and sale of stocks plucked from 2 different sheets into a single date sorted ledger would be trivial and automatic with a filter and sort array function, but currently impossible in Calc short of constant manual efforts.
Comment 10 Roman Kuznetsov 2022-05-17 15:52:37 UTC Comment hidden (obsolete)
Comment 11 Roman Kuznetsov 2022-05-17 15:53:45 UTC
(In reply to Roman Kuznetsov from comment #10)
> Please don't change the Summary. There is a different bug 127293 for XLOOKUP
> implementation
Comment 12 Mike Kaganski 2022-05-22 09:06:24 UTC
*** Bug 147224 has been marked as a duplicate of this bug. ***
Comment 13 Rafael Lima 2022-07-07 20:25:30 UTC Comment hidden (me-too)
Comment 14 Philip Sch 2023-02-11 13:26:01 UTC Comment hidden (noise)
Comment 15 Romano Giannetti 2023-02-12 11:59:34 UTC
(Hoping this is not just noise, but can help somebody)

I know it's not a complete solution, but in a hurry, this extension:

https://github.com/goosepirate/lox365

can be useful.
Comment 16 Eike Rathke 2023-08-09 16:45:47 UTC
*** Bug 156698 has been marked as a duplicate of this bug. ***
Comment 17 george 2023-11-06 17:08:43 UTC Comment hidden (noise)
Comment 18 Óvári 2023-12-10 01:37:56 UTC
(In reply to Jacob H from comment #9)
> I'll start by saying that dynamic Array manipulation is critical in handling
> data and has so many use cases. In my opinion the dynamic handling of arrays
> is the only remaining missing link in both function and form, at least as
> far as data handling in Calc.
Dynamic arrays
https://support.microsoft.com/en-us/office/dynamic-array-formulas-and-spilled-array-behavior-205c6b06-03ba-4151-89a1-87a7eb36e531
Comment 19 Mike Kaganski 2024-01-10 15:02:09 UTC
*** Bug 159105 has been marked as a duplicate of this bug. ***
Comment 20 Commit Notification 2024-03-01 17:31:57 UTC
Balazs Varga committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/b0791dd9216224bdb266fd0d8c87df253b6d0583

tdf#126573 Add Excel2021 array function FILTER to Calc

It will be available in 24.8.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 21 Commit Notification 2024-03-15 11:39:10 UTC
Balazs Varga committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/e5361d0ef04bc63e7809323750aabe6efafcaef9

Related: tdf#126573 Add Excel2021 array function FILTER to Calc

It will be available in 24.8.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 22 Commit Notification 2024-03-25 14:54:42 UTC
Balazs Varga committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/45435a680be065e44eba385bb2523b27b77fb451

tdf#126573 Add Excel2021 array function SORT to Calc

It will be available in 24.8.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 23 Commit Notification 2024-03-26 09:21:35 UTC
Balazs Varga committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/bfb4c58ae708c75949559290bdfdd9afcef6aa91

tdf#126573 Add Excel2021 array function SORTBY to Calc

It will be available in 24.8.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 24 Commit Notification 2024-04-05 14:04:28 UTC
Balazs Varga committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/be938f2fb1c0a18658b4170396a22c64b77bf646

Related tdf#126573 Add Excel2021 array function SORT to Calc

It will be available in 24.8.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 25 Commit Notification 2024-04-26 07:33:30 UTC
Balazs Varga committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/35772a003bb30be61f8ba8abe805455e41db0e1e

tdf#126573 Add Excel2021 array function SEQUENCE to Calc

It will be available in 24.8.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 26 Commit Notification 2024-05-08 15:46:10 UTC
Balazs Varga committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/d68f2394afc3372d867ea6157123e51b278ba81b

tdf#126573 Add Excel2021 array function RANDARRAY to Calc

It will be available in 24.8.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 27 Commit Notification 2024-05-15 16:11:30 UTC
Balazs Varga committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/c1565b8b94548ef85508b0302ef9299f18ff09be

tdf#126573 Add Excel2021 array function UNIQUE to Calc

It will be available in 24.8.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 29 nobu 2024-05-15 20:47:39 UTC
On behalf of everyone who has been waiting for these, thank you so much.
Comment 30 ady 2024-05-15 22:45:39 UTC
FTR, the functions were added to the Release Notes wiki page (for LO 24.8), and _then_ additional patches were applied, and several bugs were reported (and hopefully fixed). The fact that these functions are already mentioned in the Release Notes does not mean that they were all thoroughly tested by common users. OTOH, they are not considered experimental either.

So, please actually test and report.

Please note that "supporting dynamic arrays" is not the same as supporting these functions; this ticket is about the latter only.
Comment 32 Duncan Loveday 2024-05-26 14:02:20 UTC
I notice that if I enter the array formula

=UNIQUE(A1:A999)

into a range of cells that is larger than the output from the UNIQUE() (e.g. paste it into a range of 10 cells when there are only 5 unique values in the range A1:A999) then the remaining cells are filled with #N/A. Fair enough I guess but when I try the array formula

=IFNA(UNIQUE(A1:A999),"NO VALUE")

it doesn't substitute the #N/A for NO VALUE as I had hoped. Is that a bug?

Would apply to any array formula pasted into a range larger than it's result although unique() is one where the size of the input and output arrays differ so arises more often.
Comment 33 Duncan Loveday 2024-05-26 14:10:05 UTC
Another observation: If I put

=UNIQUE(A1:A999) into one cell, say B1 then an array formula is automatically created and placed into a range that is just big enough for the result. If I have five unique values in the range A1:A999 then this will be B1:B5. Then if I go and add more unique values to the range A1:A999 then the result of the UNIQUE() (which is now too big for the range B1:B5) is silently truncated.

In excel the behaviour is different - the output range dynamically grows to accomodate the larger result of the UNIQUE(). In the event that this larger range runs into something else the result is shown as #SPILL meaning "result is too large for available space". This seems a bit more robust and intuitive.

What is the intended behaviour here?
Comment 34 ady 2024-05-27 10:57:52 UTC
(In reply to Duncan Loveday from comment #33)
> Another observation: If I put
> 
> =UNIQUE(A1:A999) into one cell, say B1 then an array formula is
> automatically created and placed into a range that is just big enough for
> the result. If I have five unique values in the range A1:A999 then this will
> be B1:B5. Then if I go and add more unique values to the range A1:A999 then
> the result of the UNIQUE() (which is now too big for the range B1:B5) is
> silently truncated.

Manually selecting the range where the array formula is located, and expanding the range should provide the additional values.

ATM, recalculate (hard) will not automatically expand the result. IDK whether it should  – there are both positive and negative consequences, and Calc would need to know how to deal with them – or whether this is the expected behavior (with any array formula?).

> 
> In excel the behaviour is different - the output range dynamically grows to
> accomodate the larger result of the UNIQUE(). In the event that this larger
> range runs into something else the result is shown as #SPILL meaning "result
> is too large for available space". This seems a bit more robust and
> intuitive.

That's about dynamic array formulas, which is not the same as supporting these functions.

> 
> What is the intended behaviour here?
Comment 35 Óvári 2024-06-09 05:28:53 UTC
add FILTER function ODF proposal is here:
https://issues.oasis-open.org/browse/OFFICE-4156

add SORT function  ODF proposal is here:
https://issues.oasis-open.org/browse/OFFICE-4157

add SORTBY function ODF proposal is here:
https://issues.oasis-open.org/browse/OFFICE-4158
Comment 36 Duncan Loveday 2024-06-10 11:54:47 UTC Comment hidden (obsolete)
Comment 37 Óvári 2024-07-01 05:11:21 UTC
https://ask.libreoffice.org/t/draw-the-mandelbrot-set-in-libreoffice-calc-using-only-formulae/107400/2

Error occurs when entering (Ctrl+Shift+Enter) the following formulae in LibreOffice 24.8 Dev:

Err:539:
=ArrayFormula(UNIQUE({ MOD(SEQUENCE(1681,1,0,1),41)/20, ROUNDDOWN(SEQUENCE(1681,1,0,1)/41)/20; -MOD(SEQUENCE(1681,1,0,1),41)/20, ROUNDDOWN(SEQUENCE(1681,1,0,1)/41)/20; MOD(SEQUENCE(1681,1,0,1),41)/20, -ROUNDDOWN(SEQUENCE(1681,1,0,1)/41)/20; -MOD(SEQUENCE(1681,1,0,1),41)/20, -ROUNDDOWN(SEQUENCE(1681,1,0,1)/41)/20 }))

Err:539:
=UNIQUE({ MOD(SEQUENCE(1681,1,0,1),41)/20, ROUNDDOWN(SEQUENCE(1681,1,0,1)/41)/20; -MOD(SEQUENCE(1681,1,0,1),41)/20, ROUNDDOWN(SEQUENCE(1681,1,0,1)/41)/20; MOD(SEQUENCE(1681,1,0,1),41)/20, -ROUNDDOWN(SEQUENCE(1681,1,0,1)/41)/20; -MOD(SEQUENCE(1681,1,0,1),41)/20, -ROUNDDOWN(SEQUENCE(1681,1,0,1)/41)/20 })

Err:509:
=MOD(SEQUENCE(1681,1,0,1),41)/20, ROUNDDOWN(SEQUENCE(1681,1,0,1)/41)/20, -MOD(SEQUENCE(1681,1,0,1),41)/20, ROUNDDOWN(SEQUENCE(1681,1,0,1)/41)/20, MOD(SEQUENCE(1681,1,0,1),41)/20, -ROUNDDOWN(SEQUENCE(1681,1,0,1)/41)/20, -MOD(SEQUENCE(1681,1,0,1),41)/20, -ROUNDDOWN(SEQUENCE(1681,1,0,1)/41)/20

Can you please advise how to enable LibreOffice Calc to draw the MandelBrot Set in LibreOffice Calc using only formulae?

Thank you
Comment 38 Óvári 2024-07-01 05:27:02 UTC
Created attachment 195060 [details]
MandelBrot Set test spreadsheet
Comment 39 Balázs Varga (allotropia) 2024-07-01 08:57:02 UTC
> Error occurs when entering (Ctrl+Shift+Enter) the following formulae in
> LibreOffice 24.8 Dev: 
> Err:509:
> =MOD(SEQUENCE(1681,1,0,1),41)/20, ROUNDDOWN(SEQUENCE(1681,1,0,1)/41)/20,
> -MOD(SEQUENCE(1681,1,0,1),41)/20, ROUNDDOWN(SEQUENCE(1681,1,0,1)/41)/20,
> MOD(SEQUENCE(1681,1,0,1),41)/20, -ROUNDDOWN(SEQUENCE(1681,1,0,1)/41)/20,
> -MOD(SEQUENCE(1681,1,0,1),41)/20, -ROUNDDOWN(SEQUENCE(1681,1,0,1)/41)/20 

Thank you for you comment, Óvári.

What would be the expected value/array here?

Is this long function should work? (For me it looks syntactically wrong.) I mean I copy-pasted it into other apps like (Excel, google sheet) and I got errors there too.

I tried the functions separately, like:
=MOD(SEQUENCE(1681;1;0;1);41)/20
=ROUNDDOWN(SEQUENCE(1681;1;0;1)/41)/20
=-MOD(SEQUENCE(1681;1;0;1);41)/20
=-ROUNDDOWN(SEQUENCE(1681;1;0;1)/41)/20

and all of these worked fine. :)

> Can you please advise how to enable LibreOffice Calc to draw the MandelBrot
> Set in LibreOffice Calc using only formulae?

Is this problem related to these new functions? If not maybe we should set back to resolved and should open another new one for this?
Comment 40 Balázs Varga (allotropia) 2024-07-01 08:58:39 UTC
Created attachment 195063 [details]
separate sequence functions
Comment 41 Regina Henschel 2024-07-01 17:10:23 UTC
Err:539 means this: In LibreOffice, an inline array can only be build from constant values. LibreOffice is not able to use expressions that need a calculation. The ODF specification is in section 5.13 Inline Array, part 4, ODF 1.3.
There you find the Note
“Note: Expression authors should be aware that use of Expression other than constant Number or constant String may impair interoperability.”

So the problem with these examples seems to be, that the values of the inline array are not constant. An enhancement request exists already, tdf#105683.

I suggest, setting the bug back to "resolved fixed".