Bug 161767 - Improve handling of label row of database ranges for SORT, SORTBY and FILTER functions
Summary: Improve handling of label row of database ranges for SORT, SORTBY and FILTER ...
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
24.8.0.0 alpha1+
Hardware: All All
: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Cell-Formula Calc-DataRange
  Show dependency treegraph
 
Reported: 2024-06-24 19:55 UTC by Regina Henschel
Modified: 2024-10-04 17:19 UTC (History)
5 users (show)

See Also:
Crash report or crash signature:


Attachments
Example with SORT and FILTER function (26.96 KB, application/vnd.oasis.opendocument.spreadsheet)
2024-06-24 19:55 UTC, Regina Henschel
Details
Example with table in Excel (11.57 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2024-06-24 19:55 UTC, Regina Henschel
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Regina Henschel 2024-06-24 19:55:05 UTC
Created attachment 194934 [details]
Example with SORT and FILTER function

The new functions SORT and SORTBY can use a database range as source array. If the database range has labels, then this label row is sorted as if it is a data row. That should not happen.

Assume a database range ´myData´ with a column label ´Sales´. Then a function call
´=FILTER(myData; myData[Sales]<3000)´ is not possible because ´myData´ has one row more than ´myData[Sales]´.

Excel has "format as table" as concept similar to database ranges. Using such array in the SORT function in Excel does not include the label row in the target array. There is an enhancement request for including it in https://feedbackportal.microsoft.com/feedback/idea/2a55004c-1db0-ee11-92bd-6045bd7fe601

Currently you need to add the label row manually in Excel. If "Members" is the table name, you can use the formula ´=VSTACK(Members[#Headers];SORT(Members[#Data];2))´, for example.

If there is no [] restriction, the reference ´Members´ means ´Members[#Data]´ in Excel. Formula ´=SORT(Members[#All];2)´ in Excel would result in the same target array with sort in label row as current implementation in LibreOffice.

I see these ways to solve the problem:
(A)
Add an option like ´[#Data]´ to the database range reference to exclude the label row from the target array, and if not used, keep the label row and position it at the beginning.

(B)
Let the function always work on the array without labels and add an additional parameter "add labels at top" to the functions, to add the source labels at top of the target array after the data is sorted/filtered.

(C)
Let the function always work on the array without labels and add VSTACK and HSTACK functions and a database range reference restriction like ´[#headers]´ to allow the user a solution like ´=VSTACK(Members[#Headers];SORT(Members;2))´

Any other ideas?

Keeping the label row _always_ at the beginning is no solution, because then a combination like SORT(FILTER(...)) would not be possible because the result of the inner FILTER function is a database range without name and thus the whole array including the label row would be input to the outer SORT function.

The ODF TC has just started discussing the SORT, SORTBY and FILTER functions. A solution that LibreOffice chooses could therefore be considered in ODF.
Comment 1 Regina Henschel 2024-06-24 19:55:44 UTC
Created attachment 194935 [details]
Example with table in Excel
Comment 2 Stéphane Guillou (stragu) 2024-07-10 12:44:53 UTC
Balázs, any opinion?
Comment 3 Regina Henschel 2024-07-18 13:05:11 UTC
LibreOffice has implemented the needed database-range keywords [[#Data]], [[#Headers]], [[#Totals]] and [[#All]]. They work besides bug 162087 and could be used in Import/Export with xlsx-Format. These keywords are not available in ODF. The request to extend ODF is in bug 162088.
Comment 4 Regina Henschel 2024-07-18 18:38:09 UTC
The import from Excel generates simple brackets, where it should be double brackets (bug 162093). So a solution that treats Excel's expression SORT(myData) same as Excel's expression SORT(myData[#Data]) will fail if this is not fixed.