Bug 151851 - Sorting array formula works differently in Calc compared to Excel
Summary: Sorting array formula works differently in Calc compared to Excel
Status: UNCONFIRMED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
unspecified
Hardware: All All
: medium normal
Assignee: Not Assigned
URL:
Whiteboard: interoperability
Keywords:
Depends on:
Blocks: Calc-Function
  Show dependency treegraph
 
Reported: 2022-11-01 07:54 UTC by Mike Kaganski
Modified: 2022-11-02 01:13 UTC (History)
5 users (show)

See Also:
Crash report or crash signature:


Attachments
Sorting array formula example (9.62 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2022-11-01 07:54 UTC, Mike Kaganski
Details
Formula analysis (131.17 KB, image/png)
2022-11-01 14:28 UTC, Rafael Lima
Details
Comparison of Calc and Excel (128.49 KB, image/png)
2022-11-01 14:35 UTC, Rafael Lima
Details
Formula to sort arrays of data. (5.90 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2022-11-02 01:13 UTC, m_a_riosv
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Mike Kaganski 2022-11-01 07:54:36 UTC
Created attachment 183357 [details]
Sorting array formula example

In the attachment, the data range "range1" (A1:A15) is sorted in C1:C15 using array formulas like

> {=INDEX(range1;MATCH(LARGE(COUNTIF(range1;">"&range1);ROW(C1:C1));COUNTIF(range1;">"&range1);0))}

The results in Excel start with sorted cells containing data, then zeroes; in Calc, zeroes are first.

It seems to me that Calc is correct in this case (because indeed, empty cells are those that have most other cells greater than them); but this is an interop problem; and anyway, maybe I miss some detail (I always am confused by array evaluation).
Comment 1 Rafael Lima 2022-11-01 14:28:03 UTC
Created attachment 183363 [details]
Formula analysis

Hi Mike... Using the function wizard to analyze the function I came to a similar conclusion. It seems Calc is right because an empty cell is "smaller" than a cell with text.

In the sample file, cell A4 is empty hence it has 7 cells that are greater than it (which are the cells with values).

The weird bit is that in Excel, testing if an empty cell is "smaller" than a cell with value also returns TRUE (as well as in Calc). But in the array function the test returns a different result, as if an empty cell is "larger" than a cell with value.

Hence the problem is in the COUNTIF(range1;">"&range1) part, but IMO Excel is doing it wrong.
Comment 2 Rafael Lima 2022-11-01 14:35:11 UTC
Created attachment 183364 [details]
Comparison of Calc and Excel

See the attached image for more details.

I used the same array formula in both files:

=COUNTIF(A1:A15;">"&A1:A15)

The formula is applied as an array formula over the range A1:A15. Notice how the results are different.

However, note below that in both Calc/Excel an empty cell is "smaller" than a cell with text.
Comment 3 Eike Rathke 2022-11-01 17:03:16 UTC
For Criteria comparison operators empty being smaller than any content seems natural to me. However, maybe Excel does that differently because Sort (that actually sorts cells) sorts empty cells to the end of the range (which also Calc does), i.e. sort column A to see.
Comment 4 m_a_riosv 2022-11-02 01:13:03 UTC
Created attachment 183368 [details]
Formula to sort arrays of data.

Seems to me that such formula in any case is not of too much confidence.

Excel: Introducing something in the empty cells between data (a4, a8) produce a result sort list with replicated values at the end. If there are numbers, the second it's not on the sorted list.
Calc: retains the '0' at the top, but if numbers in a4,a8 only a4 it's on the sorted list but duplicated.

Attached a sample file that deals with numbers, text, blank cells (at the end), working with calc and excel. Based on 'ken johnson' answer https://forum.openoffice.org/en/forum/viewtopic.php?p=290729#p290729