Bug 154125 - INDEX() does not return a vector of an array if that is only a vector, but only the first element
Summary: INDEX() does not return a vector of an array if that is only a vector, but on...
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
Inherited From OOo
Hardware: All All
: medium normal
Assignee: Eike Rathke
URL:
Whiteboard: target:7.6.0 target:7.5.3
Keywords:
Depends on:
Blocks:
 
Reported: 2023-03-10 19:37 UTC by ady
Modified: 2023-07-31 11:18 UTC (History)
4 users (show)

See Also:
Crash report or crash signature:


Attachments
sample file (72.50 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2023-03-13 11:56 UTC, Xisco Faulí
Details

Note You need to log in before you can comment on or make changes to this bug.
Description ady 2023-03-10 19:37:36 UTC
I have a case for the INDEX() function in Calc that behaves differently than in Excel.

In Excel, the INDEX() function can process arrays natively, meaning that it can be introduced with [ENTER] and it doesn’t need CSE.

Apparently, this is not completely true for the INDEX() function in Calc. The specific result (either an array, or one simple value, or an error) may also be dependent on the second and third arguments too.

For an example (file), see the following tutorial about the INDEX() function in Excel, which includes a publicly available file to download, linked within the text "sample workbook".

* The file can be downloaded from the tutorial page ("sample workbook").

* The full tutorial starts at:
https://www.ablebits.com/office-addins-blog/excel-index-match-multiple-criteria-formula-examples/

* The second section of that same page of the tutorial, which explains how Excel handles the relevant "non-CSE" formula, starts at:
https://www.ablebits.com/office-addins-blog/excel-index-match-multiple-criteria-formula-examples/#non-array-multiple-criteria

* The incompatibility can be seen in the file named "index-match-multiple-criteria.xlsx" > worksheet "Non-array formula" > cell G4.

* In Excel, that formula is introduced with a simple [ENTER] and returns a valid result. In Calc, it seems as if CSE would be required (extra curly braces shown in Calc), and it returns Err:504, which is carried from the inner INDEX() function.

* The link to the file is:
https://cdn.ablebits.com/excel-tutorials-examples/index-match-multiple-criteria.xlsx


(Just for the record, I have no connection to the site nor the products whatsoever.)


CC’ing Eike Rathke
Comment 1 Eike Rathke 2023-03-10 21:55:22 UTC
This is not a matter of load/save (adjusting summary).
Also, it's not about array formula vs non-array formula. The formulas in the example document are explicitly stored as array formulas by Excel, hence Calc correctly displays them in {} curly braces. It's also not that in Calc one would have to enter an array formula for this MATCH() case to get a result if INDEX() worked correctly.

It's simply that INDEX() does not return a column or row vector of an array that is already a single column or row vector. E.g. with , comma as array-column separator and ; semicolon as array-row separator:

=INDEX({1;2};0;1)
(all rows, column 1) should return {1;2} but returns 1

=INDEX({1,2};1;0)
(row 1, all columns) should return {1,2} but returns 1

=INDEX({1;2};0;0)
(all rows, all columns) correctly returns {1;2}

=INDEX({1,2};0;0)
(all rows, all columns) correctly returns {1,2}

=INDEX({1;2};1;0)
(row 1, all columns) correctly returns 1

=INDEX({1,2};0;1)
(all rows, column 1) correctly returns 1

=INDEX({1,2;3,4};0;0)
correctly returns {0,1;2,3}

=INDEX({1,2;3,4};0;1)
correctly returns {1;3}

=INDEX({1,2;3,4};1;0)
correctly returns {1,2}
Comment 2 Eike Rathke 2023-03-11 17:53:43 UTC
Note that in Excel the cases

=INDEX({1;2};0;0)
=INDEX({1,2};0;0)
=INDEX({1,2;3,4};0;0)

with both row_num=0 and column_num=0 that they document to not work at all and return error. There is no compelling reason though why it shouldn't and it works in Calc.

Additionally there is a working case that they do not document, if the array is a one-dimensional row vector then if column_num is not given then the row_num argument acts as a column index. E.g.

=INDEX({1,2};2)
returns 2, as does
=INDEX({1,2};0;2)

That is specified by ODFF v1.3 in 6.14.6 INDEX
https://docs.oasis-open.org/office/OpenDocument/v1.3/os/part4-formula/OpenDocument-v1.3-os-part4-formula.html#INDEX
Comment 3 ady 2023-03-11 21:30:24 UTC
Using LO 7.4.5, here is what I have tested ("|" is column separator, ";" as both row and argument separators).

In a pre-dynamic formula context, with INDEX() in array form *by itself*:


A_
{ =INDEX({11|12;21|22};0;1) } (with CSE)
=11 (1st row, 1st col)
=21 (2nd row, 1st col)

OK.


B_
=INDEX({11|12;21|22};0;1) (_no_ CSE)
=11 (1st row only)

OK.


C_
{ =INDEX({11;21};0;1) } (with CSE)
=11 (1st row)

NOT OK. We should also get:
=21 (2nd row)

D_
=INDEX({11;21};0;1) (_no_ CSE)
=11 (1st row)

OK, but expected considering result "C" above.


E_
{ =INDEX({11|12};1;0) } (with CSE)
=11 (1st col)
NOT OK. We should also get:
=12 (2nd col)

F_
=INDEX({11|12};1;0) (_no_ CSE)
=11 (1st col)

OK, but expected considering result "E" above.


Considering that those are wrong results already, I have not tested additional possibilities such as default values when arguments are omitted, or out-of-scope arguments, or wrong-type arguments.


Please note that these are examples of INDEX() by itself. When INDEX() is used as lookup_array for MATCH() (as in the example in comment 0), then CSE should not be required in order to obtain the array to be fed for MATCH().

Once the above examples get to work, I should be able to test them in combination with MATCH() as in the example in comment 0.

PS: FWIW, in Excel I believe that =INDEX({1,2};2) should had been #REF!, IIUC (whether with CSE or not).
Comment 4 Eike Rathke 2023-03-11 22:45:17 UTC
You are complicating things and just repeating what I gave..

=INDEX({11;21};0;1)
CSE is exactly the column vector case I said is not working as it should.

=INDEX({11|12};1;0)
CSE the same but for row vector.

Fwiw, you can see in the Function Wizard what actually is returned for the examples; whether the formula is entered as CSE-array or normal only affects how the final result is displayed, either as full array result (CSE) or just the single top left element's value (normal).


> in Excel I believe that =INDEX({1,2};2) should had been #REF!

But it isn't, is it?
Comment 5 Eike Rathke 2023-03-11 23:06:00 UTC
@Mike:
Does full (non-online) Excel do vector replication in INDEX()? i.e. with array-row separator ; semicolon does

=INDEX({1;2};0;2)

return error, or does it return column vector {1;2}? (transform separators to whatever is correct for your Excel..)

Similar, does

=INDEX({1;2};2;2)

return error, or 2?
Comment 6 ady 2023-03-12 01:13:13 UTC
(In reply to Eike Rathke from comment #4)
> > in Excel I believe that =INDEX({1,2};2) should had been #REF!
> 
> But it isn't, is it?

Quote:
row_num and column_num must point to a cell within array; otherwise, INDEX returns a #REF! error.

Source:
https://support.microsoft.com/en-us/office/index-function-a5dcf0dd-996d-40a4-a822-b56b061328bd

I cannot test in Excel ATM. Hopefully Mike can. IDK whether a pre-dynamic formula era version of Excel would provide a different result than the current versions in the case of out-of-scope arguments; it shouldn't. I am more concern about Calc providing the expected results in the other (normal) cases, but I do understand the desire to improve the function all around.
Comment 7 Eike Rathke 2023-03-12 02:06:33 UTC
(In reply to ady from comment #6)
> (In reply to Eike Rathke from comment #4)
> > > in Excel I believe that =INDEX({1,2};2) should had been #REF!
> > 
> > But it isn't, is it?
> 
> Quote:
> row_num and column_num must point to a cell within array; otherwise, INDEX
> returns a #REF! error.
In comment 2 I wrote that they did not document it, geez..

There was a .xls file with exactly that behaviour, and it's also implemented by Gnumeric.
https://bz.apache.org/ooo/attachment.cgi?id=34659
Comment 8 Mike Kaganski 2023-03-12 06:08:14 UTC
(In reply to Eike Rathke from comment #5)
> @Mike:
> Does full (non-online) Excel do vector replication in INDEX()? i.e. with
> array-row separator ; semicolon

Since I don't really know how to find the respective syntax in Excel, I did the following: I copy-pasted the formula below to a Calc; and saved as XLSX. Then I opened the XLSX in Excel 2016; on my system, it turned out to show as "=INDEX({1\2};0;2)"

> does
> =INDEX({1;2};0;2)
> return error, or does it return column vector {1;2}? (transform separators
> to whatever is correct for your Excel..)

So, =INDEX({1\2};0;2) gives me 2 in Excel 2016 (I made sure it's not a cached result, entering this formula anew). Since it's not an array formula, I am not sure how to disambiguate possible vector results (and values shown from them by intersection) from scalars?

> Similar, does
> 
> =INDEX({1;2};2;2)
> 
> return error, or 2?

=INDEX({1\2};2;2) gives me #REF!
Comment 9 Mike Kaganski 2023-03-12 06:16:59 UTC
(In reply to Eike Rathke from comment #5)
> with array-row separator ; semicolon

Oh, sorry. I didn't check my Calc settings in the first place; in my case, the ; is the column separator, and | is the row separator. Now let me re-check.

So formula "=INDEX({1|2};0;2)" in my Calc arrived as "INDEX({1;2};0;2)" in Excel. It gave me #REF!.
Formula "=INDEX({1;2};2;2)" in Excel also gave #REF!.
Comment 10 Eike Rathke 2023-03-12 12:37:39 UTC
Thanks Mike, so vector replication is not in effect for both, a vector's element and the entire vector.
Comment 11 Commit Notification 2023-03-12 16:04:45 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/df706f47a2b62248d222911db12c674e6507e5c6

Resolves: tdf#154125 Fix INDEX() one-dimensional vector access

It will be available in 7.6.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 12 Eike Rathke 2023-03-12 16:06:36 UTC
Pending review https://gerrit.libreoffice.org/c/core/+/148755 for 7-5
Comment 13 Commit Notification 2023-03-13 09:17:21 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "libreoffice-7-5":

https://git.libreoffice.org/core/commit/9820f754f1a8a26568d4d848072fe8bb5f6c04b1

Resolves: tdf#154125 Fix INDEX() one-dimensional vector access

It will be available in 7.5.3.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 14 ady 2023-03-13 10:37:33 UTC
Thank you Eike Rathke.

Some nitpicks, if I may.

(In reply to Eike Rathke from comment #7)
> > Quote:
> > row_num and column_num must point to a cell within array; otherwise, INDEX
> > returns a #REF! error.
> In comment 2 I wrote that they did not document it, geez..

Sincerely, IDK what you meant with that. I simply quoted part of a text that I thought was relevant, and posted the source of what I thought was supposed to be how MS uses/defines/handles the INDEX() function. That was part of their page. If there is another document that is the real source of such info (and incidentally the quoted text was not part of that), I am not aware of it.

Now, with LO 7.6alpha from today, when using out-of-scope arguments such as
=INDEX({1;2};2;2)

...I would had expected to see the same #REF! error as Excel, considering that ODF uses it for the same purpose (IIUC). Instead, I get Err:502 (Invalid argument). Having said that, older LO versions provide the same Err:502 result too.

If error codes are supposed to provide users with helpful clues as to where the problem might reside, I would be more confused with Err:502 than with #REF! for this case. Perhaps I am mixing or misunderstanding the respective meanings of Err:502 and #REF! in LO Calc.

Regarding the file linked in comment 0 that triggered this report...

I can confirm it is now generating the same numeric result as Excel in the (originally) problematic cell, with the difference of still showing the extra curly braces. I am not saying this is wrong from LO Calc's perspective; I am just describing what I see as user.

I can still introduce the same formula in a different cell and the numeric result will still be the same (correct now). I can introduce it in another cell with [ENTER] alone (in which case there are no extra curly braces) or as an array with CSE. IDK which version of Excel generated the original file; perhaps pre-dynamic formula era versions of Excel also show the curly braces in cell G4 whereas newer versions don't(?).

Finally, IIUC, there seems to be a mistake (typo) in the definition of INDEX() in ODF. Could some feedback be provided (from here, from TDF, from...) so it can be corrected, at least for future revisions of the standard?

Again, thank you Eike and Mike.
Comment 15 Xisco Faulí 2023-03-13 11:56:52 UTC
Created attachment 185926 [details]
sample file
Comment 16 Commit Notification 2023-03-13 16:06:41 UTC
Xisco Fauli committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/25ae332bbafd0cb416a6a5c0db467e501661c679

tdf#154125: sc_uicalc: Add unittest

It will be available in 7.6.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 17 Eike Rathke 2023-03-14 14:58:00 UTC
(In reply to ady from comment #14)
> Finally, IIUC, there seems to be a mistake (typo) in the definition of
> INDEX() in ODF. Could some feedback be provided (from here, from TDF,
> from...) so it can be corrected, at least for future revisions of the
> standard?
Done.
https://lists.oasis-open.org/archives/office-comment/202303/msg00003.html
Comment 18 ady 2023-07-30 16:37:55 UTC
(In reply to Eike Rathke from comment #17)
> (In reply to ady from comment #14)
> > Finally, IIUC, there seems to be a mistake (typo) in the definition of
> > INDEX() in ODF. Could some feedback be provided (from here, from TDF,
> > from...) so it can be corrected, at least for future revisions of the
> > standard?
> Done.
> https://lists.oasis-open.org/archives/office-comment/202303/msg00003.html

FTR...

That email suggested a correction (typo-like) to the ODF standard.

ATM, the suggestion to modify the ODF standard according to that email, ended with the complete _removal_ of the problematic paragraph, instead of its modification. That is according to:

https://github.com/oasis-tcs/odf-tc/tree/master/docs/odf1.4/part4-formula

as of 2023-04-27. Perhaps there is a related issue tracker number with more info (and/or reasoning), but I don't know it.

IDK whether the suggested modification (in that email) is better than removing the paragraph entirely. IDK whether removing the paragraph in the ODF standard is adequate. I just wanted to leave this comment for future reference, just in case, and to warn about this matter, in case someone has anything to say, on time, before the next ODF version gets published (whenever that might happen).
Comment 19 ady 2023-07-31 02:57:38 UTC
From:
* https://lists.oasis-open.org/archives/office-comment/202303/msg00003.html

From the original ODF 1.3:
"
If DataSource is a one-dimensional column vector, Column is optional 
or can be omitted as an empty parameter (two consecutive ;; 
semicolons). If DataSource is a one-dimensional row vector, Row is 
optional, which effectively makes Row act as the column offset into the 
vector, or can be omitted as an empty parameter (two consecutive ;; 
semicolons).
"

It was suggested (by Eike) to change it to:
"
If DataSource is a one-dimensional column vector, Column is optional 
or can be omitted as an empty parameter (two consecutive ;; 
semicolons). If DataSource is a one-dimensional row vector, Column is 
optional, which effectively makes Row act as the column offset into the 
vector, or can be omitted as an empty parameter (two consecutive ;; 
semicolons).
"

The difference is in one word, changing only one of the "Row" to one "Column".


I cannot be sure, but I think the suggestion was misinterpreted, considering the similarities of the different paragraphs / sentences.


From the Issue Tracker 4143 (and 4144):
* https://issues.oasis-open.org/browse/OFFICE-4143

* https://lists.oasis-open.org/archives/office/202303/msg00015.html

Quote:
"
*Patrick:*  Andreas - not clear what 3rd paragraph is trying to say or it is saying the same thing as the second paragraph.

*Patrick:*  Andreas - third paragraph is wrong but the solution is to get rid of it.

*Patrick:*  Regina - agrees we can delete third paragraph

*Patrick:*  Create issue for formula - 6.14.6 Index, Eike reporting, delete third paragraph entirely

*Patrick:*  Regina - Eike's suggestion isn't correct, so just delete third paragraph

*Patrick:*  Svante agrees

*Patrick:*  Patrick to: Create issue for formula - 6.14.6 Index, Eike reporting, delete third paragraph entirely

*Patrick:*  by consent
"

This ended with the complete deletion of the quoted paragraph.
Comment 20 Eike Rathke 2023-07-31 09:49:27 UTC
That third paragraph is of explanatory nature; the function's syntax is tricky, it indeed tried to (wrongly) explain the second paragraph, if they think the paragraph is not needed, shrug..
Comment 21 ady 2023-07-31 11:18:17 UTC
(In reply to Eike Rathke from comment #20)
> That third paragraph is of explanatory nature; the function's syntax is
> tricky, it indeed tried to (wrongly) explain the second paragraph

The 3rd paragraph _was_ wrong; but it is not clear to me why Regina thinks that the suggested correction is/was also wrong (so, instead, the paragraph was deleted).


@Regina,

If the suggested correction was wrong, then what would had made this "explanatory" paragraph correct?

If the suggested correction was wrong, we (would) have to wonder whether the INDEX() function is now correct (in LO). Will there be interoperability problems (at some point)?