Bug 153297 - A complex formula does not work in Calc 7.4.x; (MATCH function)
Summary: A complex formula does not work in Calc 7.4.x; (MATCH function)
Status: CLOSED NOTABUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
7.4.5.1 release
Hardware: x86-64 (AMD64) All
: medium critical
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: bibisected, bisected
: 153298 (view as bug list)
Depends on:
Blocks: Calc-Function
  Show dependency treegraph
 
Reported: 2023-01-31 21:32 UTC by Yurij
Modified: 2023-02-06 12:44 UTC (History)
5 users (show)

See Also:
Crash report or crash signature:


Attachments
Sample file (1.05 MB, application/vnd.oasis.opendocument.spreadsheet)
2023-02-02 23:04 UTC, m_a_riosv
Details
bug tdf 153297 demo MATCH problem in LO74x (17.10 KB, application/vnd.oasis.opendocument.spreadsheet)
2023-02-05 09:00 UTC, ady
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Yurij 2023-01-31 21:32:04 UTC
Description:
In the Calc table of LibreOffice 7.3.7 and earlier, everything worked. In version 7.4.0 - 7.4.5, in the automatically filled sheet "DOWN" after any change of parameters in the sheets "LIST" ... "Ser", entries disappear.

Steps to Reproduce:
1.The error cannot be described, because it appears in a specific table.
But I don't see how to send it to you.
2.
3.

Actual Results:
When changing any parameter in the "LIST" ... "Ser" sheets, the data in the "DOWN" sheet disappears.

Expected Results:
Data should not disappear


Reproducible: Always


User Profile Reset: No

Additional Info:
The error occurs only in version 7.4.x. When returning to Calc 7.3.7 and earlier, the defect is absent, and then the damaged table can be restored by recalculating the fields of the "DOWN" sheet.

The sample table is ready for you. Its size is slightly larger than 4 Kb. Where should you send it?
Comment 1 m_a_riosv 2023-01-31 21:42:04 UTC
*** Bug 153298 has been marked as a duplicate of this bug. ***
Comment 2 m_a_riosv 2023-01-31 21:43:39 UTC
The only way, it's with a sample file to verify the issue, please attach it.
Comment 3 Stéphane Guillou (stragu) 2023-02-01 09:22:43 UTC
Yurij, there is a "Add and attachment" link in this page. Please use that to attach your example document.
Comment 4 m_a_riosv 2023-02-02 23:04:56 UTC
Created attachment 185075 [details]
Sample file

Attached a sample reduced to the behavior of MATCH() function. What's seems the issue with the reported sample in 
Where seems Type=1 or dafault behaves different from in 7.3

Regression with
Version: 7.4.0.3 (x86) / LibreOffice Community
Build ID: f85e47c08ddd19c015c0114a68350214f7066f5a
CPU threads: 4; OS: Windows 10.0 Build 19045; UI render: Skia/Vulkan; VCL: win
Locale: en-US (es_ES); UI: en-US Calc: CL
from
Version: 7.3.7.2 (x64) / LibreOffice Community
Build ID: e114eadc50a9ff8d8c8a0567d6da8f454beeb84f
CPU threads: 4; OS: Windows 10.0 Build 19045; UI render: Skia/Vulkan; VCL: win
Locale: es-ES (es_ES); UI: en-US Calc: CL

There were some fixed bugs lately in relation with MATCH, but I don't find any of them in relation.
Comment 5 m_a_riosv 2023-02-02 23:08:25 UTC
Sample file from reporter in duplicate bug https://bugs.documentfoundation.org/show_bug.cgi?id=153297

The issue in that file is in DOWN.A1 a formula with MATCH in it.
Comment 6 ady 2023-02-04 19:46:40 UTC
It has not been so easy for me to understand exactly where the problem is, so I am about to post some kind of summary, up to this point (or at least I'll try), FWIW.

First, there is a duplicate of this bug report, already marked as such, bug 153298.

There is one original attachment 185057 [details] posted in bug 153298. That file weights around 4.35MB.

Then m.a.riosv posted a different file, attachment 185075 [details], which weights around 1.05MB. 

According to OP (Yurij), using LO 7.4.x, when some data in some worksheet (named "LIST", and up to worksheet "Ser") is modified in some way, some of the formulas in another worksheet named "DOWN" result in what looks like "empty" cells (or something similar to that). This problem is not present in version 7.3.7 of LO.

What neither Yurij nor m.a.riosv were able to explain (or I could not understand them, sorry) in simple steps (up to this point) is, which exact cells you modify in order to get that unexpected result, and which modification you made. For example, something like:

_ "go to worksheet "LIST", to cell A1 and change the content to "xwy" (without quotation marks)".

Another possible example would be: "go to worksheet "LIST", to cell AA10 and delete its content.

If someone could provide a clear step-by-step procedure, indicating file name, worksheet (tab) name, cell address, and which specific change was made in that cell, I don't know how to try to replicate whatever issue results in worksheet "DOWN" (or in whatever sheet and cell the problem is ultimately seen).

Could someone please be so kind and provide such clear step-by-step procedure?
Comment 7 m_a_riosv 2023-02-04 22:10:50 UTC
(In reply to m.a.riosv from comment #5)
>...
> The issue in that file is in DOWN.A1 a formula with MATCH in it.
Comment 8 ady 2023-02-05 00:38:41 UTC
(In reply to m.a.riosv from comment #7)
> (In reply to m.a.riosv from comment #5)
> >...
> > The issue in that file is in DOWN.A1 a formula with MATCH in it.

I indeed saw that. 

Let's quote Yurij:
"When changing any parameter in the "LIST"..."


But there is no mention about what exactly was changed in order to replicate the reported problem. So, I repeat here the question:

Which exact cells were modified in order to get that unexpected result? Which modification was made in those specific cells?

For example, in attachment 185057 [details] posted in bug 153298, in the worksheet named "DOWN", cell A1 contains the formula:

$DOWN.A1: =MATCH(1;$LIST.$AA$2:$LIST.$AA$400;1)-1

or:

$DOWN.A1: =MATCH(1,$LIST.$AA$2:$LIST.$AA$400,1)-1

So, in the range $LIST.$AA$2:$LIST.$AA$400 in the "LIST" worksheet, what was that Yurij changed in order to get the problem in the "DOWN" worksheet? Which specific cell in "LIST"? What was either deleted or modified in "LIST"?

Similarly, in your attachment 185075 [details] posted in comment 4, you used MATCH with whole columns, multiple times referring to the same column. That is, while MATCH is using the same column and with "1" as third argument. This means that you might understand what you yourself posted, but other users cannot replicate the original problem because there is that small detail I asked before.

If either Yurij or m.a.riosv could reply with that little piece of info, I'll gladly try to replicate the problem (too).
Comment 9 m_a_riosv 2023-02-05 00:45:54 UTC
Don't need to change anything, the issue is with MATCH() in DOWN.A1 returning a different value in 7.4 (399) than in 7.3 (1)
Comment 10 ady 2023-02-05 01:04:54 UTC
(In reply to m.a.riosv from comment #9)
> Don't need to change anything, the issue is with MATCH() in DOWN.A1
> returning a different value in 7.4 (399) than in 7.3 (1)

Great. I just was trying to follow what Yurij posted (which I quoted above).

Anyway. There is a recent bug 152774 that received commits also recently, related to MATCH(), but I don't think it affected version 7.4.x (yet?). In spite of that, perhaps the 2 devs that committed patches there might be able to look at this too, also related to MATCH().
Comment 11 ady 2023-02-05 03:05:23 UTC
FWIW, apparently the problem with the MATCH(1;AA2:AA400;1) function in 7.4.x+ (ATM) seems to be in how it reacts when the list is not really completely sorted (e.g. ascending in this case).

1. If the list of values is {0;1;0;0...} (i.e. not ascending in its entirety), the result of the above MATCH seems to be wrong (not "2"). The search/lookup goes down to the end of the list (they are all zeroes/empty after the first "1").
2. If the list is entirely ascending {0;1;2;3...} the result is correct ("2").
3. When initially loading attachment 185057 [details], the MATCH function in $DOWN.A1 is not recalculated, so it (still) results in "1" (=2-1).
4. On (hard) recalculation, the cell $DOWN.A1 is recalculated to 398 (=399-1).
5. Manually replacing $DOWN.A1 with a "1" brings the rest of the depending cells' values to their original (non-empty) values.

For some reason, MATCH() fails when the lookup array is not actually sorted according to what the third argument of MATCH() would expect. In theory, in this case MATCH() should still find the (first) "1" located in $LIST.AA3 anyway, but it doesn't, just because the list is not entirely sorted.
Comment 12 raal 2023-02-05 08:50:39 UTC
This seems to have begun at the below commit.
Adding Cc: to Luboš Luňák ; Could you possibly take a look at this one?
Thanks
 ad819761e169e4d54c1ac4e720fb01aea39add49 is the first bad commit
commit ad819761e169e4d54c1ac4e720fb01aea39add49
Author: Jenkins Build User <tdf@pollux.tdf>
Date:   Tue May 10 16:34:06 2022 +0200

    source 6a5464b800aa0b0ce35d602fd008b555d96a94af

https://git.libreoffice.org/core/+/6a5464b800aa0b0ce35d602fd008b555d96a94af
Comment 13 ady 2023-02-05 09:00:01 UTC
Created attachment 185129 [details]
bug tdf 153297 demo MATCH problem in LO74x

Attaching simple sample file (18KB) demonstrating the problem when the third argument for MATCH() is "1", for LO version 7.4.x.

There are problems with both "sorted" possible 3rd argument (1 and -1). This file shows only the ascending case(s), for illustration purposes.


* Different results for MATCH(), depending on how long the lookup array is.
* Different results for MATCH(), depending on how many empty cells are part of the lookup array, but also depending on their position in the lookup array list (not just depending on amount).
* The values that differ in each lookup array are all located after the first lookup value.
* In each lookup array, all values until the first lookup value (1) are exactly the same ({0;1}); yet, MATCH() results are different (with LO 7.4.x).
Comment 14 Winfried Donkers 2023-02-05 15:02:01 UTC
Using MATCH with 3rd argument 1, -1 or leaving 3rd argument empty (which then defaults to 1) requires the range that is searched to be sorted. If the range is not sorted , the result is not defined and can be different for different applications or even versions. Also, when the ordering is ascending, the last match is returned and when the ordering is  descending, the first match is returned. See "https://docs.oasis-open.org/office/OpenDocument/v1.3/os/part4-formula/OpenDocument-v1.3-os-part4-formula.html#__RefHeading__1018430_715980110".

In the sample files that have been attached, none of the ranges looks sorted to me. Then all results for MATCH with 3rd argument 1, -1 or empty have no real meaning.

AFAICS this is not a bug.

FWIW, I am currently working on a change in the code that does the searching for MATCH, LOOKUP, HLOOKUP and VLOOKUP to handle searching for XLOOKUP as well. These changes could produce different results again.
Comment 15 ady 2023-02-05 22:00:11 UTC
(In reply to Winfried Donkers from comment #14)
> In the sample files that have been attached, none of the ranges looks sorted

Indeed, that was specifically mentioned.


> to me. Then all results for MATCH with 3rd argument 1, -1 or empty have no
> real meaning.

That is not completely accurate. It is true that the lookup array is expected to be sorted, and when it isn't, the results are not "predictable". That means that the result cannot be expressed beforehand in some generic simple words, but rather that it depends on a case-by-case basis. But, that does not mean that the results will be different according to whatever goes in the list after the first matched value. IOW, the result shall still be repeatable. That is why I added an attachment, to demonstrate exactly this non-repeatable behavior. Let me emphasize the point...

If I am using "1" (ascending) search, and the lookup array starts with {0;1;whatever}, then when the lookup value is "1" (without quotation marks), the result shall be "2" (without quotation marks), always. The result cannot vary depending on the _next_ values. MATCH() should find the first value that accomplishes the conditions. Attachment 185129 [details] demonstrates that this is not currently the case (the result varies, depending on values that are _after_ the first position that accomplishes the conditions), and if you check with LO 7.3.x (and older), you should obtain "2" as result.

So, yes, there is no generic way to know the result beforehand as a generic rule, but the results shall still follow the conditions/rules; they are not random, or vary according to the number of empty cells _after_ the first "OK" value.

Please review attachment 185129 [details] again, in order to spot these nonsensical varying results.
Comment 16 Eike Rathke 2023-02-06 12:44:14 UTC
Sorry, but this is plain wrong.

(In reply to ady from comment #15)
> But, that
> does not mean that the results will be different according to whatever goes
> in the list after the first matched value.
Yes it does. Because the search does a binary search picking elements to compare with the query and decide into which direction the picking should continue. If length and/or content of the search vector vary the elements picked differ and thus does the result which is _arbitrary_ depending on unsorted content. The algorithm does not iterate from the beginning to stop at a found element. The search vector MUST be strictly sorted for the binary search to yield a meaningful result.

> IOW, the result shall still be repeatable.
No, wrong expectation. 

See https://en.wikipedia.org/wiki/Binary_search_algorithm to get an idea how that works in general.