Bug 141989 - Calc VLOOKUP from DATA VALIDATION cell returns N/A for two of the validated values
Summary: Calc VLOOKUP from DATA VALIDATION cell returns N/A for two of the validated v...
Status: RESOLVED NOTABUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
7.0.5.2 release
Hardware: All Windows (All)
: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
: 144229 149832 152708 (view as bug list)
Depends on:
Blocks:
 
Reported: 2021-04-30 10:42 UTC by Colin
Modified: 2022-12-28 20:56 UTC (History)
3 users (show)

See Also:
Crash report or crash signature:


Attachments
Error Example File (13.31 KB, application/vnd.oasis.opendocument.spreadsheet)
2021-04-30 10:43 UTC, Colin
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Colin 2021-04-30 10:42:13 UTC
Description:
Sample File [Array Error.ods] attached.
A4 is the validated data list
B5 is a manual choice of column C,D,E or F to be returned
B8 is the return from the A4 & B5 selection criteria
When the array elements [CELL A4] for August or December are selected then the return is N/A 



Steps to Reproduce:
Array C5:F16 is named Sales
Array C5:C16 is named Month
Array D5:D16 is named Prod_1
Array E5:E16 is named Prod_2
Array F5:F16 is named Prod_3
Range C3:F3 is simply the column number for the array lookup
Range C4:F4 is simply the column header and a reminder of the Array name
Select a validated month from A4
Enter a column number in B5 (It will accept 1 which returns the month name at B8
Observe the returned value from table "Sales" in B8 
Try August or December in A4
The Months were initially created by dragfill on January
Please feel free to remove all the colours and conditional format if they are too distracting. They are only intended to highlight the relevant array cells.
Note: Common multiples of 150 have produced duplicate numbers - they were also dragfilled ;))


Actual Results:
August and December return N/A

Expected Results:
August, December and approriate numbers from the "Sales" array


Reproducible: Always


User Profile Reset: No


OpenGL enabled: Yes

Additional Info:
Version: 7.0.5.2 (x64)
Build ID: 64390860c6cd0aca4beafafcfd84613dd9dfb63a
CPU threads: 4; OS: Windows 10.0 Build 19042; UI render: Skia/Raster; VCL: win
Locale: sv-SE (en_GB); UI: en-GB
Calc: threaded
Comment 1 Colin 2021-04-30 10:43:06 UTC
Created attachment 171523 [details]
Error Example File
Comment 2 Mike Kaganski 2021-04-30 11:16:32 UTC
Formula in B8 is

> =VLOOKUP(A4;Sales;B5)

which uses sorted range lookup [1]. It requires that the search column be sorted ascending, which is obviously not the case. Your data is not sorted, and thus you need 0 as the fourth argument.

This is not a bug.

[1] https://help.libreoffice.org/latest/en-US/text/scalc/01/04060109.html?DbPAR=CALC#bm_id3153152
Comment 3 Colin 2021-04-30 13:46:07 UTC
(In reply to Mike Kaganski from comment #2)
> Formula in B8 is
> 
> > =VLOOKUP(A4;Sales;B5)
> 
> which uses sorted range lookup [1]. It requires that the search column be
> sorted ascending, which is obviously not the case. Your data is not sorted,
> and thus you need 0 as the fourth argument.
> 
> This is not a bug.
> 
> [1]
> https://help.libreoffice.org/latest/en-US/text/scalc/01/04060109.
> html?DbPAR=CALC#bm_id3153152

Thanks for clarifying.
Is there a reason why it works as expected for 10 of the twelve events?
Shouldn't it just fail miserably and everything return N/A?
Comment 4 Mike Kaganski 2021-04-30 14:11:25 UTC
(In reply to Colin from comment #3)
> Is there a reason why it works as expected for 10 of the twelve events?
> Shouldn't it just fail miserably and everything return N/A?

It should not "fail miserably". It doesn't check if the data id sorted (doing which would require checking all elements, which would defeat the idea of fast search used in this mode); it just assumes the sorting, and uses binary search (checking value in the middle; then if wanted value is "less than", it searches first half, otherwise second half - splitting the range to two halves at each iteration, and only doing small number of comparisons). At each check it will naturally find that wanted value is smaller or greater than next checked value, and finally would arrive to some position - and it may accidentally happen to be correct, but that would be just a coincidence (as in first successful results). It would arrive to N/A only if at *each* attempt the wanted value happened to be smaller than checked position (so lookup arrives to position "before start").
Comment 5 Mike Kaganski 2021-04-30 14:16:10 UTC
> Is there a reason why it works as expected for 10 of the twelve events?

By the way, it didn't. It showed February's result when you chose July; August's result for October and November; so it in fact failed 5 times out of 12. And the wrong results are actually even more dangerous if you use VLOOKUP incorrectly, since you may (and did!) overlook that.
Comment 6 Colin 2021-04-30 14:19:21 UTC
(In reply to Mike Kaganski from comment #5)
> > Is there a reason why it works as expected for 10 of the twelve events?
> 
> By the way, it didn't. It showed February's result when you chose July;
> August's result for October and November; so it in fact failed 5 times out
> of 12. And the wrong results are actually even more dangerous if you use
> VLOOKUP incorrectly, since you may (and did!) overlook that.

Wow, thanks for your efforts, Mike.
It's been a good day for me - I learned two things.
Sorry I wasted your time
Comment 7 Colin 2021-04-30 14:28:33 UTC
On reflection, that seems slightly sinister.
How could it "find" and return the "wrong" location? It couldn't have matched the string at that location so I would have expected more N/As
Comment 8 Mike Kaganski 2021-04-30 14:45:29 UTC
(In reply to Colin from comment #7)
> How could it "find" and return the "wrong" location? It couldn't have
> matched the string at that location so I would have expected more N/As

Please read the description of the mode:

> SortedRangeLookup is an optional parameter that indicates whether the first
> column in the array contains range boundaries instead of plain values. In
> this mode, the lookup returns the value in the row with first column having
> value equal to or less than SearchCriterion.

Note that "to or less"! When you have a column of

1
2
3
4
5

doing sorted lookup for 4.75 would give you 4, not N/A. That is very useful, e.g., for date ranges like "who was the president of USA in 1956?", when Eisenhower has 1953-01-20 as his start date, and Kennedy has 1961-01-20.

But in case of unsorted range, VLOOKUP still assumes that if it didn't find perfect match, then the earliest value of the two last checks is the one it was looking for (it consider it "smaller"). Hence the wrong, but still found, results.
Comment 9 Colin 2021-04-30 15:02:57 UTC
(In reply to Mike Kaganski from comment #8)
> (In reply to Colin from comment #7)
> > How could it "find" and return the "wrong" location? It couldn't have
> > matched the string at that location so I would have expected more N/As
> 
> Please read the description of the mode:
> if it didn't find
> perfect match, then the earliest value of the two last checks is the one it
> was looking for (it consider it "smaller"). Hence the wrong, but still
> found, results.

That's much clearer. Interestingly, I have used the "gap" between two dates in precisely the manner you demonstrated but as the array was always intended to return the next lower date I hadn't appreciated the implication when I wasn't looking for the "gap". I must be getting old and forgetful.
Thanks again for your patience.
Comment 10 Mike Kaganski 2022-12-28 20:55:51 UTC
*** Bug 144229 has been marked as a duplicate of this bug. ***
Comment 11 Mike Kaganski 2022-12-28 20:55:56 UTC
*** Bug 149832 has been marked as a duplicate of this bug. ***
Comment 12 Mike Kaganski 2022-12-28 20:56:01 UTC
*** Bug 152708 has been marked as a duplicate of this bug. ***