81788 – LOOKUP returns often a mess due to definition

Bug 81788 - LOOKUP returns often a mess due to definition

Summary: LOOKUP returns often a mess due to definition

Status:	NEW

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	Calc (show other bugs)
Version: (earliest affected)	Inherited From OOo
Hardware:	All All

Importance:	medium enhancement
Assignee:	Not Assigned

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	Calc-Function
	Show dependency tree / graph

Reported:	2014-07-26 17:20 UTC by vertelhetsinterklaasmaar
Modified:	2018-03-05 19:22 UTC (History)
CC List:	0 users

See Also:
Crash report or crash signature:

Attachments
examples of ascending and non-ascending datasets and there effects (23.21 KB, application/vnd.oasis.opendocument.spreadsheet) 2014-07-26 17:20 UTC, vertelhetsinterklaasmaar	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description vertelhetsinterklaasmaar 2014-07-26 17:20:03 UTC

Created attachment 103510 [details]
examples of ascending and non-ascending datasets and there effects

The definition LOOKUP (and INDEX) does almost as the words "look up" implies to do. There is a negative side. If the person is not checking the definition of the formula closely the output can be rubbish it some points. There will be no error shown that the dataset is "bad". The source of the problem is in my opinion the definition of the formula. This has in my opinion one really bad part:

[definition:]
Returns the contents of a cell either from a one-row or one-column range. Optionally, the assigned value (of the same index) is returned in a different column and row. As opposed to VLOOKUP and HLOOKUP, search and result vector may be at different positions; they do not have to be adjacent. Additionally, the search vector for the LOOKUP must be sorted ascending, otherwise the search will not return any usable results.
[/definition]

My problem is with the part: "Additionally, the search vector for the LOOKUP must be sorted ascending, otherwise the search will not return any usable results." When a dataset needs to be ascending (in my opinion that is a rare kind of dataset) and there is no check that the dataset is ascending it returns at "random" moments wrong outputs. 

That a formula or dataset is incorrect is not a problem but than it should give an error. Now the results look like it is correct but the results are wrong! 

An "ascending" dataset is often not what a user uses. After "data1" he wants to put "data2" and not an ascending data witch is normally (for users) not used to order numbers: 
data1
data10 
data100 
data1000 
data101 
data102 
data103
data104
data105
data106
data107
data108
data109
data11
data110 etc.

This looks like a detail but when you try to make a spreadsheet of a long calculation with a lot of variables this appears really often. Example of a place where it went wrong: 
I had a dataset with different kinds of steel beams to calculate the lateral-torsional buckling. All beams have standardised names like: "HEA100" until "HEA1500". When I copy the standardised dataset it gives sometimes answers, sometimes errors and sometimes rubbish. 
For this case it is almost impossible (and annoying) to have an "ascending" dataset. The manual checking that the dataset is correct will become a lot harder when the dataset becomes "ascending". All the sources always use a standardised sheet. Now every time I need to look up through the entire sheet to check the next value.

Comment 1 Algot Runeman 2014-07-27 14:43:59 UTC

This is an enhancement issue according to one of the developers and is probably going to take some time to solve. It involves the way in which the lookup table is evaluated, perhaps as a binary search to keep the lookup quick.

The first stage "fix" will probably be to add a warning if the lookup table cannot be in order, such as in your case.

Thank you for reporting it.

Comment 2 Eike Rathke 2018-03-05 18:45:21 UTC

The LOOKUP function is *defined* to work on ascending sorted data. That is how all major spreadsheet applications implement it. As an implementation may and probably will use some sort of binary search on such a range of data to speed up the lookup, the result *will* be arbitrary if the data is not sorted, depending on algorithm and actual data. That is the nature of applying binary search algorithms to non-sorted data. To determine an error in sorting the function would have to iterate all values in the range and compare with the neighbour data, which would significantly slow down things. Which still wouldn't help if the spreadsheet is loaded in another implementation where another arbitrary value would be returned.

The only viable solution to this is a new ISSORTED(range) function that the user can call as in =IF(ISSORTED(range1);HLOOKUP(x;range1;range2);NA())

I'm tempted to close this as WONTFIX though.