Bug 121090 - Change name (description) of 4th argument to VLOOKUP and HLOOKUP to "Sorted range lookup" in help and in function wizard (see comment 7, comment 9)
Summary: Change name (description) of 4th argument to VLOOKUP and HLOOKUP to "Sorted r...
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
unspecified
Hardware: All All
: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Calc-Function Help-Changes-Features
  Show dependency treegraph
 
Reported: 2018-11-01 06:46 UTC by gmolleda
Modified: 2018-11-09 19:26 UTC (History)
3 users (show)

See Also:
Crash report or crash signature:


Attachments
Use vlookup according ordered data or if you need approximate result. (12.68 KB, application/octet-stream)
2018-11-01 09:33 UTC, gmolleda
Details
Screenshot with the help inline of the 4th parameter. (146.47 KB, image/png)
2018-11-02 10:04 UTC, gmolleda
Details

Note You need to log in before you can comment on or make changes to this bug.
Description gmolleda 2018-11-01 06:46:17 UTC
In help of functions vlookup and hlookup there is an error: the fourth parameter must be "Approximate" and not "Order sort" nor "Sorted".
The error is because the Excel have the same error.

En la helpo de la funkcioj vlookup kaj hlookup estas eraro: la kvara argumento devas esti "Approximate" kaj ne "Order sort" nek "Sorted".
La eraro estas ĉar la Excel havas la saman eraron.

En la ayuda de las funciones buscarv y buscarh hay un error: el cuarto parámetro debe ser "Aproximación" o "Por aproximación" y no "Ordenado".
El error es debido a que Excel tiene el mismo error y lo han trasladado de allí.

La angla diskriminacias homojn pro sia nacieco kaj pro siaj enspezoj, bonvolu uzi Esperanton kiel internacian lingvon.
English discriminates people by their nationality and by their income, try to use Esperanto as an international language.
El inglés discrimina a las personas por su nacionalidad y por su renta, intenten usar esperanto como lengua internacional.
Comment 1 Mike Kaganski 2018-11-01 07:21:40 UTC
Thanks for filing the report!

(In reply to gmolleda from comment #0)
> In help of functions vlookup and hlookup there is an error: the fourth
> parameter must be "Approximate" and not "Order sort" nor "Sorted".

Please describe why do you think it's so? The fourth parameter actually is a boolean, which positive (true) value really tells the functions that the Array (2nd argument) is sorted. This makes the functions use an algorithm based on that assumption, thus not giving a correct answer when the Array isn't really sorted. The "approximate" seems to be orthogonal to that.

Setting to NEEDINFO. Please reset back to NEW when provided the requested clarifications. Thanks!
Comment 2 gmolleda 2018-11-01 07:34:04 UTC
Because the fourth parameter is:
0: the search is exact. The data could be ordered or not, but if the data is ordered the function work ok.
1: the search is approximate. The data must be ordered, ok.

In both the data could be ordered and both will work well. The order sort is not the reason for choice vlookup or hlookup, the reason is if you need exact or approximate search.
Comment 3 gmolleda 2018-11-01 07:36:35 UTC
The fourth parameter tells the function whether the search should be approximate or not, not whether the data is sorted or not.
Comment 4 Mike Kaganski 2018-11-01 07:50:42 UTC
(In reply to gmolleda from comment #3)
> The fourth parameter tells the function whether the search should be
> approximate or not, not whether the data is sorted or not.

This is incorrect statement. I have told you what the parameter does exactly; that is based on the OpenDocument v1.2 standard [1]; it reads:

> Semantics:
> 
> If RangeLookup is omitted or TRUE or not 0, the first column of DataSource is
> assumed to be sorted in ascending order, with smaller Numbers before larger
> ones, smaller Text values before larger ones (e.g., "A" before "B", and "B"
> before "BA"), and False before True. If the types are mixed, Numbers are sorted
> before Text, and Text before Logicals; evaluators without a separate Logical
> type may include a Logical as a Number. The lookup will try to match an entry of
> value Lookup. From a sequence of identical values <= Lookup the last entry is
> taken. If none is found the largest entry less than Lookup is taken as a match.
> If there is no data less than or equal to Lookup, the #N/A Error is returned. If
> Lookup is of type Text and the value found is of type Number, the #N/A Error is
> returned. If DataSource is not sorted, the result is undetermined and
> implementation-dependent. In most cases it will be arbitrary and just plain
> wrong due to binary search algorithms.
> 
> If RangeLookup is FALSE or 0, DataSource does not need to be sorted and an exact
> match is searched. Each value in the first column of DataSource is examined in
> order (starting at the top) until its value matches Lookup. If no value matches,
> the #N/A Error is returned.
> 
> Both methods, if there is a match, return the corresponding value in column
> Column, relative to the DataSource, where the leftmost column in DataSource is 1

So, it clearly tells that the semantics of the fourth parameter is whether to use a "range lookup" algorithm, based on "sorted" state of array (DataSource).

[1] http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part2.html#__RefHeading__1018436_715980110
Comment 5 gmolleda 2018-11-01 09:33:26 UTC
Created attachment 146215 [details]
Use vlookup according ordered data or if you need approximate result.

The fourth parameter, what is it for?
Do you inform the function about whether the list is ordered? 
or 
Ask the function to do an approximate search or not?

I have uploaded a file with example.

Could be that I do not understand the English, because I am spanish and so the Esperanto language is better for international communications.
Comment 6 Mike Kaganski 2018-11-01 11:52:35 UTC
(In reply to gmolleda from comment #5)
> The fourth parameter, what is it for?
> Do you inform the function about whether the list is ordered? 
> or 
> Ask the function to do an approximate search or not?

The reason for choosing the value for 4th parameter may vary; and that you chose one of possible reasons is just a use case. Another valid use case is speed.

Furthermore, the wording in LO is chosen not to follow the standard wording (which is more strictly correct, being "RangeLookup"), for purpose: the option *requires* that the argument 2 is sorted, (something that is not apparent from "range lookup" name) - and failing that requirement *is the most common error among users*, even with the "sorted" put into the argument name in help, and in function wizard (as we have now)! Naming it otherwise would do more harm to users (especially not power users) than "good" to purists who understand their reasons to use the option in any case.

> Could be that I do not understand the English, because I am spanish and so
> the Esperanto language is better for international communications.

I am Russian, so using Esperanto would not work for me. And please refrain from irrelevant, argumentative, and politicized topics in the bug tracker.
Comment 7 Mike Kaganski 2018-11-01 12:11:06 UTC
Furthermore, the "approximate" is totally wrong interpretation of the purpose. The "Range lookup" is the correct one: the idea is that you break the total range of possible values into some "ranges" (each of the ranges might include more than a single possible values); the ranges are some list of *sorted* values; and the VLOOKUP-like function searches through the range list to find the appropriate range.

Given your sample file, we could come with the following valid case: imagine a "dictionary" with alphabetically sorted words, and each page starts with some word. Say, page 1 starts with "abacus", page 2 with "curve", page 3 with "jedi", page 4 with "lost" etc. So we have this:

>     A     B
> 1 abacus  1
> 2 curve   2
> 3 jedi    3
> 4 lost    4
> ...

So we need to find on which page should we look for a definition of an arbitrary word we enter. We put this into D1: "=VLOOKUP(C1; A1:B100; 2; 1)". Say, we enter "lose" into C1. In no way can the proper found value 2 represents something "approximate": "jedi" is *much* farther from "lose" than "lost". But it is correct that the word "lose" falls into a range starting from "jedi" and going until the next range starts.
Comment 8 Mike Kaganski 2018-11-01 12:12:19 UTC
(In reply to Mike Kaganski from comment #7)
> In no way can the proper found value 2

typo; of course, "the proper found value 3" was meant
Comment 9 gmolleda 2018-11-01 12:38:44 UTC
Yes, the term "Range lookup" is the most correct. If the reason for using "Ordered" is for the user to order, a better solution would be "Range ordered lookup" as a fourth argument.

Sí, el término "Búsqueda por intervalo" es el más correcto. Si la razón de usar "Ordenado" es para que el usuario lo dé ordenado, una mejor solución sería "Búsqueda por intervalo ordenado" como cuarto argumento.

Jes, la termino "Ampleksa serĉo" estas la plej ĝusta. Se la kialo por uzado "Ordigita" estas por la uzanto ordigi, pli bona solvo estus "Ampleksa ordigita serĉo" kiel kvara argumento.
Comment 10 Mike Kaganski 2018-11-01 13:09:40 UTC
So let's ask for UX eval on the "Sorted range lookup" then.
Comment 11 Heiko Tietze 2018-11-01 14:50:08 UTC
If I get it right the request is to rename "Order sort" to "Range lookup" or "Sorted range lookup" or "Ordered range lookup". 

I don't think any of those wording helps the user to understand what the parameter means. The "lookup" term is clear from the function itself and range, order vs. sort... not a big difference in particular when translation is applied. But OTOH it also does not hurt to change the terms (except l10n). 

The only way to understand complex functions like this with all the parameter is to provide a good help. Added Olivier for that.
Comment 12 gmolleda 2018-11-01 21:33:12 UTC
"and range, order vs. sort... not a big difference"
If the user reads "Ordered" he may think that if the list is ordered he has to put True. But no, the list could be ordered but the type of search needed could be exact using False as the fourth parameter.

Si el usuario lee "Ordenado" puede pensar que si la lista es ordenada tiene que poner Verdadero. Pero no, la lista podría estar ordenada pero el tipo de búsqueda necesitada podría ser exacta usando Falso como cuarto parámetro.

Se la uzanto legas "Ordonita" li eble pensas, ke se la listo estas ordigita, li devas meti Vera. Sed ne, la listo povus esti ordigita sed la tipo de serĉo bezonata povus esti ĝusta uzante Falsa en la kvara parametro.
Comment 13 Olivier Hallot 2018-11-01 23:37:01 UTC
The help content for VLOOKUP is quite precise.

en-US
https://help.libreoffice.org/6.2/en-US/text/scalc/01/04060109.html?DbPAR=CALC#bm_id3153152

es
https://help.libreoffice.org/6.2/es/text/scalc/01/04060109.html?DbPAR=CALC#bm_id3153152

eo
https://help.libreoffice.org/6.2/eo/text/scalc/01/04060109.html?&DbPAR=CALC&System=UNIX#Section9

I don't see any need for changing the name of the parameters. If absolutely needed, I'll choose "SortedRangeLookup", + adjust the help pages and the UI dialog.
Comment 14 Mike Kaganski 2018-11-02 05:22:39 UTC
Well, if needed, the first sentence of the 4th param

> Sorted is an optional parameter that indicates whether the first column in
> the array is sorted in ascending order.

could be changed to something like:

> Sorted(RangeLookup) is an optional parameter that indicates whether the first column in the array represents a range list sorted in ascending order.

This actually addresses the valid concern from comment 12.
Comment 15 Jean-Francois Nifenecker 2018-11-02 06:04:18 UTC
(In reply to Mike Kaganski from comment #14)
> Well, if needed, the first sentence of the 4th param
> 
> > Sorted is an optional parameter that indicates whether the first column in
> > the array is sorted in ascending order.
> 
> could be changed to something like:
> 
> > Sorted(RangeLookup) is an optional parameter that indicates whether the first column in the array represents a range list sorted in ascending order.
> 
> This actually addresses the valid concern from comment 12.

In fact, IMO, the point is not exactly the description of this argument but the description of its consequences on the function result when VLOOKUP() can't match an exact value.

If set to 1/True

No exact match -> the result is an approximative match. The user has no clue about that and could think the result is the actual match (most do!).

If set to 0/False

No exact match -> the result is a #N/A error. Now, the user actually *knows* the looked up value was not retrieved and can take appropriate action.

Not sure if theses consequences are always made clear. During trainings I always have to insist on this point because of the consequences of not selecting the right option.

My 2 euro-cents.
Comment 16 Mike Kaganski 2018-11-02 06:14:25 UTC
(In reply to Jean-Francois Nifenecker from comment #15)

... which is why I keep using that "range" word, and want to stress that 1 means "look for range", not "look for value". And of course, that might be improved over my suggestion in comment 14.
Comment 17 Mike Kaganski 2018-11-02 06:24:03 UTC
(In reply to Jean-Francois Nifenecker from comment #15)
> If set to 1/True
> 
> No exact match -> the result is an approximative match. The user has no clue
> about that and could think the result is the actual match (most do!).

The "approximative match" is misinterpretation: please see comment 7.
Comment 18 Heiko Tietze 2018-11-02 07:35:25 UTC
(In reply to Olivier Hallot from comment #13)
> The help content for VLOOKUP is quite precise.

How about an example, something like in comment 7.
Comment 19 gmolleda 2018-11-02 10:04:39 UTC
Created attachment 146252 [details]
Screenshot with the help inline of the 4th parameter.

When you write in the cell the function "vlookup", a littel inline help appears with the follow text: "Search criterion; Array; Index; >> Sort order): If the value is TRUE or not given, the search column of the array must be sorted inascending order."

You can view that this help says nothing about if the 4th parameter is for search in intervals or exact value.