Bug 160893 - XLOOKUP problem with date as text
Summary: XLOOKUP problem with date as text
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
24.8.0.0 alpha0+
Hardware: All All
: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Calc-Function
  Show dependency treegraph
 
Reported: 2024-05-01 17:19 UTC by Regina Henschel
Modified: 2024-07-13 05:55 UTC (History)
5 users (show)

See Also:
Crash report or crash signature:


Attachments
Lookup_array with dates (24.08 KB, application/vnd.oasis.opendocument.calc)
2024-05-01 17:19 UTC, Regina Henschel
Details
Sample file with XLOOKUP and MATCH (25.23 KB, application/vnd.oasis.opendocument.spreadsheet)
2024-05-01 23:32 UTC, m_a_riosv
Details
The original in ods and its xlsx version in Excel 2021 (104.65 KB, image/png)
2024-05-21 08:27 UTC, Gabor Kelemen (allotropia)
Details
The second example in ods and its xlsx version in Excel 2021 (97.79 KB, image/png)
2024-05-21 08:34 UTC, Gabor Kelemen (allotropia)
Details
Examine text as number in MATCH, VLOOKUP, LOOKUP, XMATCH and XLOOKUP (19.93 KB, application/vnd.oasis.opendocument.spreadsheet)
2024-06-13 18:40 UTC, Regina Henschel
Details
XLOOKUP created in Excel 365 (34.88 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2024-06-18 19:36 UTC, Regina Henschel
Details
Calc XLOOKUP test xlsx (20.75 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2024-07-13 05:55 UTC, ady
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Regina Henschel 2024-05-01 17:19:30 UTC
Created attachment 193922 [details]
Lookup_array with dates

Open attached file. Save it as .xlsx and open the saved file in Excel. Compare.

Problem:
(1) If the Lookup_value is a text that looks like a date, it is interpreted as number.
(2) If the Lookup_array contains dates, neither a lookup_value in serial date form nor as result of the DATE function matches the dates.
Comment 1 m_a_riosv 2024-05-01 23:32:55 UTC
Created attachment 193924 [details]
Sample file with XLOOKUP and MATCH

It seems to have the same differences as MATCH().

Version: 24.8.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: f4ef5435df5560e6b6b061ce4053c71e2819bf51
CPU threads: 16; OS: Windows 10.0 Build 22631; UI render: Skia/Vulkan; VCL: win
Locale: es-ES (es_ES); UI: en-US
Calc: CL threaded

Excel		
$71,29	$86,92	=XLOOKUP("2024-01-15";A2:A4;B2:B4;"no invoice";0;2)
33,12	no invoice	=XLOOKUP(DATE(2024;1;15);A2:A4;B2:B4;"no invoice";0;2)
$71,29	$86,92	=XLOOKUP(T("2024-01-15");A2:A4;B2:B4;"no invoice";0;2)
33,12	no invoice	=XLOOKUP(45306;A2:A4;B2:B4;"no invoice";0;2)
		
$3,00	$2,00	=MATCH("2024-01-15";A2:A4;0)
1	#N/A	=MATCH(DATE(2024;1;15);A2:A4;0)
$3,00	$2,00	=MATCH(T("2024-01-15");A2:A4;0)
1	#N/A	=MATCH(45306;A2:A4;0)
Comment 2 ady 2024-05-02 08:40:54 UTC
(In reply to Regina Henschel from comment #0)

> Problem:
> (1) If the Lookup_value is a text that looks like a date, it is interpreted
> as number.
> (2) If the Lookup_array contains dates, neither a lookup_value in serial
> date form nor as result of the DATE function matches the dates.

While I don't have access to Excel 2021 to test its results ATM, I think that the description in comment 0 given to the results from Calc is slightly inaccurate.

I think the problem (assuming that Excel 2021 provides the results as in attachment 193922 [details] cell range B7:B10, which I cannot review by myself ATM) is that Calc is comparing/matching according to the ** display format **, rather than according to cell value.

STR:
1. Open attachment 193922 [details].
2. Note the resulting value of cell C7: 33.12.
3. On cell A2, change (display) format from original YYYY-MM-DD to YY-MM-DD.
4. Recalculate Hard.
5. Note that the result on cell C7 changed from 33.12 to 86.92.

Explanation (my guess):

In cell C7:
=XLOOKUP("2024-01-15";A2:A4;B2:B4;"no invoice";0;2)

is looking for "2024-01-15" and originally found the first match in cell A2. Once cell A2 changed its (display) format, XLOOKUP no longer finds "2024-01-15" in cell A2, despite cell A2 containing the same internal value (serial date/time "45306"). The new first match is located in cell A3, still displaying the same value in the same display format that matches "2024-01-15".

Conclusion (my guess):
Calc's XLOOKUP is matching the values according to display format, not by internal value.

Assuming the values in cell range B7:B10 are to be taken as correct – I just don't have any way to confirm it ATM by myself – then Excel 2021 would seem to perform the comparison/match according to internal value, not by "display format". Please don't just trust this and test by yourself, because I might very well be wrong and I cannot test this by myself ATM.


(In reply to m_a_riosv from comment #1)
> Created attachment 193924 [details]
> Sample file with XLOOKUP and MATCH
> $3,00	$2,00	=MATCH("2024-01-15";A2:A4;0)
> 1	#N/A	=MATCH(DATE(2024;1;15);A2:A4;0)
> $3,00	$2,00	=MATCH(T("2024-01-15");A2:A4;0)
> 1	#N/A	=MATCH(45306;A2:A4;0)

I get a different result than that. For instance:
=MATCH("2024-01-15";A2:A4;0)
results in "1" (not "2") in Calc (after Recalculate Hard).

Version: 24.8.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: bc7501d30ab9a3a5edec66d2f0022177f85d6127
CPU threads: 4; OS: Windows 10 (10.0 build 19045); UI render: Skia/Raster; VCL: win
Locale: en-US (es_AR); UI: en-US
Calc: CL threaded

At any rate, we should be careful when comparing XLOOKUP/XMATCH with the older VLOOKUP/MATCH functions, considering the explicit argument for "binary search" available in the newer "X" functions. We also don't always know whether the "sorting" used in Calc's functions (for the binary search) is _exactly_ the same sorting order used in Excel.
Comment 3 ady 2024-05-19 13:59:54 UTC
@Balázs

Polite ping.

XLOOKUP: Internal cell value vs display value (comment 2).

I cannot test with Excel by myself ATM.

@All,

This report is still set as UNCONFIRMED. Is this a bug, or not? We should rather know it (and be solved if it is) before LO 24.8 beta arrives.

I guess the same question – internal cell value vs display value – would be relevant for XMATCH()?
Comment 4 Gabor Kelemen (allotropia) 2024-05-21 08:27:20 UTC
Created attachment 194238 [details]
The original in ods and its xlsx version in Excel 2021

Indeed there is a difference in results, with current nightly:

Version: 24.8.0.0.alpha1+ (X86_64) / LibreOffice Community
Build ID: ae798781ef4df7a1fdef13af0bc459bf4f6e7b4c
CPU threads: 6; OS: Windows 10 X86_64 (10.0 build 19045); UI render: Skia/Raster; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: threaded
Comment 5 Gabor Kelemen (allotropia) 2024-05-21 08:34:12 UTC
Created attachment 194239 [details]
The second example in ods and its xlsx version in Excel 2021

Looks like MATCH behaves the same way - maybe this is a separate bug?
Comment 6 ady 2024-05-21 09:05:51 UTC
(In reply to Gabor Kelemen (allotropia) from comment #5)

> Looks like MATCH behaves the same way - maybe this is a separate bug?

IMO, the MATCH() case should rather be a separate report (to be reviewed for regressions and other possible issues too, for instance).

Is it OK to modify the current Summary field, from the current:
"XLOOKUP problem with date as text"
to something more generic, such as:
"XLOOKUP searches/matches cell's display format/value instead of cell's internal value"
or something similar to that?

The problem is not really (just) a date as text.
Comment 7 Balázs Varga (allotropia) 2024-05-21 09:26:34 UTC
(In reply to ady from comment #6)

I think no need to create new report for MATCH, since it is LOOKUP specific issue which is affect on other LOOKUP functions like VLOOKUP/HLOOKUP/MATCH etc?
In that case the Summary field, can be just:
"LOOKUP functions (XLOOKUP/VLOOKUP/HLOOKUP/MATCH/XMATCH) searches/matches cell's display format/value instead of cell's internal value", (If its related to all the LOOKUP functions).
Comment 8 ady 2024-05-21 10:38:36 UTC
@Balázs

Considering that the problem affects XLOOKUP() (at least), are you planning on assigning this to yourself?

If I may suggest, let's see what happens when solving this for XLOOKUP(), and then we can re-check the others (whether the patch for XLOOKUO() solves any other functions).
Comment 9 m_a_riosv 2024-05-21 16:16:31 UTC
Maybe some relation with
tdf#144740 Criteria query (COUNTIF(), MATCH() et al) for numeric values broken considering number formats
tdf#144253 Filtering does not work with formatted calculated values
Comment 10 Thorsten Behrens (allotropia) 2024-05-27 08:59:35 UTC
(In reply to ady from comment #8)
> Considering that the problem affects XLOOKUP() (at least), are you planning
> on assigning this to yourself?
> 
This is a general issue, and only coincidentally related to the new XLOOKUP implementation (it might be more obvious there, though). It's unlikely we get to this in the near future.
Comment 11 Regina Henschel 2024-06-13 18:40:58 UTC
Created attachment 194707 [details]
Examine text as number in MATCH, VLOOKUP, LOOKUP, XMATCH and XLOOKUP

Text, which could be interpreted as number is affected too.

The error occurs in
MATCH with Match_mode 0, i.e. exact search, sorting does not matter here
XMATCH with Match_mode 0 and Search_mode 1, i.e exact search on maybe unsorted data.
VLOOKUP on unsorted data
XLOOKUP with Match_mode 0, sorting does not matter here

The error does not occur in
MATCH with Match_mode 1, i.e. approximate search on sorted data.
XMATCH with Match:mode 1, i.e. approximate search, sorting does not matter here.
VLOOKUP on sorted data
XLOOKUP with Match_mode 1, sorting does not matter here.
LOOKUP

The error is only if the Lookup_value is a text and the values in the Lookup_array are numbers. The other way round, Lookup_value is number and the values in the Lookup_array are text, is always correct.
Comment 12 Eike Rathke 2024-06-18 10:51:57 UTC
MATCH and [HVX]LOOKUP never consider cell format and should not. What happens however is that if the search criterion can be interpreted as a number (which a date string is also) a numeric lookup is performed.

IIRC that was implemented because Excel did it that way. That seems to have changed? Or Excel first tries to find the string match and only if there is none it retries with a conversion to number if possible? 

Anyhow, our XLOOKUP implementation is at least wrong for any case returning "no invoice" in attachment 193924 [details], i.e. C8 and C10. Note that you may have to Ctrl+F9 hard recalculate to get the current implementation's results.
Comment 13 ady 2024-06-18 16:19:57 UTC
(In reply to Eike Rathke from comment #12)
> MATCH and [HVX]LOOKUP never consider cell format and should not. What
> happens however is that if the search criterion can be interpreted as a
> number (which a date string is also) a numeric lookup is performed.

Comment 2 shows not a change in "search criterion", but a change in display format of (part of) the data. I admit I am now confused by what the expected behavior should be.

Also, if a string that resembles a date (between quotation marks) is considered as a number in "search criterion" (as Eike mentions in comment 12), is there any way to search for such "date" as a string (and not as a number)?


> 
> IIRC that was implemented because Excel did it that way. That seems to have
> changed? Or Excel first tries to find the string match and only if there is
> none it retries with a conversion to number if possible? 


I don't have access to Excel 2021 in order to create a sample case anew (from scratch) with Excel itself only (and without relying on Calc at all). @Regina? @Miguel?

> 
> Anyhow, our XLOOKUP implementation is at least wrong for any case returning
> "no invoice" in attachment 193924 [details], i.e. C8 and C10. Note that you
> may have to Ctrl+F9 hard recalculate to get the current implementation's
> results.

So, is this related to all lookup functions (including MATCH)? Or only to XLOOKUP/XMATCH)?

Do we have more than one different behavior between Calc and Excel?

@thb (comment 10), @Balázs (comment 7), @Eike (comment 12) ?
Comment 14 Regina Henschel 2024-06-18 19:36:32 UTC
Created attachment 194802 [details]
XLOOKUP created in Excel 365

(In reply to ady from comment #13)
> I don't have access to Excel 2021 in order to create a sample case anew
> (from scratch) with Excel itself only (and without relying on Calc at all).
> @Regina? @Miguel?

The attached file is created in Excel 365, (Version 2405 Build 16.0.17628.20006)
Comment 15 ady 2024-06-18 21:02:00 UTC
(In reply to Regina Henschel from comment #14)
> Created attachment 194802 [details]
> XLOOKUP created in Excel 365
> 
> (In reply to ady from comment #13)
> > I don't have access to Excel 2021 in order to create a sample case anew
> > (from scratch) with Excel itself only (and without relying on Calc at all).
> > @Regina? @Miguel?
> 
> The attached file is created in Excel 365, (Version 2405 Build
> 16.0.17628.20006)

Thank you. I modified the file in Calc in order to focus on the problems described in this bug report. Unfortunately, between the limitations when using binary search – there should be some kind of info/error message, somewhere – the potential issues with hard recalculate, and the differences between Excel and Calc (which I cannot test by myself), I cannot provide a trustworthy report. I am leaving this for others to figure out.
Comment 16 m_a_riosv 2024-06-19 00:06:57 UTC
(In reply to Regina Henschel from comment #14)
> Created attachment 194802 [details]
> XLOOKUP created in Excel 365
> 
> (In reply to ady from comment #13)
> > I don't have access to Excel 2021 in order to create a sample case anew
> > (from scratch) with Excel itself only (and without relying on Calc at all).
> > @Regina? @Miguel?
> 
> The attached file is created in Excel 365, (Version 2405 Build
> 16.0.17628.20006)

Works different with Calc than with Excel.
And in Calc when the searching value is a text date, the format matters.
Comment 17 ady 2024-07-13 05:55:45 UTC
Created attachment 195275 [details]
Calc XLOOKUP test xlsx

I am attaching a new XLSX file created with Calc, to test this report.

Version: 24.8.0.1.0+ (X86_64) / LibreOffice Community
Build ID: 6312019426f45e0c4b49d7f29506148f93a5c2e1
CPU threads: 4; OS: Windows 10 X86_64 (10.0 build 19045); UI render: Skia/Raster; VCL: win
Locale: en-US (es_AR); UI: en-US
Calc: threaded

It contains 3 worksheets: one with notes, one with ascending-sorted data and one with descending-sorted data.

It does not include wildcards nor regex cases.

Beware of changing the data: part of the data table and some of the arguments for the formulas are direct strings, not references to other parts of the data table, in order to test this tdf#160893.

The results of the formulas of this testing file should be compared when opening the file with Calc vs opening the file with Excel (2021+) and executing recalculation.

Another possible test would be to recreate anew the data, cell formats, and formulas in Excel itself (completely independent of Calc) and comparing the results of that file in Excel against importing that recreated file in Calc.

This testing file does not cover "every-and-all" possible cases, but I believe it should be a starting point towards resolving this tdf#160893.