Bug 139612 - Option "Search criteria = and <> must apply to whole cells" doesn't affect to calculation as must.
Summary: Option "Search criteria = and <> must apply to whole cells" doesn't affect to...
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
4.3 all versions
Hardware: All All
: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: bibisectRequest, regression
Depends on:
Blocks: Calculate Options-Dialog-Calc
  Show dependency treegraph
 
Reported: 2021-01-14 17:23 UTC by Roman Kuznetsov
Modified: 2021-01-15 11:00 UTC (History)
2 users (show)

See Also:
Crash report or crash signature:


Attachments
File example (156.86 KB, application/vnd.oasis.opendocument.spreadsheet)
2021-01-14 17:24 UTC, Roman Kuznetsov
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Roman Kuznetsov 2021-01-14 17:23:31 UTC
Description:
Option "Search criteria = and <> must apply to whole cells" from https://help.libreoffice.org/7.1/en-US/text/shared/optionen/01060500.html doesn't affect to calculation as must.

Steps to Reproduce:
1. Open file from attach
2. Look at red B16 cell on Sheet3
3. It contains 0 instead 1

Actual Results:
Option "Search criteria = and <> must apply to whole cells" doesn't affect to calculation as must.

Expected Results:
Option "Search criteria = and <> must apply to whole cells" affects to calculation as must.


Reproducible: Always


User Profile Reset: No



Additional Info:
Version: 7.2.0.0.alpha0+ (x64)
Build ID: 94f6765d6ecc3145fa2d266231124003cf953118
CPU threads: 4; OS: Windows 6.1 Service Pack 1 Build 7601; UI render: Skia/Raster; VCL: win
Locale: ru-RU (ru_RU); UI: ru-RU
Calc: CL

repro also in 4.3 but not in 4.2 => regression
Comment 1 Roman Kuznetsov 2021-01-14 17:24:09 UTC
Created attachment 168885 [details]
File example
Comment 2 Mike Kaganski 2021-01-14 17:34:18 UTC
Repro with Version: 7.1.0.1 (x64)
Build ID: b585d7d90ab863bf29b2d110c174c0c2a98f3ee4
CPU threads: 12; OS: Windows 10.0 Build 19042; UI render: Skia/Raster; VCL: win
Locale: ru-RU (ru_RU); UI: en-US
Calc: CL

Affected are all functions that are described in ODF [1] to depend on HOST-SEARCH-CRITERIA-MUST-APPLY-TO-WHOLE-CELL host-defined property, or to take parameters of type Criterion or Criteria:

Database Functions
AVERAGEIF
AVERAGEIFS
COUNTIF
COUNTIFS
HLOOKUP
LOOKUP
MATCH
VLOOKUP
SUMIF
SUMIFS

Seems that Advanced Filter is affected, too.

[1] http://docs.oasis-open.org/office/OpenDocument/v1.3/OpenDocument-v1.3-part4-formula.html
Comment 3 Igor 2021-01-14 19:39:22 UTC
Advanced or Standard filters work correctly. The database functions and a number of others mentioned above ones ignore the disabled option "Search criteria = and <> must apply to whole cells" until the search criterion is identified using any regex identifier, for example, a pair of parentheses.
In other words, the criterion should not be simple, otherwise, even if the option "Search criteria = and <> must apply to whole cells" is disabled, it is perceived as requiring an exact match.
Comment 4 Igor 2021-01-14 19:48:49 UTC
Will find "red" in the line "Fred" if you specify such a criterion:: "r.d", "red.*", "(red)" or even "()red". But it does not work: "red", because it is perceived as an exact match, contrary to the meaning of the option.
Comment 5 Mike Kaganski 2021-01-15 06:55:45 UTC
(In reply to Igor from comment #3)
> Advanced or Standard filters work correctly.

Please clarify when you assert something; at least provide the version you test, and also listing the steps you tried would be good.

Specifically, with the version mentioned in comment 1, doing this:

1. Select A1:E10
2. Data->More Filters->Advanced Filter
3. Read Filter Criteria From -> Лист3!$A$13:$E$14 -> OK

results in rows 2 to 10 all hidden, i.e. it filters out all rows. However, this works (keeps row 3 shown) in Version: 5.0.0.5 (x64)
Build ID: 1b1a90865e348b492231e1c451437d7a15bb262b
Locale: ru-RU (ru_RU), which means there's a way to bibisect this specific part of the problem (and possibly find the relevant code).
Comment 6 Mike Kaganski 2021-01-15 07:10:14 UTC
Steps from comment 5 work OK (i.e., keep row 3 shown) in Version: 6.0.0.3 (x64)
Build ID: 64a0f66915f38c6217de274f0aa8e15618924765
CPU threads: 12; OS: Windows 10.0; UI render: GL; 
Locale: ru-RU (ru_RU); Calc: CL

but doesn't work in Version: 6.1.0.3 (x64)
Build ID: efb621ed25068d70781dc026f7e9c5187a4decd1
CPU threads: 12; OS: Windows 10.0; UI render: GL; 
Locale: ru-RU (ru_RU); Calc: CL

Bibisection in the range would be useful. It doesn't mean that the bug with spreadsheet functions starts in v.6.0; strictly speaking, comment 5 describes a separate (related, but different) bug. Bibisection of the filter breakage in the 6.1 range would still benefit this.
Comment 7 Mike Kaganski 2021-01-15 07:30:22 UTC
(In reply to Mike Kaganski from comment #6)

Bibisected to https://git.libreoffice.org/core/+/a953fa1c0f6a40a08859570516c511f3a8410a35
> author Michael Meeks <michael.meeks@collabora.com> Sun Apr 08 00:24:15 2018 +0100
> committer Michael Meeks <michael.meeks@collabora.com> Tue Apr 10 22:28:37 2018 +0200
> 
> vlookup - optimize SC_EQUAL and NOT_EQUAL.
> 
> Also don't accept partial matches ie. CONTAINS != EQUAL,
> for VLOOKUP even if document option "search criteria =, <>
> for whole cells" is turned off.
> 
> This also adds a new spreadsheet test file vlookup2.fods
> with the option "search criteria =,<> for whole cells" turned off,
> with VLOOKUP test cases that ensures that partial matches are not
> accepted.
Comment 8 Mike Kaganski 2021-01-15 07:42:53 UTC
Michael,
the commit identified in comment 7 looks strange, directly contradicting ODF standard [1]. Could you please describe what was the rationale? (The commit message does not explain whys, only what was done.) Also the commit doesn't tell it intends to affect Advanced Filter - is it an unwanted regression?

OTOH, the standard only says "The values returned *may* vary depending upon the ... HOST-SEARCH-CRITERIA-MUST-APPLY-TO-WHOLE-CELL" (emphasis on "may"). Also VLOOKUP is not documented to take "Criterion", only "Any" (which means that Criterion and Criteria are among possible values?). And I don't see a description of the filter in the standard (it might be there though, because filtering allows to store the criteria in the file; I just didn't spend time on investigating it). Eike: do you have an opinion on this?

[1] http://docs.oasis-open.org/office/OpenDocument/v1.3/OpenDocument-v1.3-part4-formula.html#__RefHeading__1018436_715980110
Comment 9 Michael Meeks 2021-01-15 11:00:05 UTC
Very little time to look at this. Looks like I created a unit test sheet for this behavior, as such I imagine that when I created it we spent a chunk of time analyzing what the right behavior is - particularly as regards interoperability.

This code is at/near the center of many inner-loops and is intensely performance critical:

commit afa91e423c2073cec281477f2154291c6d4f739d
Author: Dennis Francis <dennis.francis@collabora.co.uk>
Date:   Sat Apr 7 17:13:42 2018 +0530

    avoid SharedString copy assignment

commit 76bc1a81d089d9f66639c118c4063d87e4422684
Author: Michael Meeks <michael.meeks@collabora.com>
Date:   Thu Apr 5 16:15:32 2018 +0100

    tdf#115490 - avoid transliteration by using SharedString.
    
    Do this only for case insensitive matching for now; optimizing vlookup
    nicely - for me saves 35% of the compute time for 10k rows.

commit 009a326d78fb62a80f9631844af324d0294710b6
Author: Michael Meeks <michael.meeks@collabora.com>
Date:   Thu Apr 5 14:01:05 2018 +0100

    query entry - preparatory pure re-factor.
    
    No need to check lengths for ENDS_WITH - we get the right offsets
    to start with before indexOf, and separate nStrPos for later.


All of these are related. Carefully using the SharedString to do the comparisons was a huge win where we can do that for case-insensitive comparison.

What is the correct behavior: I'd look at interoperability first, common sense next, and then the ODF spec. Someone needs to do that analysis though I'm afraid I don't have time.