Bug Hunting Session
Bug 60345 - SUMIF does not sum properly accounting for use of wildcard "*" in the search criteria.
Summary: SUMIF does not sum properly accounting for use of wildcard "*" in the search ...
Status: RESOLVED DUPLICATE of bug 72196
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
4.0.0.3 release
Hardware: Other All
: high critical
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
: 89114 (view as bug list)
Depends on:
Blocks:
 
Reported: 2013-02-06 02:16 UTC by Tim Richardson
Modified: 2015-02-04 20:47 UTC (History)
3 users (show)

See Also:
Crash report or crash signature:


Attachments
spreadsheet (15.85 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2013-02-06 02:16 UTC, Tim Richardson
Details
Modified sample file. (16.12 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2013-02-07 00:59 UTC, m.a.riosv
Details
Shows improper interpretation of wildcard in SUMIF criteria. (27.50 KB, application/vnd.oasis.opendocument.spreadsheet)
2015-01-12 16:22 UTC, klsu
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Tim Richardson 2013-02-06 02:16:29 UTC
Created attachment 74267 [details]
spreadsheet

this formula
=sumif(d1:d9999,"paypal*",e1:e9999)
should sum values in e when text in d begins paypal

but it doesn't. it fails to match.

(my expectation of behaviour is based on Excel)

Attached is a spreadsheet showing the problem. In LibreOffice 4.0 rc3 on both Windows and Mac, the total on the sheet totals is 0.
When the spreadsheet is openened into Excel, the toal is 9120.9
Comment 1 Tim Richardson 2013-02-06 02:19:30 UTC
This is unaffected by the Enable regular expressions in formulas preference
Comment 2 Tim Richardson 2013-02-06 02:21:42 UTC
I suggest see also https://bugs.freedesktop.org/show_bug.cgi?id=35636
Comment 3 Tim Richardson 2013-02-06 02:28:04 UTC
Actually, this is controlled by a preference setting.
If I untick Search Criteria = and <> must apply to whole cells
it works.

I don't know if this is bug or a surprise.
Comment 4 m.a.riosv 2013-02-07 00:59:04 UTC
Created attachment 74313 [details]
Modified sample file.

Hi Tim,
regular expressions in calc are not the same wildcard than excel. They are more complex and powerful but not so simple as in excel. Look up in the help for regular expressions.

Also remember that by default open an xls file by default disable regular expressions option.

If you want your formula working, enable regular expressions and substitute "*" with ".*"
=sumif(d1:d9999,"paypal.*",e1:e9999)

One trick, that I have used time ago is enter the wildcard in a cell and reference this cell in the expression.
B3: .* (* in excel)
=sumif(d1:d9999,"paypal"&$B$3,e1:e9999)
Attached your file with a little modification.

It is not a bug, is how it works.
Comment 5 Tim Richardson 2013-02-07 01:12:38 UTC
I'm not convinced that it is not a bug.
The problem is not affected by the Regular Expression preference. On or 
Off, it makes no difference.
My comment above about REs is a red herring. My apologies.

The problem is the setting "Search Criteria = and <> must apply to whole 
cells". It affects the way sumif works, and I don't think it should.
By default this setting is on, and in that case a SUMIF formula behaves 
very differently in Excel and LibreOffice: in LibreOffice it doesn't 
match anything.
Turn the setting off, and the wildcard matches the same as Excel.

There is no equivalent setting in Excel.

In summary, I don't think wildcard matching in SUMIF should be affected 
by the preference setting "Search Criteria = and <> must apply to whole 
cells". But it is. If LibreOffice is supposed to work like this, then 
the preference should be better named because it seems obscure to me.
Comment 6 Tim Richardson 2013-02-07 02:06:30 UTC
I reopened this report because it was closed based on a Regular Expression work around. However, this is not the bug I am reporting: my bug is based on Excel compatability, and if Excel is the reference implementation of the SUMIF formula, then possibly this issue is a standards compliance bug. (Comments of mine referring to Regular Expressions were misleading. RE solutions are irrelevant because they don't work in Excel). 

Is the "Search Criteria = and <> must apply to whole cells" supposed to affect the use of Excel-style wildcards in SUMIF. 
And if so, how can users be better informed of this? 

This issue can cause big differences when the same spreadsheet is opened in Excel and Libreoffice, which I think is undesirable behaviour.
Comment 7 GerardF 2013-02-07 09:56:57 UTC
Excel can't handle REGEX in formula, this leads to have false results when opening with Excel a sheet created with LibreOffice.
You may report a bug to Microsoft :)
Comment 8 Tim Richardson 2013-02-07 13:06:50 UTC
God,
how I wished I had never mentioned regex.
[I'm a H1-average Computer Science graduate and I know what a regex is. Not Software Engineering, but Computer Science. We did the 1930s theory of regular exprssion language.  I love that OO/LO uses regex. I also appreciate that for compatability, I can turn it off. However, this is an instance when it is in my opinion broken]

Excel does not do regex,we all know that. But for the database functions, it supports wildcard matching. 

In the OpenFormula standards, you will see that the database functions, such as SUMIF, accept a "criteria". 

https://www.oasis-open.org/committees/download.php/16826/openformula-spec-20060221.html#Criteria

Unfortunately, exactly what a "criteria" is is not well defined. 
As comments to the standard show me, there are two families of "criteria".
One is regex, and one, which 95% of the world uses, is the Excel family, referred to as the "wildcard" family. 

see https://lists.oasis-open.org/archives/office/201003/msg00434.html
which quotes someone as saying "Need to clearly note that table:use-regular-expressions and table:use-wildcards are mutually exclusive (at most one can be true)."

Now, if regex is turned off in Libre Office preferences, you may think that use-wildcards is activated. This at least is my expectation. 

So why does it not work in SUMIF? 

I've done "regression testing" and this is not a new problem in OO/LO. However, I still think it is a bug since for all intents and purposes Excel is a reference implementation of the definition of "criteria". 

I'd love for someone to answer my question without referring to regular expressions (god bless them).
Comment 9 klsu 2015-01-12 16:22:17 UTC
Created attachment 112133 [details]
Shows improper interpretation of wildcard in SUMIF criteria.

This attachment is an expansion of Marco A's example spreadsheet submitted for 35636. HOWEVER, based on this:
https://wiki.openoffice.org/wiki/Documentation/OOo3_User_Guides/Calc_Guide/Using_regular_expressions_in_functions

I cannot yet say whether this is a bug or not, because it appears that LibreOffice Calc uses wildcards in SUMIF, COUNTIF, MATCH, SEARCH, LOOKUP, HLOOKUP, VLOOKUP, DCOUNT, DCOUNTA, DSUM, DPRODUCT, DMAX, DMIN, DAVERAGE, DSTDEV, DSTDEVP, DVAR, DVARP, DGET in a way that is so cumbersome that it will take me time I don't have right now to figure it out. What I do note is that the wildcard "*" doesn't appear to do anything in SUMIF unless Tools > Options > LibreOffice Calc > Calculate > 
Search criteria = and <> must apply to whole cells
...AND...
Enable regular expressions in formulas
...are checked; and if both are checked, the wildcard is not interpreted properly (as illustrated in the attached spreadsheet. When I get time to play with this in a daily build of LO Calc, I'll post more.
Comment 10 klsu 2015-01-12 16:42:40 UTC
Incidentally, it is inappropriate for the settings:
Search criteria = and <> must apply to whole cells
...and...
Enable regular expressions in formulas
...to be in Tools > Options > LibreOffice Calc > Calculate, because that makes them pretty much global. If a person has them turned off and then turns them on because of one spreadsheet, all that person's other spreadsheets could potentially no longer work properly. If a pharmacist or an engineer is using a spreadsheet to calculate drug doses or structural member sizes, that change could kill someone. That is unacceptable in any scenario. Remember, wrong answers are worse than crashes and system slowdowns. If wildcards are not a per-formula feature, they are not be usable/useful.

If something can cause wrong or unexpected answers but is not a bug, then it must be intentional, but it is still a problem. Whether this is a design problem or a bug remains to be seen, but the problem has not been resolved and needs to be.
Comment 11 klsu 2015-01-12 16:44:21 UTC
Why doesn't Bugzilla allow someone to edit their own comments (e.g., when a typo is found)?
Comment 12 Urmas 2015-02-04 13:19:33 UTC
*** Bug 89114 has been marked as a duplicate of this bug. ***
Comment 13 m.a.riosv 2015-02-04 20:47:23 UTC
Enhancement request.
https://bugs.documentfoundation.org/show_bug.cgi?id=72196

*** This bug has been marked as a duplicate of bug 72196 ***