Bug 151958 - The COUNTIF and COUNTIFS functions mis-count on mixed numeric and text data and query by string
Summary: The COUNTIF and COUNTIFS functions mis-count on mixed numeric and text data a...
Status: ASSIGNED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
7.4.0.3 release
Hardware: All All
: medium normal
Assignee: Eike Rathke
URL:
Whiteboard: target:7.6.0 target:7.4.7.2 target:7....
Keywords: bibisected, bisected, regression
Depends on:
Blocks: Cell-Formula
  Show dependency treegraph
 
Reported: 2022-11-08 14:16 UTC by Russell Wilkinson
Modified: 2024-02-23 17:43 UTC (History)
6 users (show)

See Also:
Crash report or crash signature:


Attachments
Screenshot (99.60 KB, image/png)
2022-11-08 14:16 UTC, Russell Wilkinson
Details
Screenshot (#2) (156.30 KB, image/png)
2022-11-08 14:18 UTC, Russell Wilkinson
Details
Spreadsheet containing countif error (32.40 KB, application/vnd.oasis.opendocument.spreadsheet)
2022-11-11 11:20 UTC, Russell Wilkinson
Details
Modified sample file (35.55 KB, application/vnd.oasis.opendocument.spreadsheet)
2022-11-11 13:38 UTC, m_a_riosv
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Russell Wilkinson 2022-11-08 14:16:30 UTC
Created attachment 183471 [details]
Screenshot

The countif function appears to count the first occurrence only of a string and hence seems to ignore the remaining cells that contain the same string. This seems to happen at random. The attached files indicate this bug whereby adding an extra space between two words produces the correct result.
Comment 1 Russell Wilkinson 2022-11-08 14:18:29 UTC
Created attachment 183472 [details]
Screenshot (#2)
Comment 2 m_a_riosv 2022-11-08 23:23:49 UTC
Sorry, but what do you expect that someone can view what is happening?, the images even don't show the formula.

If you attach a sample file, is the only way to see if there is a bug or possible, something about the formula, or regular expression/wildcards, etc.
Comment 3 Russell Wilkinson 2022-11-11 11:20:54 UTC
Created attachment 183538 [details]
Spreadsheet containing countif error
Comment 4 Russell Wilkinson 2022-11-11 11:22:50 UTC
The sample spreadsheet containing the countif is attached (previously sent on 9 Nov).
Comment 5 m_a_riosv 2022-11-11 13:38:15 UTC
Created attachment 183539 [details]
Modified sample file

COUNTIF fails on G13 and G14, what it's strange if you change in the formula F13 by $F13, then the result it's fine.
COUNTIFS and with COUNTIF as array, the result it's fine.

Selecting column A, applying [Ctrl+M] to clear the format, and a Menu/Data/Calculate/ Recalculate hard, results are good.

So seems the format it's affecting the results for COUNTIF

Reproducible.
Version: 7.5.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 3c0be5564afe1b9cc843a49aba88b72af74c43ab
CPU threads: 4; OS: Windows 10.0 Build 19045; UI render: Skia/Raster; VCL: win
Locale: es-ES (es_ES); UI: en-US Calc: CL threaded

Last working fine for me.
Version: 7.3.7.2 (x64) / LibreOffice Community
Build ID: e114eadc50a9ff8d8c8a0567d6da8f454beeb84f
CPU threads: 4; OS: Windows 10.0 Build 19045; UI render: Skia/Vulkan; VCL: win
Locale: es-ES (es_ES); UI: es-ES Calc: CL
Comment 6 m_a_riosv 2022-11-11 14:23:49 UTC
Cell's formats affecting calculations, are those in column A with :
NN D "de" MMMM "de" AAAA
format.
Comment 7 raal 2022-11-11 19:48:18 UTC
This seems to have begun at the below commit.
Adding Cc: to Luboš Luňák; Could you possibly take a look at this one?
Thanks

First test file - good G4=17, then after this commit G4=171, in master G4=104
 66e92f371fb936119392a4841c8800ed228a8397 is the first bad commit
commit 66e92f371fb936119392a4841c8800ed228a8397
Author: Jenkins Build User <tdf@pollux.tdf>
Date:   Wed May 11 13:33:42 2022 +0200

    source 5b189abc13d4a9e408c82298e4ede0fdf505002d

https://git.libreoffice.org/core/+/5b189abc13d4a9e408c82298e4ede0fdf505002d
Comment 8 ady 2023-04-26 00:30:42 UTC
Recalculate Hard is failing, not the countif() function itself.

This works OK in 7.6.alpha, and also in older 7.4.3.2 (this is not a typo, despite the current Version field set to a prior version), but fails in current 7.4.6.2.

I have not tested 7.5.

This "must" be resolved for the next 7.4.7 before its release (I can only hope).

This should be set to "Major".
Comment 9 ady 2023-04-26 01:15:32 UTC
(In reply to ady from comment #8)
> Recalculate Hard is failing, not the countif() function itself.

OTOH, bug 146790 is also about COUNTIF() and it fails in a similar way as this bug 151958, in the same versions for me.

CC'ing Eike.
Comment 10 Eike Rathke 2023-04-26 14:03:49 UTC
The by string query of binary search on sorted cache for COUNTIF() is broken for mixed types like here the numeric date cells and text cells. It even fails an assert in a debug build.

For 7.4.7 and 7.5.3 (if I'm fast enough to get that in) it might be best to just use the direct implementation instead (if that still works after all these changes), and see if we can fix the sorted cache query for later releases.
Comment 11 Eike Rathke 2023-04-26 18:58:51 UTC
(In reply to ady from comment #8)
> Recalculate Hard is failing, not the countif() function itself.
Not true. Both COUNTIF() and COUNTIFS() produce wrong results. Hard recalc only triggers their recalculation.

> This works OK in 7.6.alpha,
No, it doesn't.

> and also in older 7.4.3.2
It fails even in 7.4.0.3
Comment 12 Commit Notification 2023-04-26 23:30:45 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/9a555d79b3b00793edf1d51a8a7c76b723cc436d

Resolves: tdf#151958 Disable binary search on sorted cache for current releases

It will be available in 7.6.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 14 ady 2023-04-27 04:24:47 UTC
(In reply to Eike Rathke from comment #11)
> (In reply to ady from comment #8)
> > Recalculate Hard is failing, not the countif() function itself.
> Not true. Both COUNTIF() and COUNTIFS() produce wrong results. Hard recalc
> only triggers their recalculation.
> 
> > This works OK in 7.6.alpha,
> No, it doesn't.
> 
> > and also in older 7.4.3.2
> It fails even in 7.4.0.3

Before I posted my comment 8, I repeated the steps because they didn't make sense.

For example, using attachment 183539 [details] from comment 5, cell G14 looked as 41, and recalculate hard wasn't updating the value, whereas reload did.

After reading your comment 11, I went and reset my LO profile(s) for the versions of LO I tested with, and repeated the steps, now with a different result using that same attachment 183539 [details].

OTOH, I am still having problems with Recalculate Hard not working.

In attachment 183538 [details] from comment 3, cell G7 shows 82 and recalculate hard is not updating the value. But, if I add the same exact formula from G7 in another cell, I do obtain the correct result in that cell, which I should be seeing in G7. Even then, recalculate hard does not update the original values, while the same newly-created functions are OK. Currently, Reload is not solving that either.

That's why I said that Recalculate Hard is not working, while the countif() function seems to be.

So now I don't know what is wrong in my system, but something is.

I have not yet tested a new LO build with the patch from comment 12.
Comment 15 ady 2023-04-27 07:16:36 UTC
(In reply to Commit Notification from comment #12)
> Eike Rathke committed a patch related to this issue.
> It has been pushed to "master":
> 9a555d79b3b00793edf1d51a8a7c76b723cc436d

Tested using attachments from comment 3 and comment 5 with:

Version: 7.6.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 9a555d79b3b00793edf1d51a8a7c76b723cc436d
CPU threads: 4; OS: Windows 10.0 Build 19044; UI render: Skia/Raster; VCL: win
Locale: en-US (es_AR); UI: en-US
Calc: CL threaded
Built: 2023-04-27

After Recalculate Hard, the resulting values are correct.
Comment 16 Commit Notification 2023-04-27 07:24:30 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "libreoffice-7-5":

https://git.libreoffice.org/core/commit/054455d259037c97b2b67e738b6655e33ddfe126

Resolves: tdf#151958 Disable binary search on sorted cache for current releases

It will be available in 7.5.4.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 17 Commit Notification 2023-04-27 07:24:33 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "libreoffice-7-4":

https://git.libreoffice.org/core/commit/01ce3266484451e0984ed1ba07944745f44fc9e0

Resolves: tdf#151958 Disable binary search on sorted cache for current releases

It will be available in 7.4.8.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 18 Eike Rathke 2023-04-27 10:45:20 UTC
(In reply to ady from comment #14)
> For example, using attachment 183539 [details] from comment 5, cell G14
> looked as 41, and recalculate hard wasn't updating the value, whereas reload
> did.
Didn't for me. It's 41 and reload does not change the value (I have not activated forced recalc upon loading documents), but recalc did.

> In attachment 183538 [details] from comment 3, cell G7 shows 82 and
> recalculate hard is not updating the value. But, if I add the same exact
> formula from G7 in another cell, I do obtain the correct result in that
> cell, which I should be seeing in G7. Even then, recalculate hard does not
> update the original values, while the same newly-created functions are OK.
> Currently, Reload is not solving that either.
Same for me. The position-dependent varying result is a consequence of the sorted cache search being broken.

All with 7.5.4.0.0+ without the new change.
Comment 19 ady 2023-04-27 11:46:06 UTC
(In reply to Eike Rathke from comment #18)
> (In reply to ady from comment #14)
> > For example, using attachment 183539 [details] from comment 5, cell G14
> > looked as 41, and recalculate hard wasn't updating the value, whereas reload
> > did.
> Didn't for me. It's 41 and reload does not change the value (I have not
> activated forced recalc upon loading documents), but recalc did.

As I wrote in comment 14, "something" was wrong in my system because the behavior I experienced was different than others'. After resetting profiles, at least some of the difference is gone. I'll keep checking in the coming days. Hopefully it won't happen again.

The important thing now is what I reported in comment 15 with the new 7.6.alpha: seems at least working after recalculate hard.

Evidently all this incorrect behavior came from (quoting release notes for 7.4):

* Improved performance of functions such as COUNTIF, SUMIFS and VLOOKUP, especially with unsorted data tdf#139444 tdf#144777 tdf#146546 

So perhaps (those) other functions also need some (urgent) review(?).
Comment 20 Commit Notification 2023-04-27 11:54:03 UTC
Xisco Fauli committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/e5498d5323b2e9f1c4760d780a1a5542fbe9a4b9

tdf#151958: sc_subsequent_filters_test4: Add unittest

It will be available in 7.6.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 21 Commit Notification 2023-04-27 14:59:36 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "libreoffice-7-4-7":

https://git.libreoffice.org/core/commit/d65a5e8d758e6a955076e6d34a57b4b8cd33a0f6

Resolves: tdf#151958 Disable binary search on sorted cache for current releases

It will be available in 7.4.7.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 22 Commit Notification 2023-04-27 14:59:39 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "libreoffice-7-5-3":

https://git.libreoffice.org/core/commit/02b4d5027b0a48c192ce2bf35c69abe347eb3601

Resolves: tdf#151958 Disable binary search on sorted cache for current releases

It will be available in 7.5.3.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 23 Eike Rathke 2023-04-27 15:43:14 UTC
(In reply to ady from comment #19)
> Evidently all this incorrect behavior came from (quoting release notes for
> 7.4):
> 
> * Improved performance of functions such as COUNTIF, SUMIFS and VLOOKUP,
> especially with unsorted data tdf#139444 tdf#144777 tdf#146546 
Yes.

> So perhaps (those) other functions also need some (urgent) review(?).
I disabled it at a central place that is called by all affected functions to decide whether to use the sorted cache or direct search.