Bug 159544 - Function COUNTIFS empty cells are counted up to the nearest filled one, EDITING
Summary: Function COUNTIFS empty cells are counted up to the nearest filled one, EDITING
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
7.4.0.3 release
Hardware: All All
: high major
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: bibisected, bisected, regression
: 159858 (view as bug list)
Depends on:
Blocks: Calc-Function
  Show dependency treegraph
 
Reported: 2024-02-03 17:41 UTC by Username
Modified: 2024-03-02 15:53 UTC (History)
7 users (show)

See Also:
Crash report or crash signature:


Attachments
Function COUNTIF empty cells are counted up to the nearest filled one (56.34 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2024-02-03 17:44 UTC, Username
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Username 2024-02-03 17:41:44 UTC
Description:
Ignores formula results as an empty string

Steps to Reproduce:
1.Create new spreadsheat
2.I add data on the first page, but on the second page I add formulas
3.The cell uses the function =COUNTIFS($Sheet1.D:D;"";$Sheet1.A:A;A6;$Sheet1.B:B;B6;$Sheet1.C:C;C6) which incorrectly counts empty cells
4. In version 7.3.7.2 it calculates correctly

Actual Results:
9

Expected Results:
18


Reproducible: Always


User Profile Reset: Yes

Additional Info:
Version: 7.6.4.1 (X86_64) / LibreOffice Community
Build ID: e19e193f88cd6c0525a17fb7a176ed8e6a3e2aa1
CPU threads: 24; OS: Windows 10.0 Build 19045; UI render: default; VCL: win
Locale: ru-RU (ru_RU); UI: en-US
Calc: threaded
Comment 1 Username 2024-02-03 17:44:52 UTC
Created attachment 192374 [details]
Function COUNTIF empty cells are counted up to the nearest filled one
Comment 2 ady 2024-02-03 19:35:33 UTC
I can confirm the problem under the following conditions:

* The formula (COUNTIFS in this case) is introduced on a different worksheet than the arguments.

* Introducing a similar formula on the same worksheet as where (both Range and Criteria) arguments are located, the result is correct.

* If either of the arguments is located on a different worksheet than the formula, the result is incorrect.

* When using LO 7.3.1.3 and older, a Hard Recalculate displays the correct result (Sheet2.D6).

* A Hard Recalculate does not update the incorrect result when testing with LO 7.4.3.2 and newer versions, including recent 24.8 alpha. > bug.

* Showing all rows and re-sorting the Range area (for each and every column) will trigger the correct result in the attached file.

* No change when Disabling or Enabling OpenCL; in both cases the incorrect result is displayed in recent LO 24.8 alpha.


This is probably a negative consequence of the attempted performance enhancement introduced in LO 7.4:

* Improved performance of functions such as COUNTIF, SUMIFS and VLOOKUP, especially with unsorted data tdf#139444 tdf#144777 tdf#146546 (Luboš Luňák, Collabora)

IIRC, this is not the first report about negative effects of that change (if indeed this problem is about the same issue).

CC'ing Eike.
Comment 3 ady 2024-02-03 20:55:12 UTC
In attachment 192374 [details] worksheet Sheet2 cell D6, the formula is:

=COUNTIFS($Sheet1.D:D;"";$Sheet1.A:A;A6;$Sheet1.B:B;B6;$Sheet1.C:C;C6)

Changing that to:

=COUNTIFS($Sheet1.A:A;A6;$Sheet1.B:B;B6;$Sheet1.C:C;C6;$Sheet1.D:D;"")


(i.e. moving Range1 and Criteria1 to the last pair of arguments)

generates the correct result in a recent LO 24.8 alpha.

I'm not sure why the order of the pair of arguments would make any difference (when the formula is in a different worksheet), but it does.
Comment 4 Username 2024-02-03 21:18:25 UTC
I moved Range1 and Criteria1 and everything was calculated correctly. Thank you!
Comment 5 ady 2024-02-03 21:25:57 UTC
(In reply to Username from comment #4)
> I moved Range1 and Criteria1 and everything was calculated correctly. Thank
> you!

Just to be clear, that does not solve the bug. A bug is still present, since LO 7.4 and up to a recent 24.8 alpha.
Comment 6 ady 2024-02-03 22:55:34 UTC
tdf#146790 and tdf#151958 seem somewhat related to this, especially when Eike pushed "Disable binary search on sorted cache for current releases" for the later.

Quoting Eike:
"
I disabled it at a central place that is called by all affected functions to decide whether to use the sorted cache or direct search.
"

IDK whether something else was done to re-introduce the potential better performance at a later time.

At any rate, there is (still) a bug in LO 7.6 (at least) and up to a recent 24.8 alpha.
Comment 7 raal 2024-02-04 00:58:59 UTC
This seems to have begun at the below commit in bibisect repository/OS linux-64-7.4.
Adding Cc: to Luboš Luňák ; Could you possibly take a look at this one?
Thanks

There are only 'skip'ped commits left to test.
The first bad commit could be any of: 58e95563e5247039066c70f783e3f8b8139dce38 source e3d3adbcde43965b517473387acda81e53e248ed 1bb345f1e205d1ceb2281c06cc2fe7e78dfbe9a6 source 7674399aac661eb503d7badc53b9a4d68bd9839d
We cannot bisect more!

135047: fix COUNTIFS when matching empty cells | https://gerrit.libreoffice.org/c/core/+/135047
135049: try to range-reduce even COUNTIFS if not matching empty cells | https://gerrit.libreoffice.org/c/core/+/135049


Also recalculation is much more slower.
Comment 8 Buovjaga 2024-02-27 16:38:33 UTC
*** Bug 159858 has been marked as a duplicate of this bug. ***
Comment 9 Buovjaga 2024-02-27 16:39:03 UTC
In bug 159858 I got 7674399aac661eb503d7badc53b9a4d68bd9839d as the exact result.
Comment 10 ady 2024-02-27 16:54:26 UTC
(In reply to Buovjaga from comment #9)
> In bug 159858 I got 7674399aac661eb503d7badc53b9a4d68bd9839d as the exact
> result.

There must be 2 reasons for the bug(s), instead of just 1 unique commit. Please see attachment 192832 [details] in tdf#159858 that proves it.
Comment 11 Buovjaga 2024-02-27 17:15:34 UTC
(In reply to ady from comment #10)
> (In reply to Buovjaga from comment #9)
> > In bug 159858 I got 7674399aac661eb503d7badc53b9a4d68bd9839d as the exact
> > result.
> 
> There must be 2 reasons for the bug(s), instead of just 1 unique commit.
> Please see attachment 192832 [details] in tdf#159858 that proves it.

Plot twist: if you do hard recalc twice with attachment 192832 [details], you get

3
3
4
Comment 12 Buovjaga 2024-02-27 17:16:41 UTC
(In reply to Buovjaga from comment #11)
> (In reply to ady from comment #10)
> > (In reply to Buovjaga from comment #9)
> > > In bug 159858 I got 7674399aac661eb503d7badc53b9a4d68bd9839d as the exact
> > > result.
> > 
> > There must be 2 reasons for the bug(s), instead of just 1 unique commit.
> > Please see attachment 192832 [details] in tdf#159858 that proves it.
> 
> Plot twist: if you do hard recalc twice with attachment 192832 [details],
> you get
> 
> 3
> 3
> 4

...in commit 7674399aac661eb503d7badc53b9a4d68bd9839d that is
Comment 13 ady 2024-02-27 17:31:43 UTC
(In reply to Buovjaga from comment #11)

> Plot twist: if you do hard recalc twice with attachment 192832 [details],
> you get
> 
> 3
> 3
> 4

The double hard recalc is only needed in certain versions (e.g. 7.4.3.2) / commits, but not in later versions.

IDK what's the reason for the need of a multiple hard recalc and why some formulas' results change only after the second one, but it would still suggest that there is not 1 but at least 2 items influencing the results of these formulas. How these, in term, influence the identification of the problem(s) and respective solution(s) for each, IDK either.
Comment 14 Buovjaga 2024-02-27 17:41:07 UTC
(In reply to ady from comment #13)
> (In reply to Buovjaga from comment #11)
> 
> > Plot twist: if you do hard recalc twice with attachment 192832 [details],
> > you get
> > 
> > 3
> > 3
> > 4
> 
> The double hard recalc is only needed in certain versions (e.g. 7.4.3.2) /
> commits, but not in later versions.
> 
> IDK what's the reason for the need of a multiple hard recalc and why some
> formulas' results change only after the second one, but it would still
> suggest that there is not 1 but at least 2 items influencing the results of
> these formulas. How these, in term, influence the identification of the
> problem(s) and respective solution(s) for each, IDK either.

Ok, I checked with linux-64-7.6 and the need for two recalcs stopped with e7897ce47e415e46eb36687f911a9a4b09ca6b8a
crashtesting: crash in ScInterpreter::ScCountIfs

It mentions "bodge this to fill in 0 for missing ranges"
Comment 15 ady 2024-03-02 15:53:55 UTC
(In reply to Buovjaga from comment #14)

> Ok, I checked with linux-64-7.6 and the need for two recalcs stopped with
> e7897ce47e415e46eb36687f911a9a4b09ca6b8a
> crashtesting: crash in ScInterpreter::ScCountIfs
> 
> It mentions "bodge this to fill in 0 for missing ranges"

I hope there is some preventive test for such case now, so such thing won't ever happen again. Users should be able to trust that the results are updated after a/one Hard Recalculation, always.

At any rate, we have Calc functions generating wrong results such as this tdf#159544 since LO 7.4 (more than a year and a half ATM), and without users knowing it. We have enough reports about these failures.