Bug 90997 - COUNTIF sometimes compares a cell as not being equal to itself
Summary: COUNTIF sometimes compares a cell as not being equal to itself
Status: RESOLVED DUPLICATE of bug 67026
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
4.2.7.2 release
Hardware: x86 (IA32) Linux (All)
: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-05-01 17:01 UTC by Dane Maslen
Modified: 2015-05-02 05:55 UTC (History)
1 user (show)

See Also:
Crash report or crash signature:


Attachments
Spreadsheet that demonstrates the bug (43.65 KB, application/vnd.oasis.opendocument.spreadsheet)
2015-05-01 17:01 UTC, Dane Maslen
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Dane Maslen 2015-05-01 17:01:26 UTC
Created attachment 115244 [details]
Spreadsheet that demonstrates the bug

The formulae

=COUNTIF(A1:A1,"<"&A1)
=COUNTIF(A1:A1,">"&A1)

sometimes return the value 1 while the formula

=COUNTIF(A1:A1,"="&A1)

sometimes returns the value 0.

Formulae like

=COUNTIF(A1:A1,">="&A1)
=COUNTIF(A1:A1,"<="&A1)
=COUNTIF(A1:A1,"<>"&A1)

similarly sometimes return incorrect values.

Clearly no one would actually want to make use of the above formulae, but the bug also means that formulae like

=COUNTIF(A1:A40,">"&A5)

fail to return the correct answer.  For example if A5 is the second largest number in the range A1:A40, the above formula will usually return 1 (the correct answer) but will occasionally return 2 because A5 has been counted as being greater than itself.

When I looked at existing bug reports, it at first seemed to me that this might be a duplicate of Bug 78447, but that was eventually closed as worksforme on 4.2.7.2, Linux whereas this bug definitely exists on 4.2.7.2, Linux.  For the record it also existed on 3.5.7.2, Linux.

I've attached a spreadsheet that demonstrates the bug.

A1:A230 are random numbers.

B1 uses COUNTIF to count the number of times that cells in the range A1:A1 are less than A1 (the expected answer is of course 0).  Similarly through to B230.

C1 uses COUNTIF to count the number of times that cells in the range A1:A1 are equal to A1 (the expected answer is of course 1).  Similarly through to C230.

D1 uses COUNTIF to count the number of times that cells in the range A1:A1 are greater than A1 (the expected answer is of course 0).  Similarly through to B230.

B232, C232 and D232 are counts of the number of times the above uses of COUNTIF produced the wrong answer.  Occasionally all three counts will be zero.  If so, using F9 to force a recalculation usually produces non-zero values within one or two iterations.

With some ingenuity it is possibile to work around the bug.  For example although

=COUNTIF(A1:A40,">"&A5)

will sometimes give the wrong answer

=COUNTIF(A1:A40,">"&A5)-COUNTIF(A5:A5,">"&A5)

will always give the correct one.

Finally I'll speculate that the bug might be caused by the use of an intermediate variable with inappropriate precision, though I have to confess that I would expect that to produce incorrect results more frequently than I observe.
Comment 1 m_a_riosv 2015-05-02 00:03:32 UTC
Hi Dane, thanks for reporting.

I think the issue is in relation with the limits of the precision , please see the bugs:

https://bugs.documentfoundation.org/show_bug.cgi?id=50299 
https://bugs.documentfoundation.org/show_bug.cgi?id=67026
https://bugs.documentfoundation.org/show_bug.cgi?id=37923

Rounding the RAND() value to no more than 15 decimals, seems solve the issue in your sample file.

*** This bug has been marked as a duplicate of bug 67026 ***
Comment 2 Dane Maslen 2015-05-02 05:55:11 UTC
I agree that it's a precision issue, but not in the way you believe.  I think your pointing out that rounding to 15 significant figures is another workaround to the problem allows me to conceptually identify what is going wrong in Calc. 

Irrespective of precision limits the contents of a given cell should always compare as being equal to the contents of that same cell.

The formula

=IF(A1=A1,1,0)

does indeed always give the correct result, i.e. 1.  Likely conclusion: Calc directly compares the contents of A1 to the contents of A1.

Furthermore when

A1 is =RAND()
B1 is =A1
C1 is =IF(A1=B1,1,0)

C1 always has the correct result, i.e. 1.  Likely conclusion: Not only does Calc directly compare the contents of A1 to B1 but also the assignment of a value to B1 from A1 is either being done directly or using intermediate variables of precision at least as great as those used to store the cell values.

The fact that COUNTIF(A1:A1,"="&A1) does not always give the correct answer indicates that the contents of the cell A1 are not being compared directly with the contents of each of the cells in the range A1 to A1 (as noted above, comparing A1 directly to A1 always works).  My guess would be that the contents of the cell A1 are copied to an intermediate variable and that that is what is then compared to each of the cells in the range A1 to A1.

The fact that COUNTIF(A1:A1,"="&A1) does give the correct answer when the contents of all the cells have been rounded to 15 significant figures gives a clue as to what is going wrong.  It suggests that Calc stores cell values with a precision greater than 15 sf but is using an intermediate variable with precision of only 15 sf when evaluating the COUNTIF comparisons.  That's a bug.  I've been a programmer long enough to know that if one wants to compare two values that are known to high precision, one doesn't first assign one of those values to a variable of low precision.

In summary: when evaluating COUNTIF Calc should be using an intermediate variable of precision at least equal to the precision with which it stores cell values.