Bug 106796 - function CHITEST and CHISQ.TEST - empty cells
Summary: function CHITEST and CHISQ.TEST - empty cells
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
unspecified
Hardware: All All
: medium normal
Assignee: Winfried Donkers
URL:
Whiteboard: target:5.4.0
Keywords:
Depends on:
Blocks:
 
Reported: 2017-03-27 14:58 UTC by raal
Modified: 2017-06-09 11:05 UTC (History)
4 users (show)

See Also:
Crash report or crash signature:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description raal 2017-03-27 14:58:42 UTC
Description:
> see row 9 -  excel returns value, Calc returns error.  
>

Winfried:
CHISQ.TEST (Excel) and CHITEST (Excel and LEGACY.CHITEST in ODFF1.2) are identical, so any changes to the code will apply to both functions.

In row 9 you use ranges L4:L10 and M4:M10, but the data is in L3:M9: Excel handles empty cells differently than Calc does.
In Excel the data ranges have 6 rows.
In Calc 7 rows and the last row contains no data (Error502, illegal argument).

I did some tests and Excel reduces the range when any of the two ranges has one or more empty cells.
So range (_ being an empty cell)
  A   E
  1   1
  _    _
  2   2
  3   _
  4   4
  _    5
  _   _
is treated in Excel as a 3-row range. In Calc this produces an error, because not all 7 rows have valid data.
Note that 0 in the range for expected values is not allowed because of the algorithm (DIV/0), so empty cells are treated as empty, not as 0.

Now the difficult part: is it a bug and if so, Excel's or Calc's or both?
1. The skipping of empty rows (in both ranges) seems consistent to me (with e.g. SUM it is common practice  to enter a range which is much bigger than the amount of values to be added, so that values added later will be incorporated in SUM automatically). IMO this applies for both empty rows at the end of the range as for empty rows in between, simply to be consistent with other functions using ranges. That would mean there is a bug in Calc to be fixed.
2. The skipping of rows when only one of the two is missing is illegal as far as I am concerned. Excel says "If actual_range and expected_range have a different number of data points, CHISQ.TEST returns the #N/A error value." and ODFF has constraints "ROWS(A) == ROWS(E)" and "COLUMNS(A) == COLUMNS(E)". So this looks like a bug in Excel and behaviour that should _not_ be copied into Calc.

As the ODFF text for LEGACY.CHITEST looks copied from Excel (with respect to the algorithms) I propose we fix Calc to handle empty rows -when occurring simultaneously in both ranges- just like Excel, i.e. as non-existent rows. 

Steps to Reproduce:
=CHISQ.TEST(L4:L10,M4:M10)   but the data are only in L4:L9 and M4:M9 


Actual Results:  
error

Expected Results:
-


Reproducible: Always

User Profile Reset: No

Additional Info:


User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0
Comment 1 raal 2017-03-27 15:01:18 UTC
Winfried confirmed
Comment 2 Winfried Donkers 2017-03-28 06:08:02 UTC
(In reply to raal from comment #0)
> Description:
> [...]
> 2. The skipping of rows when only one of the two is missing is illegal as
> far as I am concerned. Excel says "If actual_range and expected_range have a
> different number of data points, CHISQ.TEST returns the #N/A error value."
> and ODFF has constraints "ROWS(A) == ROWS(E)" and "COLUMNS(A) ==
> COLUMNS(E)". So this looks like a bug in Excel and behaviour that should
> _not_ be copied into Calc.
> 
> [...]

In my text in Comment 1 I missed an important phrase in ODFF1.2:
'For an empty element or an element of type Text or Boolean in A the element at the corresponding position of E is ignored, and vice versa.'. 

This does not explain what is to be done with the degrees of freedom should a complete row or column be empty. Excel keeps the degrees of freedom unchanged, i.e. solely based on the range dimensions. I will do that too for Calc.

There is one -undocumented- exception in Excel: when calling CHITEST/CHISQ.TEST with ranges that are completely empty, the result should be 1, but Excel returns #DIV/0!.

I propose that Calc returns an error (Illegal Argument) and that a change request is made for ODFF: A- nor E-ranges may be empty, there must be at least 1 pair in the data set.
Comment 3 Commit Notification 2017-05-04 23:36:30 UTC
Winfried Donkers committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=a7f1df962c1c6cb0e1d51d8a67f0190c64962544

tdf#106796 Make CHITEST and CHISQ.TEST comply with ODFF1.2 and improve

It will be available in 5.4.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 4 Xisco Faulí 2017-06-09 09:54:01 UTC
Hello,
Is this bug fixed?
If so, could you please close it as RESOLVED FIXED?
Comment 5 Winfried Donkers 2017-06-09 11:05:18 UTC
(In reply to Xisco Faulí from comment #4)
> Hello,
> Is this bug fixed?
> If so, could you please close it as RESOLVED FIXED?

Sorry Xisco, I forgot.