Bug 93329 - FTest algorithm bug - wrong result for specific matrices
Status: CLOSED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version (earliest affected): unspecified
Hardware: Other All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:5.3.0
Keywords: difficultyInteresting, skillCpp, topicDebug
Depends on:
Blocks:
 
Reported: 2015-08-10 15:32 UTC by Łukasz Hryniuk
Modified: 2017-12-11 16:28 UTC (History)
CC List: 5 users

See Also:
Crash report or crash signature:


Attachments
Sheet with the bug (7.81 KB, application/vnd.oasis.opendocument.spreadsheet)
2015-08-10 15:32 UTC, Łukasz Hryniuk
Compare LO algorithm with Gnumeric algorithm (13.75 KB, application/x-vnd.oasis.opendocument.spreadsheet)
2016-06-29 20:11 UTC, Regina Henschel
Compare LO algorithm with Gnumeric algorithm (v2) (20.18 KB, application/vnd.oasis.opendocument.spreadsheet)
2016-07-27 15:44 UTC, Regina Henschel

Description Łukasz Hryniuk 2015-08-10 15:32:49 UTC
Created attachment 117811 [details]
Sheet with the bug

The ScInterpreter::ScFTest() method (the FTEST function in Calc) returns a wrong value for the matrices
[[9, 6],[8, empty]] and [[5],[7]]: 1.09545 instead of 0.90455 (see the attached document).
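
The two numbers above can be reproduced by hand. The following is a minimal Python sketch, not Calc's code: it assumes the empty cell is skipped (which matches the reported values), and it uses a closed form for the right tail of the F distribution that is valid only because the numerator has 2 degrees of freedom here. All names are hypothetical.

```python
def sample_variance(values):
    """Unbiased sample variance (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

first = [9, 6, 8]   # [[9, 6],[8, empty]] with the empty cell skipped (assumption)
second = [5, 7]     # [[5],[7]]

f_stat = sample_variance(first) / sample_variance(second)
df1, df2 = len(first) - 1, len(second) - 1   # 2 and 1

# Closed form for P(F > f), valid ONLY when the numerator df is 2.
right_tail = (1 + 2 * f_stat / df2) ** (-df2 / 2)

doubled_right_tail = 2 * right_tail                 # exceeds 1, so not a valid probability
two_tailed = 2 * min(right_tail, 1 - right_tail)    # doubles the smaller tail instead

print(round(doubled_right_tail, 5), round(two_tailed, 5))   # prints 1.09545 0.90455
```

Here F is about 1.1667, yet the right tail is about 0.548, so doubling it yields the reported 1.09545, while doubling the smaller tail yields the expected 0.90455.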

When fixing this, please use a few other tools as references: the R language, Gnumeric and MS Excel seem to calculate it correctly (at least they all show the same result for these data).
Comment 1 Robinson Tryon (qubit) 2015-12-14 07:17:14 UTC Comment hidden (obsolete)
Comment 2 Dennis Francis 2016-06-19 07:53:35 UTC
It looks like the ScFTest() method computes the F distribution of the variance ratio using GetFDist(), where it should be computing the cumulative F distribution of the variance ratio. The cumulative F distribution involves calculating the "regularized incomplete beta function" (see https://en.wikipedia.org/wiki/Beta_function#Incomplete_beta_function).

The Boost library seems to have a good implementation: http://www.boost.org/doc/libs/1_35_0/libs/math/doc/sf_and_dist/html/math_toolkit/special/sf_beta/ibeta_function.html

That page cites the research paper the implementation is based on, and the algorithm looks very involved.

Do we want to write our own implementation, or can we just use the Boost implementation of the function in question?
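
For reference, the relation between the cumulative F distribution and the regularized incomplete beta function can be written out as follows. The symbols are chosen for illustration and are not taken from the Calc source: $d_1$ and $d_2$ are the numerator and denominator degrees of freedom, $f$ is the variance ratio, and $I_x(a, b)$ is the regularized incomplete beta function.

```latex
P(F \le f) = I_{\frac{d_1 f}{d_1 f + d_2}}\!\left(\frac{d_1}{2}, \frac{d_2}{2}\right),
\qquad
P(F > f) = 1 - P(F \le f) = I_{\frac{d_2}{d_1 f + d_2}}\!\left(\frac{d_2}{2}, \frac{d_1}{2}\right)
```

The second identity follows from the symmetry $1 - I_x(a, b) = I_{1-x}(b, a)$.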
Comment 3 Dennis Francis 2016-06-19 12:23:28 UTC
Hi All,

Please ignore my previous comment. GetFDist() seems to do the correct thing. I will send a patch to fix the issue shortly.
Comment 4 Regina Henschel 2016-06-29 20:11:24 UTC
Created attachment 125991 [details]
Compare LO algorithm with Gnumeric algorithm

Calc swaps the variances so that the test statistic F is always larger than 1. It assumes that the right-tail probability will then be smaller than 0.5. That is true when the two series contain large and nearly equal amounts of data, but you can construct cases where this assumption is wrong.

The example file contains the correct algorithm in steps, so that it can easily be transferred to the code. The case distinction to get F>1 has to be removed in the code.
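
The attachment's steps are not reproduced in this thread. A common definition of the two-tailed F-test p-value, consistent with the 0.90455 expected in the description, doubles the smaller of the two tails and so does not depend on F being larger than 1. The following is a minimal Python sketch under that assumption, not the code in ScInterpreter::ScFTest(). The helper names (`reg_inc_beta`, `f_cdf`, `ftest`) are hypothetical, and the incomplete beta function is evaluated with a textbook continued fraction rather than with Calc's GetFDist().

```python
import math

def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _beta_cf(a, b, x) / a
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) for faster convergence.
    return 1.0 - bt * _beta_cf(b, a, 1.0 - x) / b

def f_cdf(f_stat, d1, d2):
    """P(F <= f_stat) for an F distribution with (d1, d2) degrees of freedom."""
    return reg_inc_beta(d1 / 2.0, d2 / 2.0, d1 * f_stat / (d1 * f_stat + d2))

def sample_variance(values):
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def ftest(xs, ys):
    """Two-tailed F-test p-value: twice the smaller tail, no F > 1 case distinction."""
    f_stat = sample_variance(xs) / sample_variance(ys)
    lower = f_cdf(f_stat, len(xs) - 1, len(ys) - 1)
    return 2.0 * min(lower, 1.0 - lower)
```

With the data from the description, `ftest([9, 6, 8], [5, 7])` and `ftest([5, 7], [9, 6, 8])` give the same value, so the result does not depend on which series is passed first. The sketch does not handle a zero variance in the second series.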
Comment 5 Regina Henschel 2016-06-29 21:16:20 UTC
(In reply to Regina Henschel from comment #4)
> The case distinction to get F>1 has to be removed in the code.

It seems I was wrong here. The swapping makes it possible to get results near zero when one variance is huge.
Comment 6 Regina Henschel 2016-07-27 15:44:02 UTC
Created attachment 126437 [details]
Compare LO algorithm with Gnumeric algorithm (v2)

Added an example with one huge variance, and therefore an FTEST result near zero.
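
One plausible reading of why the swap helps near zero (an assumption, not stated in the thread) is floating-point cancellation: with the huge variance in the numerator the small tail is computed directly, whereas in the other orientation it has to be obtained as 1 minus a number very close to 1. The Python sketch below illustrates this with hypothetical variances and the closed form for a numerator with 2 degrees of freedom. It is not the attached spreadsheet's example.

```python
def right_tail_num_df2(f_stat, df_den):
    """Right tail P(F > f_stat); closed form valid only for numerator df == 2."""
    return (1.0 + 2.0 * f_stat / df_den) ** (-df_den / 2.0)

# Hypothetical variances, each assumed to come from 3 data points (df = 2).
var_small, var_huge = 1.0, 1e12
df = 2

# Huge variance in the numerator: the small tail is the right tail, computed directly.
swapped = 2.0 * right_tail_num_df2(var_huge / var_small, df)

# Huge variance in the denominator: the small tail is the left tail,
# reachable here only as 1 - right tail, which cancels most significant digits.
unswapped = 2.0 * (1.0 - right_tail_num_df2(var_small / var_huge, df))

# Closed-form reference value for this (2, 2) degrees-of-freedom case.
reference = 2.0 / (1.0 + 1e12)
rel_err_swapped = abs(swapped - reference) / reference
rel_err_unswapped = abs(unswapped - reference) / reference
print(rel_err_swapped, rel_err_unswapped)
```

Both orientations describe the same test, but in double precision the second relative error is many orders of magnitude larger than the first.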
Comment 7 Commit Notification 2016-08-29 07:02:43 UTC
Dennis Francis committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=5c401d8a93cdf7dfa450604856680a2154366fcf

tdf#93329 : Fixes FTest algorithmic bug

It will be available in 5.3.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 8 Xisco Faulí 2016-10-03 10:02:31 UTC
Hello,
Is this bug fixed?
If so, could you please close it as RESOLVED FIXED?
Comment 9 Regina Henschel 2017-12-11 16:28:23 UTC
It is fixed in Version: 5.3.7.2 (x64)
Build ID: 6b8ed514a9f8b44d37a1b96673cbbdd077e24059
CPU Threads: 8; OS Version: Windows 6.19; UI Render: GL; Layout Engine: new; 
Locale: de-DE (de_DE); Calc: CL