93329 – FTest algorithm bug - wrong result for specific matrices

Status:	CLOSED FIXED

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	Calc
Version: (earliest affected)	unspecified
Hardware:	Other All

Importance:	medium normal
Assignee:	Not Assigned

URL:
Whiteboard:	target:5.3.0
Keywords:	difficultyInteresting, skillCpp, topicDebug

Depends on:
Blocks:

Reported:	2015-08-10 15:32 UTC by Łukasz Hryniuk
Modified:	2017-12-11 16:28 UTC
CC List:	5 users

See Also:
Crash report or crash signature:

Attachments
Sheet with the bug (7.81 KB, application/vnd.oasis.opendocument.spreadsheet) 2015-08-10 15:32 UTC, Łukasz Hryniuk
Compare LO algorithm with Gnumeric algorithm (13.75 KB, application/x-vnd.oasis.opendocument.spreadsheet) 2016-06-29 20:11 UTC, Regina Henschel
Compare LO algorithm with Gnumeric algorithm (v2) (20.18 KB, application/vnd.oasis.opendocument.spreadsheet) 2016-07-27 15:44 UTC, Regina Henschel

Description Łukasz Hryniuk 2015-08-10 15:32:49 UTC

Created attachment 117811
Sheet with the bug

The ScInterpreter::ScFTest() method (the FTEST function in Calc) returns a wrong value for the matrices
[[9, 6],[8, empty]] and [[5],[7]]: 1.09545 instead of 0.90455 (see the attached document).
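
Both numbers can be reproduced by hand. The following is a minimal sketch, not LibreOffice code: it assumes the empty cell is skipped, that sample variances (dividing by n - 1) are used, and it relies on a closed form for the right tail of the F distribution that holds only when the numerator has 2 degrees of freedom, which happens to be the case for this data. All function names are illustrative.

```python
def sample_variance(values):
    """Unbiased sample variance (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def f_right_tail_df1_is_2(f, df2):
    """P(F > f) for an F distribution with df1 == 2 numerator degrees of freedom.

    Closed form valid ONLY for df1 == 2: (1 + 2*f/df2) ** (-df2/2).
    """
    return (1.0 + 2.0 * f / df2) ** (-df2 / 2.0)


# Data from the report; the empty cell is assumed to be ignored.
data1 = [9, 6, 8]
data2 = [5, 7]

f_stat = sample_variance(data1) / sample_variance(data2)  # (7/3) / 2
df1, df2 = len(data1) - 1, len(data2) - 1                 # 2 and 1

right_tail = f_right_tail_df1_is_2(f_stat, df2)

# Doubling the right tail alone reproduces the value reported as wrong.
doubled_right_tail = 2.0 * right_tail
# Doubling the smaller of the two tails reproduces the expected value.
two_tailed = 2.0 * min(right_tail, 1.0 - right_tail)

print(f"{doubled_right_tail:.5f}")  # 1.09545
print(f"{two_tailed:.5f}")          # 0.90455
```

Here the right tail is about 0.5477, which is larger than 0.5, so doubling it gives a "probability" above 1.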

When fixing this, please use a few other tools as a reference: R, Gnumeric and MS Excel seem to calculate it correctly (at least they all show the same result for this data).

Comment 1 Robinson Tryon (qubit) 2015-12-14 07:17:14 UTC

Migrating Whiteboard tags to Keywords: (DifficultyInteresting, SkillCpp, TopicDebug -> TopicDebugging)
[NinjaEdit]

Comment 2 Dennis Francis 2016-06-19 07:53:35 UTC

It looks like the ScFTest() method tries to compute the F distribution on the variance ratio using GetFDist(), where it should be computing the cumulative F distribution on the variance ratio. The cumulative F distribution involves calculating the "regularized incomplete beta function" (see https://en.wikipedia.org/wiki/Beta_function#Incomplete_beta_function).

The Boost library seems to have a good implementation: http://www.boost.org/doc/libs/1_35_0/libs/math/doc/sf_and_dist/html/math_toolkit/special/sf_beta/ibeta_function.html

That page mentions the research paper the implementation is based on, and the implementation looks very involved.

Do we want to write the implementation on our own, or can we just use the Boost implementation of the function in question?

Comment 3 Dennis Francis 2016-06-19 12:23:28 UTC

Hi All,

Please ignore my previous comment. GetFDist() seems to do the correct thing. Will send in a patch to fix the issue shortly.

Comment 4 Regina Henschel 2016-06-29 20:11:24 UTC

Created attachment 125991
Compare LO algorithm with Gnumeric algorithm

Calc swaps the variances so that the test statistic F is always larger than 1. It assumes that the right-tail probability will then be smaller than 0.5. That is true in cases where the two series contain large and nearly equal amounts of data. But you can construct cases where this assumption is wrong.

The example file contains the correct algorithm in steps, so that it can be easily transferred to the code. The case distinction to get F>1 has to be removed in the code.
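
The assumption "F > 1 implies right tail < 0.5" can be checked directly in the smallest case. The sketch below is illustrative only (hypothetical helper names, not Calc code) and again uses the closed form that is valid only for 2 numerator degrees of freedom. It computes the median of the F distribution, i.e. the F value at which the right tail is exactly 0.5, for several denominator degrees of freedom.

```python
def f_right_tail_df1_is_2(f, df2):
    """P(F > f) for df1 == 2 only: (1 + 2*f/df2) ** (-df2/2)."""
    return (1.0 + 2.0 * f / df2) ** (-df2 / 2.0)


def f_median_df1_is_2(df2):
    """F value whose right tail equals 0.5 (df1 == 2 only).

    Obtained by solving (1 + 2*x/df2) ** (-df2/2) == 0.5 for x.
    """
    return (df2 / 2.0) * (2.0 ** (2.0 / df2) - 1.0)


for df2 in (1, 2, 10, 100):
    median = f_median_df1_is_2(df2)
    tail_at_one = f_right_tail_df1_is_2(1.0, df2)
    # A median above 1 means some F > 1 still has a right tail above 0.5.
    print(df2, round(median, 4), round(tail_at_one, 4))

# With df2 == 1 the median is 1.5, so any F between 1 and 1.5
# has a right-tail probability greater than 0.5.
```

The data from the original report falls into exactly this gap: 3 values against 2 values gives degrees of freedom (2, 1) and F of about 1.1667, which lies between 1 and the median of 1.5.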

Comment 5 Regina Henschel 2016-06-29 21:16:20 UTC

(In reply to Regina Henschel from comment #4)
> The case distinction to get F>1 has to be removed in the code.

It seems I was wrong here. The swapping makes it possible to get results near zero when one variance is huge.

Comment 6 Regina Henschel 2016-07-27 15:44:02 UTC

Created attachment 126437
Compare LO algorithm with Gnumeric algorithm (v2)

Added an example with one huge variance and therefore an FTEST result near zero.

Comment 7 Commit Notification 2016-08-29 07:02:43 UTC

Dennis Francis committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=5c401d8a93cdf7dfa450604856680a2154366fcf

tdf#93329 : Fixes FTest algorithmic bug

It will be available in 5.3.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.

Comment 8 Xisco Faulí 2016-10-03 10:02:31 UTC

Hello,
Is this bug fixed?
If so, could you please close it as RESOLVED FIXED?

Comment 9 Regina Henschel 2017-12-11 16:28:23 UTC

It is fixed in Version: 5.3.7.2 (x64)
Build ID: 6b8ed514a9f8b44d37a1b96673cbbdd077e24059
CPU Threads: 8; OS Version: Windows 6.19; UI Render: GL; Layout Engine: new; 
Locale: de-DE (de_DE); Calc: CL