Created attachment 117811: Sheet with the bug
ScInterpreter::ScFTest() (the FTEST function in Calc) returns a wrong value for the matrices [[9, 6], [8, empty]] and [[5], [7]]: 1.09545 instead of 0.90455 (see the attached document). When fixing this, please use a few other tools as a reference: R, Gnumeric and MS Excel appear to calculate it correctly (at least, they all show the same result for this data).
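A hand check of both numbers, assuming (as Excel does) that empty cells are ignored and that FTEST returns the two-tailed probability $2\min\{P(F \le f),\, P(F > f)\}$. Because the numerator has $d_1 = 2$ degrees of freedom, the F distribution function has a closed form here:

$$
\begin{aligned}
s_1^2 &= \frac{(9-\tfrac{23}{3})^2 + (6-\tfrac{23}{3})^2 + (8-\tfrac{23}{3})^2}{3-1} = \frac{7}{3}, \qquad d_1 = 2,\\
s_2^2 &= \frac{(5-6)^2 + (7-6)^2}{2-1} = 2, \qquad d_2 = 1,\\
f &= \frac{s_1^2}{s_2^2} = \frac{7}{6} \approx 1.16667,\\
P(F \le f) &= 1 - \left(\frac{d_2}{d_2 + d_1 f}\right)^{d_2/2} = 1 - \sqrt{0.3} \approx 0.45228 \quad (d_1 = 2),\\
P(F > f) &= \sqrt{0.3} \approx 0.54772,\\
2\,P(F > f) &\approx 1.09545, \qquad 2\min\{P(F \le f),\, P(F > f)\} \approx 0.90455.
\end{aligned}
$$

So the wrong value 1.09545 is the right tail doubled even though that tail exceeds 0.5, while the expected 0.90455 is twice the smaller (left) tail.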
Migrating Whiteboard tags to Keywords: (DifficultyInteresting, SkillCpp, TopicDebug -> TopicDebugging) [NinjaEdit]
It looks like ScFTest() computes the F distribution of the variance ratio using GetFDist(), where it should be computing the cumulative F distribution of the variance ratio. The cumulative F distribution involves calculating the "regularized incomplete beta function" (see https://en.wikipedia.org/wiki/Beta_function#Incomplete_beta_function). The Boost library seems to have a good implementation: http://www.boost.org/doc/libs/1_35_0/libs/math/doc/sf_and_dist/html/math_toolkit/special/sf_beta/ibeta_function.html That page mentions the research paper the implementation is based on, and the algorithm looks very involved. Do we want to write our own implementation, or can we just use Boost's implementation of the function in question?
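For reference, the standard textbook identities linking the F distribution to the regularized incomplete beta function, with $d_1$, $d_2$ the numerator and denominator degrees of freedom and $f$ the observed variance ratio. They are stated here as general background, not taken from the LibreOffice or Boost sources:

$$
\begin{aligned}
I_x(a, b) &= \frac{1}{B(a, b)} \int_0^x t^{a-1} (1-t)^{b-1}\, dt,\\
P(F \le f) &= I_{\frac{d_1 f}{d_1 f + d_2}}\!\left(\tfrac{d_1}{2}, \tfrac{d_2}{2}\right),\\
P(F > f) &= 1 - P(F \le f) = I_{\frac{d_2}{d_2 + d_1 f}}\!\left(\tfrac{d_2}{2}, \tfrac{d_1}{2}\right).
\end{aligned}
$$

The last form evaluates the right tail directly, which matters when that tail is tiny and computing it as $1 - P(F \le f)$ would lose precision.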
Hi all, please ignore my previous comment: GetFDist() seems to do the correct thing. I will send a patch to fix the issue shortly.
Created attachment 125991: Compare LO algorithm with Gnumeric algorithm
Calc swaps the variances so that the test statistic F is always larger than 1. It then assumes that the right-tail probability will be smaller than 0.5. That is true when both series are large and of nearly equal size, but you can construct cases where this assumption is wrong. The example file contains the correct algorithm step by step, so that it can easily be transferred to the code. The case distinction to get F>1 has to be removed from the code.
(In reply to Regina Henschel from comment #4)
> The case distinction to get F>1 has to be removed in the code.
It seems I was wrong here. The swapping makes it possible to get results near zero when one variance is huge.
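Taking the two comments above together, a minimal C++ sketch of a two-tailed F test that keeps the swap (so that F >= 1) but returns twice the smaller tail instead of assuming the right tail is below 0.5. This is not the code of the committed fix: all function names here are hypothetical, the regularized incomplete beta function uses the textbook modified-Lentz continued fraction rather than LibreOffice's GetFDist() or Boost's ibeta, empty cells are assumed to have been dropped before the call, and error handling is reduced to assertions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Unbiased sample variance (divisor n - 1).
double sampleVariance(const std::vector<double>& v) {
    assert(v.size() >= 2 && "need at least two values per series");
    const double n = static_cast<double>(v.size());
    const double mean = std::accumulate(v.begin(), v.end(), 0.0) / n;
    double ss = 0.0;
    for (double x : v) ss += (x - mean) * (x - mean);
    return ss / (n - 1.0);
}

// Continued fraction for I_x(a, b), evaluated with the modified Lentz method
// (textbook formulation; an assumption, not LibreOffice's or Boost's algorithm).
double betaContFrac(double a, double b, double x) {
    const double kEps = 1e-15, kTiny = 1e-300;
    auto guard = [kTiny](double v) { return std::fabs(v) < kTiny ? kTiny : v; };
    double c = 1.0;
    double d = 1.0 / guard(1.0 - (a + b) * x / (a + 1.0));
    double h = d;
    for (int m = 1; m <= 300; ++m) {
        const double m2 = 2.0 * m;
        // Even step of the continued fraction.
        double aa = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2));
        d = 1.0 / guard(1.0 + aa * d);
        c = guard(1.0 + aa / c);
        h *= d * c;
        // Odd step.
        aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2));
        d = 1.0 / guard(1.0 + aa * d);
        c = guard(1.0 + aa / c);
        const double delta = d * c;
        h *= delta;
        if (std::fabs(delta - 1.0) < kEps) break;
    }
    return h;
}

// Regularized incomplete beta function I_x(a, b) for a, b > 0.
double regIncBeta(double a, double b, double x) {
    if (x <= 0.0) return 0.0;
    if (x >= 1.0) return 1.0;
    const double front = std::exp(std::lgamma(a + b) - std::lgamma(a) - std::lgamma(b) +
                                  a * std::log(x) + b * std::log1p(-x));
    // Use the continued fraction where it converges quickly, otherwise use symmetry.
    if (x < (a + 1.0) / (a + b + 2.0)) return front * betaContFrac(a, b, x) / a;
    return 1.0 - front * betaContFrac(b, a, 1.0 - x) / b;
}

// Right tail P(F > f) of F(d1, d2), evaluated directly rather than as 1 - CDF.
double fRightTail(double f, double d1, double d2) {
    return regIncBeta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f));
}

struct FRatio { double f, d1, d2; };

// Larger variance in the numerator (F >= 1); degrees of freedom swap with it.
FRatio largerOverSmaller(const std::vector<double>& s1, const std::vector<double>& s2) {
    const double v1 = sampleVariance(s1), v2 = sampleVariance(s2);
    assert(v1 > 0.0 && v2 > 0.0 && "zero variance: the test is undefined");
    const double d1 = s1.size() - 1.0, d2 = s2.size() - 1.0;
    if (v1 >= v2) return {v1 / v2, d1, d2};
    return {v2 / v1, d2, d1};
}

// Old assumption: with F >= 1 the right tail is always the smaller one.
double ftestAlwaysRightTail(const std::vector<double>& s1, const std::vector<double>& s2) {
    const FRatio r = largerOverSmaller(s1, s2);
    return 2.0 * fRightTail(r.f, r.d1, r.d2);
}

// Two-tailed p-value: twice the smaller tail, whichever side it lies on.
double ftestTwoTailed(const std::vector<double>& s1, const std::vector<double>& s2) {
    const FRatio r = largerOverSmaller(s1, s2);
    const double p = fRightTail(r.f, r.d1, r.d2);
    return 2.0 * std::min(p, 1.0 - p);
}
```

With the data from the description, ftestTwoTailed({9, 6, 8}, {5, 7}) gives about 0.90455, while ftestAlwaysRightTail, which mimics the old assumption, gives about 1.09545. The swap is still useful: when one variance is huge, F is large and its right tail is evaluated directly as a small incomplete beta value, so a result near zero keeps its precision; 1 - p is only formed when p > 0.5, where the subtraction is harmless.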
Created attachment 126437: Compare LO algorithm with Gnumeric algorithm (v2)
Added an example with one huge variance and therefore an FTEST result near zero.
Dennis Francis committed a patch related to this issue.
It has been pushed to "master":
http://cgit.freedesktop.org/libreoffice/core/commit/?id=5c401d8a93cdf7dfa450604856680a2154366fcf
tdf#93329 : Fixes FTest algorithmic bug
It will be available in 5.3.0.
The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Hello, is this bug fixed? If so, could you please close it as RESOLVED FIXED?
It is fixed in:
Version: 5.3.7.2 (x64)
Build ID: 6b8ed514a9f8b44d37a1b96673cbbdd077e24059
CPU Threads: 8; OS Version: Windows 6.19; UI Render: GL;
Layout Engine: new;
Locale: de-DE (de_DE); Calc: CL