Bug 75538 - Other: R^2 of trend line is calculated wrong when forced through a defined y-value.
Summary: Other: R^2 of trend line is calculated wrong when forced through a defined y-...
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
4.2.1.1 release
Hardware: Other Linux (All)
: medium normal
Assignee: Laurent Balland
URL:
Whiteboard: BSA target:4.3.0 target:4.2.3
Keywords:
Depends on:
Blocks:
 
Reported: 2014-02-26 17:56 UTC by e325001
Modified: 2014-03-07 14:41 UTC (History)
4 users (show)

See Also:
Crash report or crash signature:


Attachments
example-picture of wrong calculated R^2 with trend lines of an x/y-plot. (22.51 KB, image/png)
2014-02-26 17:56 UTC, e325001
Details
the .ods file showing the problem (25.79 KB, application/vnd.oasis.opendocument.spreadsheet)
2014-02-27 15:57 UTC, e325001
Details
Test case with R^2 > 1 (69.81 KB, application/vnd.oasis.opendocument.spreadsheet)
2014-02-27 16:08 UTC, Laurent Balland
Details

Note You need to log in before you can comment on or make changes to this bug.
Description e325001 2014-02-26 17:56:35 UTC
Created attachment 94778 [details]
example-picture of wrong calculated R^2 with trend lines of an x/y-plot.

Problem description: 

Steps to reproduce:
1. take numbers (x, y) and plot them.
2. plot the trend line ans let libreoffice show the correlation coefficient (correct)
3. let the trend line go through zero or another value and get strange R^2's.

Current behavior:

R^2 can be bigger than 1, if the correlation gets worse.

Expected behavior:

R^2 is defined as a value between 0 and 1. If the correlation gets worse, the vlaue should become smaller.
              
Operating System: Fedora
Version: 4.2.1.1 release
Comment 1 Joel Madero 2014-02-27 03:12:09 UTC
Please provide an ODS that demonstrates the problem. It's always best to provide your data and files so that we can see exactly what you see instead of "take numbers and plot them" -- marking as NEEDINFO. Once you attach the document mark as UNCONFIRMED.


Thanks!
Comment 2 Jacques Guilleron 2014-02-27 10:50:30 UTC
Hello e325001,

See also Bug 73374 - EDITING: R^2 calculation of trendline with forced intercept is not correct

Thank you,

Jacques
Comment 3 e325001 2014-02-27 15:57:15 UTC
Created attachment 94829 [details]
the .ods file showing the problem

Thanks for taking a look into this. Here is an .ods example. And just for your information (because of the other bug you mentioned), I use the following version: Build-ID: 4.2.1.1-1.fc20

Thanks again.
Comment 4 Joel Madero 2014-02-27 16:03:07 UTC
I think you assigned this to yourself on accident - unless you are planning on committing a patch to fix it :)
Comment 5 Laurent Balland 2014-02-27 16:08:54 UTC
Created attachment 94831 [details]
Test case with R^2 > 1

This test case shows the results that should be calculated given by LINEREG() function. Modify F1 value to modify forced intercept of LINEREG() function. R^2 is calculated in E11.
Comment 6 Laurent Balland 2014-02-27 17:21:45 UTC
Please read my previous comment attached to file as follow:
This test case shows the results that should be calculated given by *LINEST*() function. Modify F1 value to modify forced intercept of *LINEST*() function. R^2 is calculated in *E12*.
Comment 7 e325001 2014-02-28 08:07:09 UTC
Hello Laurent BP, 
in your example file the R^2 without forcing an intercept can be calculated correctly (0,405), but if I force the trend line through zero I get R^2= 0,849, and this is a "better" value (both with linest and within the chart), which cannot be the case, correct? And, as I still think, every value smaller 0 or bigger 1 must be false. I have to admit I am no expert, but I googled a bit and found similar problems with exel (just google "r^2 forced zero"). Do you need any more help or confirmation regarding this problem? As written above, I am no expert and do not even understand the Ydec and Y^ in your file. 
Thanks so far again.
Comment 8 Laurent Balland 2014-02-28 23:55:40 UTC
@e325001: thanks for your proposition.
(In reply to comment #7)
> Hello Laurent BP, 
> in your example file the R^2 without forcing an intercept can be calculated
> correctly (0,405), but if I force the trend line through zero I get R^2=
> 0,849, and this is a "better" value (both with linest and within the chart),
> which cannot be the case, correct?
R^2 value cannot be compared with or without forced intercept. Check attachment 91622 [details] for more details. In first try, I used the formula proposed. 
> And, as I still think, every value
> smaller 0 or bigger 1 must be false.
This formula gives wrong results as you mentioned (>1 or <0). So I proposed a new commit
https://gerrit.libreoffice.org/#/c/8402/ 
which use the same formula as LINEST calculate.
> I have to admit I am no expert, but I
> googled a bit and found similar problems with exel (just google "r^2 forced
> zero").
Excel use same formula for R^2 with or without forced intercept and then give false results with forced intercept. Please check Microsoft help about this:
http://support.microsoft.com/kb/829249
> Do you need any more help or confirmation regarding this problem? As
> written above, I am no expert and do not even understand the Ydec and Y^ in
> your file. 
Ydec is modified value of Y (Ydec=Y - forced_intercept), because LINEST function can only forced intercept through 0. Y^ is estimated value of Y with equation.
Comment 9 Commit Notification 2014-03-03 17:15:31 UTC
Laurent Balland-Poirier committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=00cb825ab3f16a94f1e8311ba0c24f72588e788e

fdo#75538 R^2 calculation for trendline similar to LINEST function



The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 10 Commit Notification 2014-03-07 12:47:51 UTC
Laurent Balland-Poirier committed a patch related to this issue.
It has been pushed to "libreoffice-4-2":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=cade56eef6d23aa7dd6e374858cf30ff3fcd163b&h=libreoffice-4-2

fdo#75538 R^2 calculation of trendline similar to LINEST function


It will be available in LibreOffice 4.2.3.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 11 e325001 2014-03-07 13:22:34 UTC
Hello, 
I tested the patch (daily snapshot) and now R^2 is always between 0 and 1, thats good. But the values are strange anyway (?): How is it possible that the R^2 gets better (nearer to 1) if the trend line is forced through a defined value than without forcing it through a value? I thought if it is not forced it takes automatically the "best" possibiliy.
My provided .ods file is still valid. If you insert the trendline R^2 is 0,999750 and if you force it through zero R^2 is 0,999896, which is a better value and this is what I cannot understand.

Also if you force it through 300 R^2 is 0,014, but if you force it through 600 it is 0,33, which is again a better value, but actually the trendline fits way worse.

Maybe I make a mistake, I am no expert, but I think there is still something wrong.
Thanks so far :)


(In reply to comment #10)
> Laurent Balland-Poirier committed a patch related to this issue.
> It has been pushed to "libreoffice-4-2":
> 
> http://cgit.freedesktop.org/libreoffice/core/commit/
> ?id=cade56eef6d23aa7dd6e374858cf30ff3fcd163b&h=libreoffice-4-2
> 
> fdo#75538 R^2 calculation of trendline similar to LINEST function
> 
> 
> It will be available in LibreOffice 4.2.3.
> 
> The patch should be included in the daily builds available at
> http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
> information about daily builds can be found at:
> http://wiki.documentfoundation.org/Testing_Daily_Builds
> Affected users are encouraged to test the fix and report feedback.
Comment 12 Laurent Balland 2014-03-07 14:35:33 UTC
(In reply to comment #11)
> Hello, 
> I tested the patch (daily snapshot) and now R^2 is always between 0 and 1,
> thats good. But the values are strange anyway (?): How is it possible that
> the R^2 gets better (nearer to 1) if the trend line is forced through a
> defined value than without forcing it through a value? I thought if it is
> not forced it takes automatically the "best" possibiliy.
Please check above my comment 8 and particularly attachment 91622 [details] which explain why you cannot compare R^2 with and without forced intercept.
 
> Also if you force it through 300 R^2 is 0,014, but if you force it through
> 600 it is 0,33, which is again a better value, but actually the trendline
> fits way worse.
I agree with you that forced intercept higher than 324 will result in better R^2. I don't know why, but I don't know better way to calculate R^2. If you have an equation, please let us know. :)
 
> Maybe I make a mistake, I am no expert, but I think there is still something
> wrong.
> Thanks so far :)
As your example is not a common way to use forced intercept, I am closing the bug. If you have another (better) way to calculate R^2, feel free to reopen this bug.