When fitting a series of (x,y) data with a function a * x + b, everything (slope, intercept, correlation) seems ok (check done by comparison with excel and by evaluaton using the standatd formulas in any book). Changing axis so that the first (x,y) is in the origin and fitting with a * x' the slope is ok, but the correlation is wrong, It is neither the Pearson correlation nor the explained variance vs the total variance. The result is in disagreement with what found in excel or by evlautaion with standard formula I do not understand if there is a bug in the function or if the meaning of r2 is nn clearly eplaine in the on line manual
Which LO version do you use? Could you attach an example file by using this link? https://bugs.freedesktop.org/attachment.cgi?bugid=87790&action=enter (have in mind that any attachment is automatically made public, so remove any confidential/private part).
Created attachment 111428 [details] spreadsheet with data (sheet base) and two fits (1 and 2 parameters) Using LIBREOFFICE 4.3.5.2 build 3a87456aaa6a95c63eea1c1b3201acedf0751bd5 with Kubuntu 14.04 Original data in sheet 'base'; I need to fit a power law with y data in col C, x data in col A. Fit done using linearization log(y) log(x) Case 1 two parameters log(y) = a * log(x) + b, sheet 2 param The data are in range d6/e55 (logy -> col d, logx col. E) Results from function regr.lin are in range D68/e72) Result agree with excel, with what obtained using functions slope(), intercept(), correlation() etc and with what obtained using standard textbook Case 2 one parameter log(y') = A * log(x) y' is obtaine by simple shift ( y' = y / y0). Data ad resultf from regr lin same locations. The problem in in the R2 value D70 which is 0,9875 while I expect 0,7364 (as fraction of explained y variance with respect to total variance cell C70) or 0,8764 standar pearson correlation) Excel returns 0.7364
On pc Debian x86-64 with LO Debian package 4.3.3.2 and master sources updated today (so future 4.5.0), D70 has 0,8764 (so what you expected according to your last comment) I suppose I missed something, so could you attach a simple file with a minimum step by step process to reproduce the problem?
Created attachment 111516 [details] short sample of spradsheet I am resending the spradsheet as a short version: Data are in col A and B; same data (but for a constant shift in y) are in col F and G. Input of these data is TEP 1 Below each series (A & B ; F & G) the linear best fit using A = a B + b (col A & B) or F = a' G ( col F & G); choice on how to fit from the third parameter of regr.lin regr.lin(A;B; 1 ; 1) regr.lin(F;G; 0 ; 1) Evaluation of regr-lin is STEP 2 As a check for col A & B also evaluation using slope(A;B) intercept(A;B) correlation(A;B) RQ(A:B) have been made These checks are STEP 3 I have shown corresponding results using background color Now, passing to col F and G the result from regr.lin(...) are ok for the slope (not shown here the check); correlation(F;G) and RQ(F:G) correspond with what evaluated in A & B (as expected, these values are invariant) What I do not understand is the term on cell F56 (cell background in red)which - as far as I understand should be a correlation - but is neither correlation() nor RQ() and - as I to before is not equal to the value suggesteb by excel. To be sure that what I find is clear I have also copied the results with 'paste special - only numbers - in the area with gray background.
Thank you for your feedback and sorry for this late response. Just to know, what number do you expect instead of "0,987462335840797"?
What I expect is 0,7364; In these days ha have done some other checks; using the spreadsheet GNUMERIC for instance I find for r2 the value 0,98.... (same as libreoffice); same value using the Google sheet. Using the spreadsheet SCIDAVIS the R2 is 0,73..., Using the program GRETL the results are 0.98... if I use the ordinary least square methd (fit function a *x ) and I find 0,73.... if I use the non linear fit (fit funztion a * x). Looking for other occurrences of the R2 problem (google may help) I found that several years ago ( 2006 ) the same problem (wrong values of r2) had been found for excel and the origin was in the linest() function. see for instance http://support.microsoft.com/kb/829249 I have not worked any more on the subject since (a) i had a work around and (b) it's the first time I fit with Y = a *x and probably next time will be in 10 years (;-) ) Here below please find the two results obtained from GRETL and the one from SCIDAVIS Modello 1: OLS, usando le osservazioni 1-50 Variabile dipendente: v2 coefficiente errore std. rapporto t p-value ------------------------------------------------------------- v1 −1,21416 0,0195447 −62,12 2,91e-48 *** Media var. dipendente −3,693268 SQM var. dipendente 0,833732 Somma quadr. residui 8,977973 E.S. della regressione 0,428047 R-quadro 0,987462 R-quadro corretto 0,987462 F(1, 49) 3859,178 P-value(F) 2,91e-48 Log-verosimiglianza −28,01571 Criterio di Akaike 58,03141 Criterio di Schwarz 59,94343 Hannan-Quinn 58,75952 Note: SQM = scarto quadratico medio; E.S. = errore standard ---------------------------------- Sono state usate derivate analitiche Tolleranza = 1,81899e-12 Convergenza raggiunta dopo 4 iterazioni Modello 2: NLS, usando le osservazioni 1-50 v2 = alpha * v1 stima errore std. rapporto t p-value --------------------------------------------------------- alpha −1,21416 0,0195447 −62,12 2,91e-48 *** Media var. dipendente −3,693268 SQM var. dipendente 0,833732 Somma quadr. residui 8,977973 E.S. della regressione 0,428047 R-quadro 0,736410 R-quadro corretto 0,736410 Log-verosimiglianza −28,01571 Criterio di Akaike 58,03141 Criterio di Schwarz 59,94343 Hannan-Quinn 58,75952 Note: SQM = scarto quadratico medio; E.S. = errore standard ---------------------------------------------------- [02/01/15 14:31 Plot: ''Graph1''] Non-linear fit of dataset: Table1_2, using function: a*x Y standard errors: Unknown Scaled Levenberg-Marquardt algorithm with tolerance = 0,0001 From x = 0 to x = 3,912 a = -1,21415873230445 +/- 0,0195446571501494 -------------------------------------------------------------------------------------- Chi^2/doF = 0,183223940915315 R^2 = 0,736410071654573 --------------------------------------------------------------------------------------- Iterations = 1 Status = success ---------------------------------------------------------------------------------------
Thank you again for your detailed feedback. I put it at NEW. Winfried/Eike/Markus/Kohei: one for you? (http://opengrok.libreoffice.org/xref/core/sc/source/core/tool/interpr5.cxx#2340)
I put an url which may help (for those who are "fluent" at statistics)
(In reply to Julien Nabet from comment #7) > Winfried/Eike/Markus/Kohei: one for you? I will have a look at it (mathematically at first). Don't know yet if I will be able to find the cause and fix it... Luckily, mencaraglia gives us 10 years to fix this ;-)
I don't think this is a bug. See https://bugs.documentfoundation.org/show_bug.cgi?id=73374
** Please read this message in its entirety before responding ** To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year. There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present. If you have time, please do the following: Test to see if the bug is still present on a currently supported version of LibreOffice (5.0.5 or 5.1.2 https://www.libreoffice.org/download/ If the bug is present, please leave a comment that includes the version of LibreOffice and your operating system, and any changes you see in the bug behavior If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave a short comment that includes your version of LibreOffice and Operating System Please DO NOT - Update the version field - Reply via email (please reply directly on the bug tracker) - Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not appropriate in this case) If you want to do more to help you can test to see if your issue is a REGRESSION. To do so: 1. Download and install oldest version of LibreOffice (usually 3.3 unless your bug pertains to a feature added after 3.3) http://downloadarchive.documentfoundation.org/libreoffice/old/ 2. Test your bug 3. Leave a comment with your results. 4a. If the bug was present with 3.3 - set version to "inherited from OOo"; 4b. If the bug was not present in 3.3 - add "regression" to keyword Feel free to come ask questions or to say hello in our QA chat: http://webchat.freenode.net/?channels=libreoffice-qa Thank you for your help! -- The LibreOffice QA Team This NEW Message was generated on: 2016-04-16
As GerardF said, it is not a bug. R² must not be calculated in the same in case intercept is forced. See bug 73374 and document attachment 91622 [details] for details. Excel may have bug, and may also correct its error. With Excel 2010 and your data, I get R²=0,987462189 and R²=0,9874623358 with LibO 5.1.3.1.
Caolán McNamara committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/commit/c3c9efc94f737e5f6c42db02b261d8fdb47f3d51 crashtesting: assert on fdo87790-1.ods It will be available in 25.2.0. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.