Bug 142643 - LibreOffice Calc power regression coefficients error
Summary: LibreOffice Calc power regression coefficients error
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Documentation (show other bugs)
Version:
(earliest affected)
7.0.6.2 release
Hardware: All Windows (All)
: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Data-Statistics
  Show dependency treegraph
 
Reported: 2021-06-04 05:05 UTC by edwardsontan
Modified: 2021-06-05 10:28 UTC (History)
4 users (show)

See Also:
Crash report or crash signature:


Attachments
Power Regresssion Methods - Comparision of Coefficients (65.37 KB, application/vnd.oasis.opendocument.spreadsheet)
2021-06-05 10:28 UTC, [REDACTED]
Details

Note You need to log in before you can comment on or make changes to this bug.
Description edwardsontan 2021-06-04 05:05:42 UTC
Description:
After selecting x and y data for power regression (y = ax^b) the results for a and b are incorrect. Cells for "Intercept" and "LN(X1)" are erroneous. 

Steps to Reproduce:
X        Y
------------
1	2.7
2	4
3	4.4
4	7.1
5	4.9
6	3.6
7	4
8	0.6
9	1
10	4.3


Actual Results:
Confidence level	0.99
	
	Coefficients
Intercept	1.57125263268399
LN(X1)	-0.310308529671727


Expected Results:
a = 4.812672931
b = -0.3103085297


Reproducible: Always


User Profile Reset: No



Additional Info:
a = 4.812672931
b = -0.3103085297
Comment 1 Ming Hua 2021-06-04 07:01:38 UTC
I can reproduce the reported result "Intercept = 1.57... LN(X1) = -0.310..." with both 7.0.6 and 7.2/master:
Version: 7.2.0.0.alpha0+ (x64) / LibreOffice Community
Build ID: 18e5e948dd66e41f17b0a63bf631d98aee84a03b
CPU threads: 2; OS: Windows 10.0 Build 19041; UI render: Skia/Raster; VCL: win
Locale: zh-CN (zh_CN); UI: zh-CN
Calc: threaded

However I know too little about regression analysis to judge whether this is wrong result.
Comment 2 m_a_riosv 2021-06-04 08:18:49 UTC
Please can you attach the example file?
Comment 3 edwardsontan 2021-06-04 08:54:52 UTC
Top menu bar: Data > Statistics > Regression. Clicking on the Help button brings up the example tabular x and y data that I used. The help page also has the correct coefficients for linear, logarithmic and power regression. 
Also I think the cell with the label "LN(X1)" is mislabeled. In the same help page it states that: "Power regression: Find a power curve in the form of y = a.x^b, where a is the coefficient, b is the power that best fits the data." So there shouldn't be any logarithmic operation performed on X. And further, after a power regression has been performed, the result for "LN(X1)" is in fact b or the intercept. The result for "Intercept" on other hand is a mystery. I don't know how it fits into the equation y = a.x^b

(In reply to m.a.riosv from comment #2)
> Please can you attach the example file?
Comment 4 Ming Hua 2021-06-04 08:58:10 UTC
(In reply to m.a.riosv from comment #2)
> Please can you attach the example file?
I don't think an example file is really needed, as this regression analysis feature is based on UI, not formulas.

Steps to reproduce:
1. Copy the data given in comment #0 to a sheet.  Copy the data from Bugzilla page, in Calc choose menu Edit > Paste Special > Paste Unformatted Text, the Text Import dialog should automatically recognize the data as a ten-row, two-column range, and if not, adjust the "Separator Options" to use "Separated by" and "Tab";

2. Select the data range, choose menu Data > Statistics > Regression.  The first two data boxes, X-range and Y-range, should be preset as A1:A10 and B1:B10 (or similar, if you pasted to different place on the sheet), select a cell for the "Results to:" box, A12 for example, then choose "Power Regression" radio button;

3. Press "OK" button, the regression analysis result should be written to cells starting at A12.  Find the Intercept and LN(X1) coefficients in cell B38 and B39, respectively.
Comment 5 edwardsontan 2021-06-04 09:28:40 UTC
The following is the dataset that I was working on and which first alerted me to the problem. 

X       Y
----------
240	57
280	48
320	40
360	35
400	30
500	25
600	20
700	17
800	14
1000	11.5
1200	9.5
1500	8
2000	6.7
2500	5.5
3000	4
4000	3
6000	2
8000	1.2

I used an online power regression calculator (https://keisan.casio.com/exec/system/14059931777261) to get the correct answers. 

Libre Calc results:
a = -1.03774556216783
b = 9.66940331258679

Online calculator results:
a = 15825.903
b = -1.03774556
Comment 6 [REDACTED] 2021-06-04 11:19:31 UTC
EXP(1.57125263268399) = 4.81267293100676

And that's exactly what the trend line equation shows, when you insert a trend line in a chart of your data.
Comment 7 [REDACTED] 2021-06-04 11:22:07 UTC
(In reply to Ming Hua from comment #4)
> (In reply to m.a.riosv from comment #2)
> > Please can you attach the example file?
> I don't think an example file is really needed, as this regression analysis
> feature is based on UI, not formulas.

No - you can calculate from data and formulas, and that's why it is important to know, how reporter linearizes the problem.
Comment 8 [REDACTED] 2021-06-04 11:27:38 UTC
(In reply to edwardsontan from comment #5)
> The following is the dataset that I was working on and which first alerted
> me to the problem. 
> 
> X       Y
> ----------
> 240	57
> 280	48
> 320	40
> 360	35
> 400	30
> 500	25
> 600	20
> 700	17
> 800	14
> 1000	11.5
> 1200	9.5
> 1500	8
> 2000	6.7
> 2500	5.5
> 3000	4
> 4000	3
> 6000	2
> 8000	1.2
> 
> I used an online power regression calculator
> (https://keisan.casio.com/exec/system/14059931777261) to get the correct
> answers. 
> 
> Libre Calc results:
> a = -1.03774556216783
> b = 9.66940331258679
> 
> Online calculator results:
> a = 15825.903
> b = -1.03774556

My Result: EXP(9.66940331258679) = 15825.9030884824
Comment 9 [REDACTED] 2021-06-04 11:47:52 UTC
Just to make clear, how I calculate and understand the regression:

   f(x) = a * x^b
-> ln (f(x)) = ln (a*x^b) = ln(a) + ln(x^b) = ln(a) + b*ln(x)

Now you use function() LINEST on natural logarithm of both axis an you get two coefficients:

[1] Slope - which is b
[2] Intercept - which is ln(a)

Hence you need to exp(ln(a)) to get a.
Comment 10 [REDACTED] 2021-06-04 11:54:15 UTC
My conclusion - can't see any bug here.
Comment 11 Ming Hua 2021-06-04 12:15:45 UTC
(In reply to Uwe Auer from comment #9)
> Just to make clear, how I calculate and understand the regression:
> 
>    f(x) = a * x^b
> -> ln (f(x)) = ln (a*x^b) = ln(a) + ln(x^b) = ln(a) + b*ln(x)
> 
> Now you use function() LINEST on natural logarithm of both axis an you get
> two coefficients:
> 
> [1] Slope - which is b
> [2] Intercept - which is ln(a)
> 
> Hence you need to exp(ln(a)) to get a.
Thanks for the explanation.

But all of these are not easily deductible from either the Data > Statistics > Regression... dialog or the help page [1].  Some improvement on documentation is probably in order.

1. https://help.libreoffice.org/latest/en-US/text/scalc/01/statistics_regression.html
Comment 12 edwardsontan 2021-06-04 13:32:28 UTC
(In reply to Uwe Auer from comment #9)
> Just to make clear, how I calculate and understand the regression:
> 
>    f(x) = a * x^b
> -> ln (f(x)) = ln (a*x^b) = ln(a) + ln(x^b) = ln(a) + b*ln(x)
> 
> Now you use function() LINEST on natural logarithm of both axis an you get
> two coefficients:
> 
> [1] Slope - which is b
> [2] Intercept - which is ln(a)
> 
> Hence you need to exp(ln(a)) to get a.

Thank you Uwe for all your posts and answers! That really clarified the matter. Though I wish libreoffice didn't have to let us do more computations which it could've done for us. Maybe a note in the help page as to how to use the results from the power reg would be helpful :-D
Comment 13 edwardsontan 2021-06-04 13:40:27 UTC
(In reply to Ming Hua from comment #11)
> (In reply to Uwe Auer from comment #9)
> > Just to make clear, how I calculate and understand the regression:
> > 
> >    f(x) = a * x^b
> > -> ln (f(x)) = ln (a*x^b) = ln(a) + ln(x^b) = ln(a) + b*ln(x)
> > 
> > Now you use function() LINEST on natural logarithm of both axis an you get
> > two coefficients:
> > 
> > [1] Slope - which is b
> > [2] Intercept - which is ln(a)
> > 
> > Hence you need to exp(ln(a)) to get a.
> Thanks for the explanation.
> 
> But all of these are not easily deductible from either the Data > Statistics
> > Regression... dialog or the help page [1].  Some improvement on
> documentation is probably in order.
> 
> 1.
> https://help.libreoffice.org/latest/en-US/text/scalc/01/
> statistics_regression.html

Yes. Would be good to put more details in help page / documentation.
Comment 14 Regina Henschel 2021-06-04 18:18:43 UTC
So let us change it to documentation.
Comment 15 [REDACTED] 2021-06-05 10:28:48 UTC
Created attachment 172638 [details]
Power Regresssion Methods - Comparision of Coefficients

I'm not sure whether this is really a pure documentation issue only, but indeed a problem of the "Data -> Statistics -> Regression -> Type: Power Regression" functionality (at least from a users perspective). 

My comments are based on how I do such power regression, which essentially is based on using:

[1] =SLOPE(LN(range_of_Y_values);LN(range_of_X_values))
[2] =EXP(INTERCEPT(LN(range_of_Y_values);LN(range_of_X_values)))

according to function  LN(f(x)) = LN(a) + b * LN(x) (derived from f(x)=ax^b)

As a user I'd expect a "Power" regression analysis to yield the coefficients a and b directly (i.e. without any further exponentiation) just like a power trend line in a chart shows both coefficients correctly (without applying any further math) setting option "[x] Show Equation". So it least when comparing 

[1] Coefficients in trend line equation in chart --with--
[2] Coefficients in "Data -> Statistics -> Regression -> Type: Power Regression" (see cells J20 and J21 in the sample file) 

there is a discrepancy. See also attachment.