Bug 71109 - NORMDIST() gives a percent result instead of a probability
Summary: NORMDIST() gives a percent result instead of a probability
Status: CLOSED NOTABUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
Inherited From OOo
Hardware: All All
: medium normal
Assignee: Not Assigned
URL:
Whiteboard: NeedsDocumentation
Keywords:
Depends on:
Blocks: Calc-Function
  Show dependency treegraph
 
Reported: 2013-10-31 23:42 UTC by djosiasmp
Modified: 2019-02-19 18:01 UTC (History)
7 users (show)

See Also:
Crash report or crash signature:


Attachments
file demontration (44.20 KB, application/vnd.oasis.opendocument.spreadsheet)
2013-11-04 20:47 UTC, djosiasmp
Details

Note You need to log in before you can comment on or make changes to this bug.
Description djosiasmp 2013-10-31 23:42:06 UTC
NORMDIST() definition from

https://help.libreoffice.org/Calc/Statistical_Functions_Part_Four#NORMDIST

Clearly states "Returns the density function or the normal cumulative distribution."

However, the number that is returned is a percentage (i.e. adding all numbers gives you a total close to 100, not 1). THIS IS NOT A PROBABILITY DENSITY FUNCTION
Comment 1 Jorendc 2013-11-03 18:59:49 UTC
Hi,

I can't reproduce this behavior.

If you look at the examples given on the link you mentioned: 

 Example

=NORMDIST(70;63;5;0) returns 0.03.

=NORMDIST(70;63;5;1) returns 0.92.

When I try them in Calc, it returns the same values. Doesn't look like percentages to me (except you multiply by 100).

Make sure your cell formatting ( 'right-click the cell' > 'Format Cells...' ) isn't 'percent' or something like that.

Kind regards,
Joren
Comment 2 Jorendc 2013-11-03 19:00:20 UTC
PS: what's your used Operating system and LibreOffice version (Help > About LibreOffice)
Comment 3 djosiasmp 2013-11-04 20:47:23 UTC
Created attachment 88652 [details]
file demontration
Comment 4 djosiasmp 2013-11-04 20:48:21 UTC
Here is a file demonstrating the effect in LO-4.1.3.2 on a WIN7-64 bit computer.

The cumulative distribution calculates well to a maximum value of 1, but the probability density has values of ~18 at the highest point. If you add the values for the probability density you get a number 1000 higher.
Comment 5 Jorendc 2013-11-10 16:44:33 UTC
@Markus: any interest, with your math knowledge, to look at this one?

Kind regards,
Joren
Comment 6 Robinson Tryon (qubit) 2013-12-20 12:49:32 UTC
(In reply to comment #4)
> Here is a file demonstrating the effect in LO-4.1.3.2 on a WIN7-64 bit
> computer.
> 
> The cumulative distribution calculates well to a maximum value of 1, but the
> probability density has values of ~18 at the highest point. If you add the
> values for the probability density you get a number 1000 higher.

https://en.wikipedia.org/wiki/Probability_density_function
"a probability density function is a function that describes the relative likelihood for this random variable to take on a given value."

I'd therefore assume that return values for our version of this function would be in a range like [0, 100] or [0, 1].

Our Help docs don't specify the range of the function:
https://help.libreoffice.org/Calc/Statistical_Functions_Part_Four#NORMDIST

...so at the very least this is a case of missing documentation. If the output range is supposed to be [0, 1], then a return value of 18 is also a bug in the implementation.

Status: NEW
Whiteboard: NeedsDocumentation
Comment 7 QA Administrators 2015-04-19 03:23:26 UTC Comment hidden (obsolete)
Comment 8 QA Administrators 2016-09-20 09:24:49 UTC Comment hidden (obsolete)
Comment 9 Roman Kuznetsov 2019-02-19 11:03:09 UTC
Olivier, Eike, can you look at this?
Comment 10 Eike Rathke 2019-02-19 17:49:56 UTC
Our implementation here returns the ~same values as Gnumeric and probably Excel as well. The returned values are NOT a percentage (you'll get different values with different parametrization, e.g. multiply all X values and average and standard variation by 0.1 and you'll see results multiplied by 10).

Note that Excel nowadays calls this Probability Mass Function instead of Probability Density Function (the difference is function for discrete values vs continuous values hence statistically more correct in this context, as the probability of any one value out of continuous values is always zero), older and derived documentation talk of density.

See https://support.office.com/en-US/article/NORMDIST-function-126DB625-C53E-4591-9A22-C9FF422D6D58 also for the mathematical formula that is defined the same in https://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part2.html#NORMDIST and also given at https://en.wikipedia.org/wiki/Probability_density_function#Families_of_densities

You can easily check column B values with (here in R1C1 address notation)

  =EXP(-(((RC1-R1C3)^2)/(2*R2C3^2)))/(SQRT(2*PI())*R2C3)

Specifically note in that Wikipedia article's section the mention of "the normalization factor of a distribution (the multiplicative factor that ensures that the area under the density—the probability of something in the domain occurring— equals 1). This normalization factor is outside the kernel of the distribution."