1

I'm using the normal distribution from numpy and having a hard time understanding its documentation. Let's say I have a normal distribution with mean of 5 and standard deviation of 0.5:

import numpy as np
from matplotlib import pyplot as plt
from scipy.stats import norm

mean = 5
std = 0.25

x = np.linspace(mean - 3*std, mean + 3*std, 1000)
y = norm(loc=mean, scale=std).pdf(x)
plt.plot(x,y)

PDF with standard deviation = 0.25

The resulting chart is the familiar bell curve but with its peak at around 1.6. How can the probability of any value exceed 1? If I multiply it by scale then the probabilities are correct.

No such problem when std (and scale) are greater than 1 however:

mean = 5
std = 10

x = np.linspace(mean - 3*std, mean + 3*std, 1000)
y = norm(loc=mean, scale=std).pdf(x)
plt.plot(x,y)

PDF with standard deviation = 10

The documentation on norm says loc is the mean and scale is the standard deviation. Why does it behave so strangely with scale greater and less than 1?

Python 3.8.2. Scipy 1.4.1

Mike Henderson
  • 2,028
  • 1
  • 13
  • 32

1 Answers1

2

The "bell curve" you are plotting is a probability density function (PDF). This means that the probability for a random variable with that distribution falling in any interval [a, b] is the area under the curve between a and b. Thus the whole area under the curve (from -infinity to +infinity) must be 1. So when the standard deviation is small, the maximum of the PDF may well be greater than 1, there is nothing strange about that.


Follow-up question: Is the area under the curve in the first plot really 1?

Yes, it is. One way to confirm this is to approximate the area under the curve by calculating the total area of a series of rectangles whose heights are defined by the curve:

import numpy as np
from matplotlib import pyplot as plt
from scipy.stats import norm
import matplotlib.patches as patches

mean = 5
std = 0.25

x = np.linspace(4, 6, 1000)
y = norm(loc=mean, scale=std).pdf(x)

fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_aspect('equal')
ax.set_xlim([4, 6])
ax.set_ylim([0, 1.7])

# Approximate area under the curve by summing over rectangles:

xlim_approx = [4, 6]  # locations of left- and rightmost rectangle
n_approx = 17  # number of rectangles

# width of one rectangle:
width_approx = (xlim_approx[1] - xlim_approx[0]) / n_approx  
# x-locations of rectangles:
x_approx = np.linspace(xlim_approx[0], xlim_approx[1], n_approx)
# heights of rectangles:
y_approx = norm(loc=mean, scale=std).pdf(x_approx)

# plot approximation rectangles:
for i, xi in enumerate(x_approx):
    ax.add_patch(patches.Rectangle((xi - width_approx/2, 0), width_approx, 
                                   y_approx[i], facecolor='gray', alpha=.3))

# areas of the rectangles:
areas = y_approx * width_approx

# total area of the rectangles:
print(sum(areas))

0.9411599204607589

pdf plot with rectangles

Okay, that's not quite 1, but let's get a better approximation by extending the x-limits and inreasing the number of rectangles:

xlim_approx = [0, 10]
n_approx = 100_000

width_approx = (xlim_approx[1] - xlim_approx[0]) / n_approx
x_approx = np.linspace(xlim_approx[0], xlim_approx[1], n_approx)
y_approx = norm(loc=mean, scale=std).pdf(x_approx)

areas = y_approx * width_approx
print(sum(areas))

0.9999899999999875

Arne
  • 9,990
  • 2
  • 18
  • 28
  • In the first plot, there's no chance that the entire area under the curve is 1. The PDF represents the probability that an event with that exact statistics occur and all probabilities have to be between [0, 1]. It peaked at 1.6 because scipy divided it by `std` (0.25). The question is why scipy does it when `std = 0.25` but not when `std = 10`. – Mike Henderson Apr 08 '20 at 11:42
  • *"The PDF represents the probability that an event with that exact statistics occur..."* No, that is not what the PDF represents. It is a probability *density*, not a probability. The probably that a sample from the normal distribution is exactly, say, 0.3, is 0. This is true for all continuous distributions: the probability of any given value is 0. To convince yourself, see the actual formula, given as f(x) in the [wikipedia page](https://en.wikipedia.org/wiki/Normal_distribution). Note that f(μ) = 1/(σ*sqrt(2π)). Whenever σ < 1/sqrt(2π), f(μ) > 1. But the probability that x=μ is 0. – Warren Weckesser Apr 08 '20 at 12:04
  • I stand by my answer. @Warren Weckesser gave a good explanation. Alternatively, you can see that the area under the curve is 1 by approximating it. I extended my answer to show this. – Arne Apr 08 '20 at 17:14