
I have run a simple task: plotting the probability density histogram for a simulation. However, when I plot it, the probability for each bin does not seem to match the frequency plot. With 50 bins I would expect each bin to have an average probability of 2% (1/50), but the chart does not show that.

Thanks in advance

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

plntAcres = 88.0
hvstPer = 0.99
hvstAcres = plntAcres*hvstPer
yldAcre = np.random.triangular(47,48,49, 10000)

carryIn = 464
pdn = hvstAcres * yldAcre
imp = 25.0
ttlSup = carryIn + pdn + imp

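crush = np.random.uniform(1945, 1990, 10000)  # crush component of demand, uniform between the bounds
expts = np.random.uniform(2085, 2200, 10000)  # exports component of demand, uniform between the bounds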
seedRes = 130
ttlDem = crush + expts + seedRes

carryOut = ttlSup - ttlDem

print(carryOut)

plt.hist(carryOut, bins=50, density=True)  # 'density' replaces the old 'normed' argument, which was removed in matplotlib 3.1
plt.title("Carry Out Distribution")
plt.xlabel("Value")
plt.ylabel("Probability")
plt.show()

[Figure: Probability density of Carry out]

Moj

2 Answers


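In the hist function, the normed argument (called density in current matplotlib) does not produce probabilities but probability densities: each bar's height is the bin's count divided by (total count × bin width). It is therefore the bar areas, not the heights, that sum to 1, and when the bins are several units wide the heights come out far smaller than the per-bin probabilities. If you want the probabilities themselves, use the weights argument instead, giving every sample a weight of 1 / len(carryOut).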

The crucial two lines:

weights = np.ones_like(carryOut) / len(carryOut)  # each sample counts as 1/N instead of 1
plt.hist(carryOut, bins=50, weights=weights)     # bar height = fraction of samples in the bin
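A self-contained sketch comparing the two options numerically. It uses np.histogram, which matplotlib's hist calls to compute its bar heights, so no plotting is needed. The seeded np.random.default_rng Generator replaces the question's global np.random calls and is an assumption added only for reproducibility.

```python
import numpy as np

# Rebuild the question's simulation (seeded Generator is an assumption, for reproducibility)
rng = np.random.default_rng(0)
n = 10000
hvstAcres = 88.0 * 0.99
ttlSup = 464 + hvstAcres * rng.triangular(47, 48, 49, n) + 25.0
ttlDem = rng.uniform(1945, 1990, n) + rng.uniform(2085, 2200, n) + 130
carryOut = ttlSup - ttlDem

# density=True (formerly normed=True): height = count / (n * bin_width)
dens, edges = np.histogram(carryOut, bins=50, density=True)
widths = np.diff(edges)
print(dens.sum())             # about 1 / bin width, not 1
print((dens * widths).sum())  # the bar areas: 1 up to floating-point rounding

# weights = 1/n: each bar's height is the fraction of samples in that bin
weights = np.ones_like(carryOut) / len(carryOut)
probs, _ = np.histogram(carryOut, bins=50, weights=weights)
print(probs.sum())   # 1 up to floating-point rounding
print(probs.mean())  # about 0.02 (= 1/50), the average the question expected

# Both views agree: probability of a bin = density * bin width
print(np.allclose(probs, dens * widths))
```

Passing the same weights to plt.hist draws the probs heights, so a y-axis label of "Probability" is then accurate. One trade-off: per-bin probabilities shrink as you add bins, while density heights stay roughly stable, which makes densities convenient when overlaying a pdf or comparing different binnings.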
honza_p

Your plot is a bell curve, which usually means that your random variable is (approximately) normally distributed. See the Wikipedia article on the normal (Gaussian) distribution.
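A rough way to check the bell-curve claim, sketched under assumptions: compare the fraction of samples within one and two standard deviations of the mean against the normal distribution's roughly 68.3% and 95.4%. The seed is arbitrary, and this is a quick sanity check, not a formal normality test; the simulated sum is bounded, so it can only be approximately normal.

```python
import numpy as np

# Rebuild the question's simulation (seeded Generator is an assumption, for reproducibility)
rng = np.random.default_rng(0)
n = 10000
carryOut = (464 + 88.0 * 0.99 * rng.triangular(47, 48, 49, n) + 25.0
            - rng.uniform(1945, 1990, n) - rng.uniform(2085, 2200, n) - 130)

# Standardize, then count how many samples fall within 1 and 2 standard deviations
mu, sd = carryOut.mean(), carryOut.std()
z = np.abs(carryOut - mu) / sd
within1 = np.mean(z < 1)  # a normal distribution gives about 0.683
within2 = np.mean(z < 2)  # a normal distribution gives about 0.954
print(within1, within2)
```

For comparison, a single uniform variable has only about 57.7% of its mass within one standard deviation, so values near 0.68 and 0.95 indicate a shape much closer to Gaussian than any single input.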

Mayeul sgc
  • And for a reason. He is adding (and subtracting) three independent random variables with comparable standard deviations. Unless they are correlated (not the case here), the result tends to be close to Gaussian (the central limit theorem). – honza_p Feb 27 '17 at 11:04
  • My main issue was that the individual bin probabilities don't seem to add up to 1, and I'm not sure why. @honza_p – Moj Feb 27 '17 at 11:34
  • Now I understand the question. – honza_p Feb 27 '17 at 12:36
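To put numbers on "comparable standard deviations", a sketch using the textbook variance formulas for a triangular distribution, (a² + b² + c² − ab − ac − bc) / 18, and a uniform distribution, (high − low)² / 12. Because the three random inputs are drawn independently, their variances add regardless of whether a term is added or subtracted. The seed used for the simulated comparison is an arbitrary assumption.

```python
import numpy as np

hvstAcres = 88.0 * 0.99

# Standard deviations of the three random terms in carryOut
sd_pdn = hvstAcres * np.sqrt((47**2 + 48**2 + 49**2 - 47*48 - 47*49 - 48*49) / 18)
sd_crush = (1990 - 1945) / np.sqrt(12)
sd_expts = (2200 - 2085) / np.sqrt(12)

# Independent terms: variances add, and the sign of each term does not matter
sd_total = np.sqrt(sd_pdn**2 + sd_crush**2 + sd_expts**2)
print(round(sd_pdn, 1), round(sd_crush, 1), round(sd_expts, 1), round(sd_total, 1))
# prints: 35.6 13.0 33.2 50.4

# Compare with the spread of a simulated sample (seed is an arbitrary assumption)
rng = np.random.default_rng(0)
n = 10000
carryOut = (464 + hvstAcres * rng.triangular(47, 48, 49, n) + 25.0
            - rng.uniform(1945, 1990, n) - rng.uniform(2085, 2200, n) - 130)
print(carryOut.std())  # should land close to sd_total
```

Production (pdn) and exports dominate the spread, with crush contributing less; a sum of a few independent terms of similar size is already close to bell-shaped, though its tails are bounded.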