
We have the array a = range(10). Using numpy.histogram:

import numpy as np
hist, bins = np.histogram(a, bins=int(np.max(a) - np.min(a)), range=(np.min(a), np.max(a)), density=True)

According to the NumPy documentation:

If density=True, the result is the value of the probability density function at the bin, normalized such that the integral over the range is 1.

The result is:

array([ 0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.2])
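The numbers above can be reproduced by hand from the documented definition. In this sketch, each bin count is divided by the total number of samples times the bin width. The check at the end compares the manual result against numpy's own density=True output rather than assuming the two match:

```python
import numpy as np

a = np.arange(10)                   # same values as range(10)
n_bins = int(a.max() - a.min())     # 9 bins, each of width 1

# Raw counts: every bin is half-open [left, right) except the last,
# which is closed [8, 9] and therefore receives both 8 and 9.
counts, edges = np.histogram(a, bins=n_bins, range=(a.min(), a.max()))
widths = np.diff(edges)

# density = count / (total number of samples * bin width)
density = counts / (counts.sum() * widths)
print(density)

# Compare with numpy's own density=True result
expected, _ = np.histogram(a, bins=n_bins, range=(a.min(), a.max()), density=True)
print(np.allclose(density, expected))

# The area under the histogram (density * width, summed) is what equals 1
print((density * widths).sum())
```

The last bin holds 0.2 only because it is closed on the right and contains two samples, 8 and 9.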

I tried to do the same using scipy.stats:

from scipy.stats import norm
mean = np.mean(a)
sigma = np.std(a)
norm.pdf(a, mean, sigma)

However, the result is different:

array([ 0.04070852,  0.06610774,  0.09509936,  0.12118842,  0.13680528,  0.13680528,  0.12118842,  0.09509936,  0.06610774,  0.04070852])

I want to know why.

Update: I would like to ask a more general question. How can we get the probability density function of an array without using numpy.histogram with density=True?

DimKoim

3 Answers


If density=True, the result is the value of the probability density function at the bin, normalized such that the integral over the range is 1.

The word "normalized" there does not mean that the data are transformed using a normal distribution. It simply says that each bin count is divided by the total number of entries so that the total density equals 1.

jtitusj
  • According to you, if I change bins=(np.max(a)-np.min(a))/1 to (np.max(a)-np.min(a))/2, sum(hist) should be 1 as well. But if you try it, you will see that it is not. – DimKoim May 19 '15 at 13:42
  • @DimKoim You are right about that. I've just realized that numpy.histogram (density=True) does not work like that. I'm not sure why, but most probably you need to make sure that the width of the bins is 1. However, the basic idea of normalization there is that the area under the curve should be 1 when you plot the densities. See this link for a better idea of normalization and the problem with numpy.histogram: http://stackoverflow.com/questions/21532667/numpy-histogram-cumulative-density-does-not-sum-to-1 – jtitusj May 19 '15 at 14:04
  • The problem is when I don't want a bin width of 1. In fact, the more general question is how we can compute the pdf of an array. – DimKoim May 19 '15 at 14:08
  • I see. In that case, instead of using density=True, just do `hist, bins = numpy.histogram(*args)` and then `hist = hist.astype(float) / hist.sum()`. That makes the bin values sum to 1: each value is the fraction of entries in that bin (a probability per bin rather than a density). – jtitusj May 19 '15 at 14:32
  • Thanks, but I am still looking for a way to get the pdf of an array without using numpy.histogram. – DimKoim May 19 '15 at 18:25
  • Does anyone else have an opinion? – DimKoim May 19 '15 at 19:44
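The comments above ask why sum(hist) equals 1 only when the bins have width 1. Under the documented definition, each count is divided by the total count times the bin width. That makes the area sum(hist * bin_width) equal to 1, while the plain sum equals 1 / bin_width for equal-width bins.

A quick check with 9 bins and 4 bins follows. For comparison, it also computes the relative-frequency variant suggested in the comments:

```python
import numpy as np

a = np.arange(10)

results = {}
for n_bins in (9, 4):
    hist, edges = np.histogram(a, bins=n_bins, density=True)
    widths = np.diff(edges)
    # sum(hist) depends on the bin width; the area sum(hist * widths) does not
    results[n_bins] = (hist.sum(), (hist * widths).sum())
    print(n_bins, results[n_bins])

# Relative frequencies: fraction of samples per bin.
# These sum to 1, but they are probabilities per bin, not density values.
counts, edges = np.histogram(a, bins=4)
freq = counts / counts.sum()
print(freq, freq.sum())
```

With 4 bins over [0, 9], each bin is 2.25 wide, so sum(hist) drops to 1 / 2.25 while the area stays at 1.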

You can't compare numpy.histogram() and scipy.stats.norm() for this simple reason:

scipy.stats.norm() is a normal continuous random variable, while numpy.histogram() deals with sequences (discrete data).
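To make the distinction concrete: norm.pdf(a, mean, sigma) evaluates a fitted normal curve at each point of a. The histogram density, by contrast, is an empirical estimate, and for evenly spaced data like range(10) it is flat.

The sketch below writes out in plain NumPy the normal formula that norm.pdf evaluates, using the same mean and (population) standard deviation as the question:

```python
import numpy as np

a = np.arange(10)
mu = a.mean()        # sample mean
sigma = a.std()      # population std (ddof=0), which is np.std's default

# Normal pdf: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))
gauss = np.exp(-(a - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
print(gauss)

# Empirical density of the same data (9 bins of width 1): flat, not bell-shaped
hist, _ = np.histogram(a, bins=9, density=True)
print(hist)
```

The bell shape in the scipy output therefore comes from the normal model itself, not from the data.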

farhawa
  • OK, so if I use scipy.stats.rv_continuous.pdf, I will get the correct result. Can you give me an example? – DimKoim May 19 '15 at 13:44
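scipy.stats.rv_continuous is a generic base class meant for subclassing when a distribution's formula is already known. To estimate a continuous pdf directly from samples, without binning, a common choice is a Gaussian kernel density estimate (KDE). scipy.stats.gaussian_kde provides one.

The sketch below does the same kind of computation in plain NumPy. It assumes a Gaussian kernel and Scott's rule for the bandwidth, which should correspond to gaussian_kde's default. The helper name kde_pdf is hypothetical:

```python
import numpy as np

def kde_pdf(samples, x):
    """Hypothetical helper: Gaussian KDE of `samples`, evaluated at points `x`."""
    samples = np.asarray(samples, dtype=float)
    x = np.asarray(x, dtype=float)
    n = samples.size
    # Scott's rule (assumed): bandwidth = sample std (ddof=1) * n ** (-1/5)
    h = samples.std(ddof=1) * n ** (-1 / 5)
    # Place a Gaussian bump of width h on every sample and average them
    z = (x[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))

a = np.arange(10)
grid = np.linspace(-10, 19, 2901)   # wide enough to cover the tails
pdf = kde_pdf(a, grid)

# A valid pdf integrates to ~1; approximate the integral with a Riemann sum
area = pdf.sum() * (grid[1] - grid[0])
print(round(area, 4))
```

The bandwidth controls smoothness much as the bin width does for a histogram. A smaller h follows the individual samples more closely.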

Plotting a Continuous Probability Function (PDF) from a Histogram – Solved in Python: refer to this blog for a detailed explanation (http://howdoudoittheeasiestway.blogspot.com/2017/09/plotting-continuous-probability.html). Otherwise, you can use the code below.

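import numpy as np
import matplotlib.pyplot as plt

A = np.random.normal(size=1000)    # example data (assumption); replace with your own array
n, bins, patches = plt.hist(A, 40, histtype='bar')
plt.show()
n = n / len(A)                     # fraction of samples in each bin
n = np.append(n, 0)                # pad so n has one value per bin edge
mu = np.mean(A)                    # fit the Gaussian to the data, not to the bin heights
sigma = np.std(A)
width = (bins[-1] - bins[0]) / 40  # bin width
plt.bar(bins, n, width=width)
# Gaussian pdf scaled by the bin width so it is comparable to the per-bin fractions
y1 = 1 / (sigma * np.sqrt(2 * np.pi)) * np.exp(-(bins - mu)**2 / (2 * sigma**2)) * width
plt.plot(bins, y1, 'r--', linewidth=2)
plt.show()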
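An alternative that needs no scale factor at all is to request a density histogram (density=True) and compare it with the normal pdf at the bin centres. Both are then in the same units.

This is a sketch only. The data are an assumption: normally distributed samples drawn with a fixed seed purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)      # example data (assumption): 100,000 normal samples
A = rng.normal(loc=2.0, scale=1.5, size=100_000)

hist, bins = np.histogram(A, bins=40, density=True)
centres = (bins[:-1] + bins[1:]) / 2

mu, sigma = A.mean(), A.std()       # fit the normal to the data itself
gauss = np.exp(-(centres - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

# Largest gap between the empirical density and the fitted curve
max_gap = np.abs(hist - gauss).max()
print(max_gap)

# To plot: plt.bar(centres, hist, width=np.diff(bins)); plt.plot(centres, gauss, 'r--')
```

Because density=True already divides by the bin width, the curve and the bars line up without any hand-tuned multiplier.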