I would like to make a lognormal fit to my already binned data. The bar plot looks like this:
Unfortunately, when I try to use the standard lognorm.pdf()
the shape of the fitted distribution is very different. I guess it's because my data is already binned. Here's the code:
times, data, bin_points = ReadHistogramFile(filename)
xmin = 200
xmax = 800
x = np.linspace(xmin, xmax, 1000)
shape, loc, scale = stats.lognorm.fit(data, floc=0)
pdf = stats.lognorm.pdf(x, shape, loc=loc, scale=scale)
area=data.sum()
plt.bar(bars, data, width=10, color='b')
plt.plot(x*area, pdf, 'k' )
Here's what the fitted distribution looks like:
Obviously there's something wrong also with the scaling. I'm less concerned about that though. My main issue is, the shape of the distribution. This might be duplicate to: this question but I could not find a correct solution. I tried it and still get a very similar shape as when doing the above. Thanks for any help!
Update:
By using curve_fit()
I was able to get somewhat of a fit. But I'm not satisfied yet. I'd like to have the original bins and not unity-bins. Also I'm not sure, what exactly is happening, and if there is not a better fit. Here's the code:
def normalize_integral(data, bin_size):
normalized_data = np.zeros(size(data))
print bin_size
sum = data.sum()
integral = bin_size*sum
for i in range(0, size(data)-1):
normalized_data[i] = data[i]/integral
print 'integral:', normalized_data.sum()*bin_size
return normalized_data
def pdf(x, mu, sigma):
"""pdf of lognormal distribution"""
return (np.exp(-(np.log(x) - mu)**2 / (2 * sigma**2)) / (x * sigma * np.sqrt(2 * np.pi)))
bin_points=np.linspace(280.5, 1099.55994, len(bin_points))
data=[9.78200000e+03 1.15120000e+04 1.18000000e+04 1.79620000e+04 2.76980000e+04 2.78260000e+04 3.35460000e+04 3.24260000e+04 3.16500000e+04 3.30820000e+04 4.84560000e+04 5.86500000e+04 6.34220000e+04 5.11880000e+04 5.13180000e+04 4.74320000e+04 4.35420000e+04 4.13400000e+04 3.60880000e+04 2.96900000e+04 2.66640000e+04 2.58720000e+04 2.57560000e+04 2.20960000e+04 1.46880000e+04 9.97200000e+03 5.74200000e+03 3.52000000e+03 2.74600000e+03 2.61800000e+03 1.50000000e+03 7.96000000e+02 5.40000000e+02 2.98000000e+02 2.90000000e+02 2.22000000e+02 2.26000000e+02 1.88000000e+02 1.20000000e+02 5.00000000e+01 5.40000000e+01 5.80000000e+01 5.20000000e+01 2.00000000e+01 2.80000000e+01 6.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
normalized_data_unitybins = normalize_integral(data,1)
plt.figure(figsize=(9,4))
ax1=plt.subplot(121)
ax2=plt.subplot(122)
ax2.bar(unity_bins, normalized_data_unitybins, width=1, color='b')
fitParams, fitCov = curve_fit(pdf, unity_bins, normalized_data_unitybins, p0=[1,1],maxfev = 1000000)
fitData=pdf(unity_bins, *fitParams)
ax2.plot(unity_bins, fitData,'g-')
ax1.bar(bin_points, normalized_data_unitybins, width=10, color='b')
fitParams, fitCov = curve_fit(pdf, bin_points, normalized_data_unitybins, p0=[1,1],maxfev = 1000000)
fitData=pdf(bin_points, *fitParams)
ax1.plot(bin_points, fitData,'g-')