Un-normalized Gaussian curve on histogram

Question

I have data which is of the gaussian form when plotted as histogram. I want to plot a gaussian curve on top of the histogram to see how good the data is. I am using pyplot from matplotlib. Also I do NOT want to normalize the histogram. I can do the normed fit, but I am looking for an Un-normalized fit. Does anyone here know how to do it?

Thanks! Abhinav Kumar

Does this example help? http://matplotlib.org/examples/api/histogram_demo.html — DMH, Jul 22 '13 at 03:21
No, its basically what I dont want. I dont want a normalized. — Abhinav Kumar, Jul 22 '13 at 03:53

score 8 · Accepted Answer · answered Jul 22 '13 at 07:16

As an example:

import pylab as py
import numpy as np
from scipy import optimize

# Generate a 
y = np.random.standard_normal(10000)
data = py.hist(y, bins = 100)

# Equation for Gaussian
def f(x, a, b, c):
    return a * py.exp(-(x - b)**2.0 / (2 * c**2))

# Generate data from bins as a set of points 
x = [0.5 * (data[1][i] + data[1][i+1]) for i in xrange(len(data[1])-1)]
y = data[0]

popt, pcov = optimize.curve_fit(f, x, y)

x_fit = py.linspace(x[0], x[-1], 100)
y_fit = f(x_fit, *popt)

plot(x_fit, y_fit, lw=4, color="r")

enter image description here

This will fit a Gaussian plot to a distribution, you should use the pcov to give a quantitative number for how good the fit is.

A better way to determine how well your data is Gaussian, or any distribution is the Pearson chi-squared test. It takes some practise to understand but it is a very powerful tool.

Can we retrieve a, b and c for the fit you show above? I want to check with what I expect it to be. — Abhinav Kumar, Jul 30 '13 at 17:11
This is precisely `popt`. You will notice in getting the `y_fit` I have done `f(x_fit, *popt)` this is a trick to unpack the tuple of `popt` into the arguments of `f`. See the docs for more. — Greg, Jul 30 '13 at 22:04

score 4 · Answer 2 · answered Dec 10 '13 at 01:35

An old post I know, but wanted to contribute my code for doing this, which simply does the 'fix by area' trick:

from scipy.stats import norm
from numpy import linspace
from pylab import plot,show,hist

def PlotHistNorm(data, log=False):
    # distribution fitting
    param = norm.fit(data) 
    mean = param[0]
    sd = param[1]

    #Set large limits
    xlims = [-6*sd+mean, 6*sd+mean]

    #Plot histogram
    histdata = hist(data,bins=12,alpha=.3,log=log)

    #Generate X points
    x = linspace(xlims[0],xlims[1],500)

    #Get Y points via Normal PDF with fitted parameters
    pdf_fitted = norm.pdf(x,loc=mean,scale=sd)

    #Get histogram data, in this case bin edges
    xh = [0.5 * (histdata[1][r] + histdata[1][r+1]) for r in xrange(len(histdata[1])-1)]

    #Get bin width from this
    binwidth = (max(xh) - min(xh)) / len(histdata[1])           

    #Scale the fitted PDF by area of the histogram
    pdf_fitted = pdf_fitted * (len(data) * binwidth)

    #Plot PDF
    plot(x,pdf_fitted,'r-')

score 3 · Answer 3 · answered Nov 18 '13 at 20:45

3

Another way of doing this is to find the normalized fit and multiply the normal distribution with (bin_width*total length of data)

this will un-normalize your normal distribution

answered Nov 18 '13 at 20:45

aaveg

1,864
2
16
15

Un-normalized Gaussian curve on histogram

3 Answers3

Linked