13

I've been looking for a way to do multiple Gaussian fitting to my data. Most of the examples I've found so far use a normal distribution to make random numbers. But I am interested in looking at the plot of my data and checking if there are 1-3 peaks.

I can do this for one peak, but I don't know how to do it for more.

For example, I have this data: http://www.filedropper.com/data_11

I have tried using lmfit, and of course scipy, but with no nice results.

Thanks for any help!

twasbrillig
  • 17,084
  • 9
  • 43
  • 67
astromath
  • 302
  • 1
  • 3
  • 13
  • 1
    Your question is not entirely clear: do you want to fit a Gaussian to your (rather noisy) data? Do you want to find the location of the maxima? Is the data the sum of 1-3 Gaussians and you'd like to get the mean and standard variance of each? – Oliver W. Nov 14 '14 at 18:06
  • 1
    Hi! Thanks for the reply :) I want to fit one Gaussian for each peak. – astromath Nov 15 '14 at 17:52
  • " Is the data the sum of 1-3 Gaussians and you'd like to get the mean and standard variance of each?" exactly! – astromath Nov 16 '14 at 21:44

1 Answers1

20

Simply make parameterized model functions of the sum of single Gaussians. Choose a good value for your initial guess (this is a really critical step) and then have scipy.optimize tweak those numbers a bit.

Here's how you might do it:

import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize

data = np.genfromtxt('data.txt')
def gaussian(x, height, center, width, offset):
    return height*np.exp(-(x - center)**2/(2*width**2)) + offset
def three_gaussians(x, h1, c1, w1, h2, c2, w2, h3, c3, w3, offset):
    return (gaussian(x, h1, c1, w1, offset=0) +
        gaussian(x, h2, c2, w2, offset=0) +
        gaussian(x, h3, c3, w3, offset=0) + offset)

def two_gaussians(x, h1, c1, w1, h2, c2, w2, offset):
    return three_gaussians(x, h1, c1, w1, h2, c2, w2, 0,0,1, offset)

errfunc3 = lambda p, x, y: (three_gaussians(x, *p) - y)**2
errfunc2 = lambda p, x, y: (two_gaussians(x, *p) - y)**2

guess3 = [0.49, 0.55, 0.01, 0.6, 0.61, 0.01, 1, 0.64, 0.01, 0]  # I guess there are 3 peaks, 2 are clear, but between them there seems to be another one, based on the change in slope smoothness there
guess2 = [0.49, 0.55, 0.01, 1, 0.64, 0.01, 0]  # I removed the peak I'm not too sure about
optim3, success = optimize.leastsq(errfunc3, guess3[:], args=(data[:,0], data[:,1]))
optim2, success = optimize.leastsq(errfunc2, guess2[:], args=(data[:,0], data[:,1]))
optim3

plt.plot(data[:,0], data[:,1], lw=5, c='g', label='measurement')
plt.plot(data[:,0], three_gaussians(data[:,0], *optim3),
    lw=3, c='b', label='fit of 3 Gaussians')
plt.plot(data[:,0], two_gaussians(data[:,0], *optim2),
    lw=1, c='r', ls='--', label='fit of 2 Gaussians')
plt.legend(loc='best')
plt.savefig('result.png')

result of fitting

As you can see, there is almost no difference between these two fits (visually). So you can't know for sure if there were 3 Gaussians present in the source or only 2. However, if you had to make a guess, then check for the smallest residual:

err3 = np.sqrt(errfunc3(optim3, data[:,0], data[:,1])).sum()
err2 = np.sqrt(errfunc2(optim2, data[:,0], data[:,1])).sum()
print('Residual error when fitting 3 Gaussians: {}\n'
    'Residual error when fitting 2 Gaussians: {}'.format(err3, err2))
# Residual error when fitting 3 Gaussians: 3.52000910965
# Residual error when fitting 2 Gaussians: 3.82054499044

In this case, 3 Gaussians gives a better result, but I also made my initial guess fairly accurate.

Oliver W.
  • 13,169
  • 3
  • 37
  • 50
  • Hello.Thank you very much for your answer.I had tried getting two separate Gaussians and then combining them,but I understand that was a wrong idea after I saw your solution.Could you please explain to me what the "center" and "offset" parameters are?Thank you very much for your help! – astromath Nov 17 '14 at 09:11
  • Those would be the mean of the Gaussian and a vertical offset that is given to it. Check my definition of `Gaussian` to verify ;-) Welcome to Stackoverflow. Don't forget to [upvote or accept the answer](https://stackoverflow.com/help/someone-answers) if it helped you. – Oliver W. Nov 17 '14 at 11:46
  • Hi Oliver!Thank you again :)i do understand the script,but how did you guess'' the mean?I would use numpy for that ,but I feel that there is something more simple and faster that you did. Thank you once more :) – astromath Nov 17 '14 at 20:51
  • @astromath, my guesses for the mean were based merely on a visual inspection of the data. However, you could use one of the many peak detection algorithms to find two of the local maxima easily. – Oliver W. Nov 17 '14 at 21:25
  • OK thank you:)I have some scripts for that job.I will see what I can do.Again thank you very much Oliver :) – astromath Nov 17 '14 at 21:41
  • The dotted graph does not look like being the sum of only two gaussians. I guess it is a mistake. – Christian K. Nov 18 '14 at 14:25
  • @astromath: You might be interested in using an interactive program, like e.g. peak-o-mat (http://lorentz.sf.net) which is written in python and heavily uses numpy/scipy btw. Especially when having to guess starting paramters for the fit it is more convenient than doing the _visual inspection_ that Oliver mentioned. Even more if the peaks are very close to each other. – Christian K. Nov 18 '14 at 14:30
  • @ChristianK. You were correct, an oversight on my part. The code replotted the tripleGaussian twice. I've corrected it now, including the figure. Thank you for the observant look! – Oliver W. Nov 18 '14 at 17:37
  • Thank you both guys!Ι have managed to fit some other data as well. ChristianK. this software looks nice.I tried it a bit and for sure I will spend more time to get a good feeling of it.Thank you both guys :) – astromath Nov 19 '14 at 07:40