Still having trouble with curve fitting

Question

I already opened a question on this topic, but I wasn't sure, if I should post it there, so I opened a new question here.

I have trouble again when fitting two or more peaks. First problem occurs with a calculated example function.

xg = np.random.uniform(0,1000,500)
mu1 = 200
sigma1 = 20
I1 = -2

mu2 = 800
sigma2 = 20
I2 = -1

yg3 = 0.0001*xg
yg1 = (I1 / (sigma1 * np.sqrt(2 * np.pi))) * np.exp( - (xg - mu1)**2 / (2 * sigma1**2) )
yg2 = (I2 / (sigma2 * np.sqrt(2 * np.pi))) * np.exp( - (xg - mu2)**2 / (2 * sigma2**2) )
yg=yg1+yg2+yg3

plt.figure(0, figsize=(8,8))
plt.plot(xg, yg, 'r.')

I tried two different approaches, I found in the documentation, which are shown below (modified for my data), but both give me wrong fitting data and a messy chaos of graphs (I guess one line per fitting step).

1st attempt:

import numpy as np
from lmfit.models import PseudoVoigtModel, LinearModel, GaussianModel, LorentzianModel
import sys
import matplotlib.pyplot as plt

gauss1 = PseudoVoigtModel(prefix='g1_')
pars.update(gauss1.make_params())

pars['g1_center'].set(200)
pars['g1_sigma'].set(15, min=3)
pars['g1_amplitude'].set(-0.5)
pars['g1_fwhm'].set(20, vary=True)
#pars['g1_fraction'].set(0, vary=True)

gauss2  = PseudoVoigtModel(prefix='g2_')
pars.update(gauss2.make_params())

pars['g2_center'].set(800)
pars['g2_sigma'].set(15)
pars['g2_amplitude'].set(-0.4)
pars['g2_fwhm'].set(20, vary=True)
#pars['g2_fraction'].set(0, vary=True)

mod = gauss1 + gauss2 + LinearModel()

pars.add('intercept', value=0, vary=True)
pars.add('slope', value=0.0001, vary=True)

init = mod.eval(pars, x=xg)
out = mod.fit(yg, pars, x=xg)

print(out.fit_report(min_correl=0.5))
plt.figure(5, figsize=(8,8))
out.plot_fit()

When I include the 'fraction'-parameter, I often get

'NameError: name 'pv1_fraction' is not defined in expr='<_ast.Module object at 0x00000000165E03C8>'.

although it should be defined. I get this Error for real data with this approach, too.

2nd attempt:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import lmfit

def gauss(x, sigma, mu, A):
    return A*np.exp(-(x-mu)**2/(2*sigma**2))

def linear(x, m, n):
    return m*x + n

peak1 = lmfit.model.Model(gauss, prefix='p1_')
peak2 = lmfit.model.Model(gauss, prefix='p2_')
lin = lmfit.model.Model(linear, prefix='l_')
model = peak1 + lin + peak2
params = model.make_params()

params['p1_mu'] = lmfit.Parameter(value=200, min=100, max=250)
params['p2_mu'] = lmfit.Parameter(value=800, min=100, max=1000)
params['p1_sigma'] = lmfit.Parameter(value=15, min=0.01)
params['p2_sigma'] = lmfit.Parameter(value=20, min=0.01)
params['p1_A'] = lmfit.Parameter(value=-2, min=-3)
params['p2_A'] = lmfit.Parameter(value=-2, min=-3)
params['l_m'] = lmfit.Parameter(value=0)
params['l_n'] = lmfit.Parameter(value=0)

out = model.fit(yg, params, x=xg)

print out.fit_report()
plt.figure(8, figsize=(8,8))
out.plot_fit()

So the result looks like this, in both cases. It seems to plot all fitting attempts, but never solves it correctly. The best fitted parameters are in the range that I gave it.

Anyone knows this type of error? Or has any solutions for this? And does anyone know how to avoid the NameError when calling a model function from lmfit with those approaches?

hey there, could you copy/paste your data values? (comma separated) I like this question and I may have an idea I'd like to try. — user1269942, Dec 23 '15 at 00:31
found this site. Maybe it helps http://pico.dreamhosters.com/PeakFitWithPython.html — Moritz, Dec 23 '15 at 23:51
Are you sure there is a problem with the fit and not just the display of the data? You don't show the fit results for the parameters, only the plot of the data, initial, and final fit. It's clear from the plots that the fitted function changed. Unfortunately, you see lots of spurious lines in the plot because you plot the data with points but the fit with lines. With your definition of `xg = np.random.uniform(0,1000,500)` the `xg` will not be monotonically increasing, and such plot artifacts are inevitable. — M Newville, Jan 16 '16 at 14:06
Unfortunately I can't show you fit data, because there where none. All I received was zeros for all parameters. So I still think there is something wrong with the fit. And I don't understand what you say about the example data. They are quite realistic in a way, because it simulates an experiment, where a for each x data point a y data point is measured (it scans through the whole range with a certain step size). So in experiment, I would get an array for x which looks similar to this example data. — kire, Jan 21 '16 at 16:55

score 2 · Answer 1 · answered Dec 23 '15 at 20:21

I have a somewhat tolerable solution for you. Since I don't know how variable your data is, I cannot say that it will work in a general sense but should get you started. If your data is along 0-1000 and has two peaks or dips along a line as you showed, then it should work.

I used the scipy curve_fit and put all of the components of the function together into one function. One can pass starting locations into the curve_fit function. (you can probably do this with the lib you're using but I'm not familiar with it) There is a loop in loop where I vary the mu parameters to find the ones with the lowest squared error. If you are needing to fit your data many times or in some real-time scenario then this is not for you but if you just need to fit some data, launch this code and grab a coffee.

from scipy.optimize import curve_fit
import numpy as np
import matplotlib.pyplot as plt
import pylab
from matplotlib import cm as cm
import time

def my_function_big(x, m, n, #lin vars
                       sigma1, mu1, I1,  #gaussian 1
                       sigma2, mu2, I2):  #gaussian 2

  y = m * x + n + (I1 / (sigma1 * np.sqrt(2 * np.pi))) * np.exp( - (x - mu1)**2 / (2 * sigma1**2) ) + (I2 / (sigma2 * np.sqrt(2 * np.pi))) * np.exp( - (x - mu2)**2 / (2 * sigma2**2) )
  return y


#make some data
xs = np.random.uniform(0,1000,500)
mu1 = 200
sigma1 = 20
I1 = -2

mu2 = 800
sigma2 = 20
I2 = -1

yg3 = 0.0001 * xs
yg1 = (I1 / (sigma1 * np.sqrt(2 * np.pi))) * np.exp( - (xs - mu1)**2 / (2 * sigma1**2) )
yg2 = (I2 / (sigma2 * np.sqrt(2 * np.pi))) * np.exp( - (xs - mu2)**2 / (2 * sigma2**2) )
ys = yg1 + yg2 + yg3

xs = np.array(xs)
ys = np.array(ys)

#done making data



#start a double loop...very expensive but this is quick and dirty
#it would seem that the regular optimizer has trouble finding the minima so i 
#found that having the near proper mu values helped it zero in much better

start = time.time()

serr = []
_x = []
_y = []
for x in np.linspace(0, 1000, 61):
  for y in np.linspace(0, 1000, 61):
    cfiti = curve_fit(my_function_big, xs, ys, p0=[0, 0, 1, x, 1, 1, y, 1], maxfev=20000000)
    serr.append(np.sum((my_function_big(xs, *cfiti[0]) - ys) ** 2))
    _x.append(x)
    _y.append(y)

serr = np.array(serr)
_x = np.array(_x)
_y = np.array(_y)

print 'done loop in loop fitting'
print 'time: %0.1f' % (time.time() - start)

gridsize=20
plt.subplot(111)
plt.hexbin(_x, _y, C=serr, gridsize=gridsize, cmap=cm.jet, bins=None)
plt.axis([_x.min(), _x.max(), _y.min(), _y.max()])
cb = plt.colorbar()
cb.set_label('SE')
plt.show()   

ix = np.argmin(serr.ravel())
mustart1 = _x.ravel()[ix]
mustart2 = _y.ravel()[ix]
print mustart1
print mustart2

cfit = curve_fit(my_function_big, xs, ys, p0=[0, 0, 1, mustart1, 1, 1, mustart2, 1], maxfev=2000000000)

xp = np.linspace(0, 1000, 1001)

plt.figure()
plt.scatter(xs, ys) #plot synthetic dat
plt.plot(xp, my_function_big(xp, *cfit[0]), '-', label='fit function')  #plot data evaluated along 0-1000
plt.legend(loc=3, numpoints=1, prop={'size':12})
plt.show()    
pylab.close()

Good luck!

score 0 · Answer 2 · edited Sep 28 '18 at 02:37

0

In your first attempt:

pars['g1_fraction'].set(0, vary=True)

The fraction must be a value between 0 and 1, but I believe that cannot be zero. Try to put something like 0.000001, and it will work.

edited Sep 28 '18 at 02:37

Stephen Rauch

47,830
31
106
135

answered Sep 28 '18 at 02:18

Camila

1

Still having trouble with curve fitting

2 Answers2