
I cannot get scipy.optimize.curve_fit to fit my data properly, and the problem is visually apparent. I know approximately what the parameter values should be, and if I evaluate the function with those parameters, the calculated and experimental data appear to agree well:

[figure: calculated and experimental data]

However, when I use scipy.optimize.curve_fit, the output parameters with the smallest error produce a much worse fit (by visual inspection). If I use the "known" parameters as my initial guess and bound the parameters to a relatively narrow window, as shown in this example of output from the fit function:

[figure: example of output from the fit function]

I obtain error values ~10^2 times larger, but the visual appearance of the fit seems better. The only way I can get a decent-looking fit for the data is to bound all the parameters to within ~0.3 units of the "known" values. I plan on using this code to fit more complex data whose parameters I will not know beforehand, so I cannot just rely on the calculated plot.
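
For reference, passing an initial guess and a narrow window to curve_fit looks roughly like the sketch below. It is a minimal, self-contained example with a stand-in model (the real model, data, and parameter names are in the code that follows); the bounds keyword needs SciPy >= 0.17.

import numpy as np
from scipy.optimize import curve_fit

# Stand-in model with the same signature as the real one further down
def model(q, d, sig_int, sig_cp):
    return d * np.exp(-(sig_int**2 + sig_cp**2) * q**2)

np.random.seed(0)
q = np.linspace(0.1, 3.0, 200)
y = model(q, 2.72, 1.03, 0.44) * (1 + 0.01 * np.random.randn(q.size))

p0 = [2.72, 1.03, 0.44]                                   # initial guess: d, sig_int, sig_cp
bounds = ([p - 0.3 for p in p0], [p + 0.3 for p in p0])   # ~0.3-unit window around the guess
popt, pcov = curve_fit(model, q, y, p0=p0, bounds=bounds)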

The relevant code is included below:

import csv
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit

# "Known" parameter values
d_1 = 2.72        # Angstroms
sig_cp_1 = 0.44
sig_int_1 = 1.03

# Read the data: column 0 is Qz, column 1 is the measured reflectivity,
# column 2 is the Fresnel reflectivity
Ref = []
Qz_F = []
Ref_F = []
g = open("Exp_Fresnal.csv", 'rb')
reader = csv.reader(g)
for row in reader:
    Qz_F.append(row[0])
    Ref.append(row[1])
    Ref_F.append(row[2])
# Convert the CSV strings to floats
Ref = map(float, Ref)
Ref_F = map(float, Ref_F)
Qz_F = map(float, Qz_F)
Ref_F_arr = np.array(Ref_F)
Qz_arr = np.array(Qz_F)
# Independent variable passed to curve_fit: row 0 is Qz, row 1 is the Fresnel reflectivity
x = np.array((Qz_arr, Ref_F))
def func(x, d, sig_int, sig_cp):
    # x[0] is Qz, x[1] is the Fresnel reflectivity
    qz, fresnel = x[0], x[1]
    num = np.exp((-sig_int**2) * (qz**2) / 2)
    den = 1 - np.exp(1j * qz * d) * np.exp((-sig_cp**2) * (qz**2) / 2)
    return fresnel * (abs(qz * d * num / den))**2
DC_ref = func(x, d_1, sig_int_1, sig_cp_1)   # curve calculated with the "known" parameters
Y = np.array(Ref)
popt, pcov = curve_fit(func, x, Y)  # p0=[2.72, 1.0, 0.44]
perr = np.sqrt(np.diag(pcov))
print "par=", popt
print "Var=", perr
Fit = func(x, *popt)
Ref=np.transpose(np.array([Ref]))
Qz_F=np.transpose(Qz_F)

plt.plot(Qz_F, Ref, 'bs',label='Experimental')
plt.plot(Qz_F, Fit, 'r--',label='Fit w/ DCM model')
plt.axis([0,3,10**(-10),100])
plt.yscale('log')
plt.title('Reflectivity',fontweight='bold',fontsize=15)
plt.ylabel('Reflectivity',fontsize=15)
plt.xlabel('qz /A^-1',fontsize=15)
plt.legend(loc='upper right',numpoints=1) 
plt.show()

The arrays are imported from a file (which I cannot include) and there are no outlier points that would cause the fit to become this distorted. Any help is appreciated.

Edit: I included additional code and the input data to go along with the code, but you will have to re-save the data as an MS-DOS .csv file.

Charco
  • We don't have your input data, and we don't see how you actually produce the graphs (which you use to estimate the goodness of fit; reduced chi-squared is probably better), which makes it really hard to help you. –  Sep 30 '16 at 23:48
  • Please also include the figures directly in the question, so that the question becomes easier to read and more standalone. –  Sep 30 '16 at 23:48
  • Keep in mind that with complex functions, it is easy to get stuck in a local minimum for your fit. –  Sep 30 '16 at 23:51
  • I don't know if this is the cause of the problem, but it looks like the density of points is much higher in the interval (0, 0.5) than in (0.5, 2.5). That could, in effect, give the interval (0, 0.5) a much higher "weight". `curve_fit` is a least-squares fit. Try computing the least-squares error (i.e. the sum of the squares of the errors) for the fit returned by `curve_fit` and for the fit that looks better. That will help determine if `curve_fit` is actually behaving as expected. – Warren Weckesser Oct 01 '16 at 00:15
  • Evert- Sorry, I tried to post the images in the question but it said I didn't have enough reputation. I included the input data and additional code to make it work. Thanks – Charco Oct 01 '16 at 01:43
  • Warren- I think that has to be it. The data at larger qz values is actually more informative so I will have to do some weighting since I don't care too much about the data at lower qz values. Thanks for the help. – Charco Oct 01 '16 at 01:47
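
Following Warren Weckesser's suggestion above, here is a rough sketch of how the summed squared error of the two parameter sets could be compared. It reuses the func, x, Y, and popt names from the question's code, and the "known" values are the ones quoted at the top of the question, in the order func expects them:

import numpy as np

def sum_sq_error(model, xdata, ydata, params):
    # Summed squared residuals for a given parameter set
    resid = np.asarray(ydata) - model(xdata, *params)
    return np.sum(resid**2)

known = [2.72, 1.03, 0.44]   # d, sig_int, sig_cp
print("SSE (curve_fit result): %g" % sum_sq_error(func, x, Y, popt))
print("SSE ('known' parameters): %g" % sum_sq_error(func, x, Y, known))

If the curve_fit result has the smaller sum of squares, then curve_fit is doing exactly what it was asked to do (minimizing the unweighted squared error), and the "better-looking" fit only looks better because of the log scale and the uneven point density.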

1 Answer


@WarrenWeckesser has a really good point, but further note that the y axis is logarithmic. That apparently huge error at the right end is something like 1e-5 in magnitude, while the points on the top left have reflectivity values of around 0.1. The square error coming from the tail is simply insignificant compared to the huge terms on the left.

I'm sure curve_fit works great. If you want a better visual fit, I suggest trying a fit to log(y) with the log() of your model (either that, or weight your points during fitting); then the result might be more stable visually (and from a physical point of view). Since you're probably trying to give an overall broad-spectrum description of your system, this might be closer to what you expect (but this will inevitably lead to a less precise fit where the reflectivity is high).
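
For concreteness, here is a minimal sketch of both options, assuming the func, x, and Y variables from the question's code (the initial guess is the "known" parameter set; sigma proportional to y roughly corresponds to equal relative errors):

import numpy as np
from scipy.optimize import curve_fit

p0 = [2.72, 1.03, 0.44]   # d, sig_int, sig_cp from the question

# Option 1: fit log10(y) with the log10 of the model, so the low-reflectivity
# tail contributes to the squared error on an equal footing
def log_func(xdata, d, sig_int, sig_cp):
    return np.log10(func(xdata, d, sig_int, sig_cp))

popt_log, pcov_log = curve_fit(log_func, x, np.log10(Y), p0=p0)

# Option 2: keep the linear fit but weight the points; sigma proportional to y
# approximates equal relative (rather than absolute) errors
popt_w, pcov_w = curve_fit(func, x, Y, p0=p0, sigma=np.asarray(Y))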
