
I am trying to use scipy's curve_fit function to fit a theoretical power spectral density (a filtered Lorentzian, see the model function below) to my data. While doing so, curve_fit always returns the initial parameters (p0=[1, 1, 1]), which tells me that the fitting didn't work. I don't know where the issue is. I am using Python 3.9 (Spyder 5.1.5) from the Anaconda distribution on Windows 11. Here is a WeTransfer link to the data file: https://wetransfer.com/downloads/6097ebe81ee0c29ee95a497128c1c2e420220704110130/86bf2d

Here is my code below. Can someone tell me what the issue is, and how I can solve it? In the attached plot, the blue curve is my experimental PSD and the orange one is the result of the fit.

import numpy as np
import math
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import scipy.constants as cst

File = np.loadtxt('test5.dat')
X = File[:, 1]
Y = File[:, 2]


f_sample = 50000
time = np.arange(1, len(X) + 1) / f_sample  # sample times [s]
N = X.shape[0]  # number of observations
N1=int(N/2)
delta_t = time[2] - time[1]
T_mes = N * delta_t
freq = np.arange(1/T_mes, (N+1)/T_mes, 1/T_mes)
freq=freq[0:N1]
fNyq = f_sample/2  # Nyquist frequency
nb = 350
freq_block = []

# discrete Fourier transform
X_ft = delta_t*np.fft.fft(X, n=N)
X_ft=X_ft[0:N1]

plt.figure()
plt.plot(time, X)
plt.xlabel('t [s]')
plt.ylabel('x [micro m]')

# Experimental power spectrum on both raw and blocked data
PSD_X_exp = (np.abs(X_ft)**2/T_mes)
PSD_X_exp_b = []

STD_PSD_X_exp_b = []
for i in range(0, N1+2, nb):
    freq_b = np.array(freq[i:i+nb])  # i-nb:i
    psd_b = np.array(PSD_X_exp[i:i+nb])
    freq_block = np.append(freq_block, (1/nb)*np.sum(freq_b))
 
    PSD_X_exp_b = np.append(PSD_X_exp_b, (1/nb)*np.sum(psd_b))
STD_PSD_X_exp_b = np.append(STD_PSD_X_exp_b, PSD_X_exp_b/np.sqrt(nb))

plt.figure()
plt.loglog(freq, PSD_X_exp)
plt.legend(['Raw Experimental PSD'])
plt.xlabel('f [Hz]')
plt.ylabel('PSD')
plt.figure()
plt.loglog(freq_block, PSD_X_exp_b)
plt.legend(['Experimental PSD after blocking'])
plt.xlabel('f [Hz]')
plt.ylabel('PSD')

kB = cst.k  # Boltzmann constant [m^2kg/s^2K]
T = 273.15 + 25  # Temperature [K]
r = (2.8 / 2) * 1e-6  # Particle radius [m]
v = 0.00002414 * 10 ** (247.8 / (-140 + T))  # Water viscosity [Pa*s]
gamma = np.pi * 6 * r * v  # [m*Pa*s]
Do = kB*T/gamma  # expected value for D
f3db_o = 50000  # expected value for f3db
fc_o = 300  # expected value for fc

n = np.arange(-10,11)


def theo_spectrum_lorentzian_filter(x, D_, fc_, f3db_):
    PSD_theo=[]
    for i in range(0,len(x)):
        # print(i)
        psd_theo=np.sum((((D_*Do)/2*math.pi**2)/((fc_*fc_o)**2+(x[i]+n*f_sample)
                    ** 2))*(1/(1+((x[i]+n*f_sample)/(f3db_*f3db_o))**2)))
       
        PSD_theo= np.append(PSD_theo,psd_theo)
            

    return PSD_theo

popt, pcov = curve_fit(theo_spectrum_lorentzian_filter, freq_block, PSD_X_exp_b, p0=[1, 1, 1], sigma=STD_PSD_X_exp_b, absolute_sigma=True, check_finite=True,bounds=(0.1, 10),  method='trf', jac=None)

D_, fc_, f3db_ = popt
D1 = D_*Do
fc1 = fc_*fc_o
f3db1 = f3db_*f3db_o

print('Diffusion constant D = ', D1, ' Corner frequency fc= ',fc1, 'f3db(diode,eff)= ', f3db1)

Sonya N.
  • Welcome. Could you provide a [MCVE](https://stackoverflow.com/help/mcve) ? – l -_- l Jul 04 '22 at 07:46
  • I am afraid I don't understand, nor do I know how to do that. – Sonya N. Jul 04 '22 at 07:56
  • What part in creating a [mcve] are you having trouble with? – 9769953 Jul 04 '22 at 08:14
  • Have you made a plot of your data and your initial estimate? Then you can see how closely they match (or don't match). If the initial guess isn't anywhere near the data, that might mean the fit will fail, and you'll need to provide a better estimate. Also, if the data doesn't look like your fitting function at all, the fit is unlikely to yield a good result, or work at all. – 9769953 Jul 04 '22 at 08:17
  • I tried plotting the data and the initial estimates, and it seems that the fitting failed. – Sonya N. Jul 04 '22 at 08:36
  • Here is a link to access the data file: https://xfl.jp/fj7RTO – Sonya N. Jul 04 '22 at 08:48
  • Any particular reason that you are returning a list rather than a value to optimize in your model function? It seems weird :) – grfreitas Jul 04 '22 at 09:22
  • Hum, no; a colleague of mine returns a list as well, it's just that her objective function is less complex than mine (my function takes other circumstances into account), and the fitting didn't return the initial parameters in her case. – Sonya N. Jul 04 '22 at 09:35
  • Sorry, I'm not too enthusiastic about downloading a 5 MB file from a filesharing site I'm unfamiliar with. – 9769953 Jul 04 '22 at 10:50
  • "I tried plotting the data and the initial estimates, and it seems that the fitting failed." but I asked whether the data and initial estimate are far apart. Perhaps you can include the actual plot in your question? – 9769953 Jul 04 '22 at 10:51
  • I edited the question, including the plot and a WeTransfer link to the data file. – Sonya N. Jul 04 '22 at 11:09
  • Thanks. One thing I notice now is that your data spans orders of magnitude, and your initial vs data offset is even more orders of magnitude. So you'll want the normalization value you use as a fitting parameter to start close to the actual data, so you have a better start. It's a bit hard to see which value that is. But with such complicated formulas and orders of magnitude, having a starting value close to the actual data is important. – 9769953 Jul 04 '22 at 11:38
  • You may also want to consider whether you can use a log version of your data and formula (either on y alone, or on both x and y), so that the scaling is more linear. This may help the fitting routine, though it does require rewriting your equation (or perhaps just taking the log of it; see the sketch after this thread). – 9769953 Jul 04 '22 at 11:39
  • Thank you! I tried the normalizing approach by using the normalization values as fitting parameters, with the 3 other values (Do, fc_o and f3db_o) close to the actual data. I will also try the log approach and see what results I get. – Sonya N. Jul 04 '22 at 11:47
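
(One way the log-space suggestion from the comments above could look. This is only a sketch: it assumes freq_block, PSD_X_exp_b and theo_spectrum_lorentzian_filter from the question are already defined, and the wrapper name log_model is made up here.)

import numpy as np
from scipy.optimize import curve_fit

def log_model(x, D_, fc_, f3db_):
    # Hypothetical wrapper, not part of the original script: fit log10(PSD)
    # so the orders-of-magnitude spread of the data is compressed.
    return np.log10(theo_spectrum_lorentzian_filter(x, D_, fc_, f3db_))

popt_log, pcov_log = curve_fit(log_model, freq_block, np.log10(PSD_X_exp_b),
                               p0=[1, 1, 1])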

1 Answer


I believe I've successfully fitted your data. Here's the approach I took.

First, I plotted your model (with popt=[1, 1, 1]) and the data you had. I noticed your data was significantly lower than the model. Then I started fiddling with the parameters. I wanted to push the model upwards. I did that by multiplying popt[0] by increasingly large values. I ended up with 1E13 as a ballpark value. Note that I have no idea if this is physically possible for your model. Then I jury-rigged your fitting function to multiply D_ by 1E13 and ran your code. I got this fit:

[Figure: the resulting fit plotted over the experimental PSD]

So I believe it's a problem of 1) inappropriate starting values and 2) inappropriate bounds. In your position, I would revise this model and check whether there are any problems with units and so on.
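
For example, rather than hard-coding the 1E13 factor inside the model, the starting value and the bounds could be moved onto the parameter itself. This is only a sketch built on the ballpark factor above; whether these magnitudes are physically sensible is something you'd have to check:

# Sketch only: the 1e13 / 1e12 / 1e14 magnitudes are guesses to be verified
# against the physics; the model function is the unmodified one from the question.
popt, pcov = curve_fit(theo_spectrum_lorentzian_filter,
                       freq_block[:170], PSD_X_exp_b[:170],
                       p0=[1e13, 1, 1],
                       sigma=STD_PSD_X_exp_b[:170], absolute_sigma=True,
                       bounds=([1e12, 0.1, 0.1], [1e14, 10, 10]),
                       method='trf')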

Here's what I used to try to fit your model:

plt.figure()
plt.loglog(freq_block[:170], PSD_X_exp_b[:170], label='Exp')
plt.loglog(freq_block[:170], 
           theo_spectrum_lorentzian_filter(
               freq_block[:170],
               1E13*popt[0], popt[1], popt[2]),
           label='model'
          )
plt.xlabel('f [Hz]')
plt.ylabel('PSD')
plt.legend()

I limited the data to point 170 because there were some weird backwards values that made me uncomfortable. I would recheck them if I were you.
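
If you'd rather locate that cutoff automatically than hard-code 170, something like this should work (a sketch, assuming the freq_block and PSD_X_exp_b arrays from the question; freq_fit and psd_fit are just new names for the truncated arrays):

import numpy as np

# Keep only the leading part of the blocked spectrum where the frequency
# axis is strictly increasing; everything from the first "backwards" value
# onwards is dropped.
increasing = np.diff(freq_block) > 0
cut = len(freq_block) if increasing.all() else int(np.argmin(increasing)) + 1

freq_fit = freq_block[:cut]
psd_fit = PSD_X_exp_b[:cut]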

Here's the model code I used. I didn't change the curve_fit call (except to limit x to [:170]).

def theo_spectrum_lorentzian_filter(x, D_, fc_, f3db_):
    PSD_theo=[]
    D_ = 1E13*D_  # this is the only line I changed
    for i in range(0,len(x)):
        psd_theo=np.sum((((D_*Do)/2*math.pi**2)/((fc_*fc_o)**2+(x[i]+n*f_sample)
                    ** 2))*(1/(1+((x[i]+n*f_sample)/(f3db_*f3db_o))**2)))
       
        PSD_theo= np.append(PSD_theo,psd_theo)
            
    return PSD_theo
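
Note that because D_ is now rescaled inside the model, the conversion back to physical units at the end of your script has to carry the same factor, otherwise D1 will be off by 1E13:

D_, fc_, f3db_ = popt
D1 = 1E13 * D_ * Do  # undo the 1E13 factor that was moved into the model
fc1 = fc_ * fc_o
f3db1 = f3db_ * f3db_o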
K.Cl
  • Nice. About "Note that I have no idea if this is physically possible for your model": the question has `bounds=(0.1, 10)`, which to me suggests either a (stringent) limit on the parameter values or a bad model overall, since 10^13 is quite a bit larger than the maximum boundary value of 10. – 9769953 Jul 04 '22 at 13:28
  • 1
    Yes, but her expected values for the parameters aren't within these bounds either, e.g., `f3db_o = 50000 # expected value for f3db`, `fc_o = 300 # expected value pour fc`. So it's up to @Sonya to determine better bounds. – K.Cl Jul 04 '22 at 14:11
  • Thank you guys for your help! I initially tried normalizing my values using D_, fc_ and f3db_, so that I could multiply them later on by the other values (Do, fc_o and f3db_o). In my approach, I didn't take into consideration the considerable difference in order of magnitude between Do and the other two. In retrospect, that's what I think was causing the issue. Thank you guys again for your help! – Sonya N. Jul 05 '22 at 07:07