Fitting a model with some known parameters to an experimental dataset in python, in order to optimise other parameters

Question

I have an experimental dataset 1 which plots intensity as a function of energy. These are arrays of 1800 datapoints.

I have been trying to fit a model to this data, given by the equation below:

Imodel = I0 * ((math.cos(phi) + (beta * f1))**2 + (math.sin(phi) + (beta*f2))**2 + Ioff

I have 2 other datasets of f1 vs. energy and f2 vs. energy 2. These are arrays of 700 datapoints, albeit over the same energy range as the first dataset.

I want to use this model function together with the f1 and f2 data to find optimal values of the other 4 parameters (I0, phi, beta, Ioff) where this model function fits the experimental dataset exactly.

I have been looking into curve_fit and least_squares from the scipy.optimize package, as well as linear regression packages such as lmfit and scikit, but to no avail.

can anyone help? Thanks

Is there no coefficient ( I1 for example) in front of the second term, such as : Imodel = I0 * ((math.cos(phi) + (beta * f1))**2 + I1 * (math.sin(phi) + (beta*f2))**2 + Ioff ? — JJacquelin, Oct 14 '22 at 14:13
Could you joint to the question a representative example of data ( Imodel , f1 , f2 ) made of a smaller number of points than 700, for example randomly taken among the 700. This is necessary to better understand the problem and for preliminary rough checking of possible methods of fitting. — JJacquelin, Oct 14 '22 at 14:24
My apologies, I missed out some brackets. The function should be defined as: I0 * ((math.cos(phi)+ (beta * f1))**2+ (math.sin(phi)+(beta*f2))**2) + Ioff. f1 and f2 vary as a function of energy, and so I want to use these values as known variable parameters in the equation for I_model, and subsequently fit this I_model function to my experimental data of intensity vs. energy in order to obtain 'best fit' values of the other parameters (I0, phi, beta, Ioff). Apologies, this is my first time using SO to post a query. Let me know if this is helpful — Ayrtonb1, Oct 14 '22 at 14:40
I have also attached images in the original post to demonstrate my data graphically. — Ayrtonb1, Oct 14 '22 at 14:46
The method of fitting is very simple (in theory). Nevertheless I want to test it before answering because I want to avoid wasting a lot of time to edit the answer if the test was not good. That is why I asked for numerical data, not graph. — JJacquelin, Oct 14 '22 at 18:06

score 0 · Answer 1 · answered Oct 15 '22 at 13:25

Presently I have no representative data from Ayrtonb1 in order to test the method proposed below. The method seems convenient from theoretical basis but one cannot be sure that it will be satisfying with the OP data.

Nevertheless a preliminary test was carried out with a "toy" data (shown below).

I suppose that the screencopy below is sufficient to understand the method and to reproduce the calculus with real data.

The result of this preliminary test is rather good :

LRMSE<2 for a range up to 600. (Least Root Mean Square Error).

LRMSRE<2% (Least Root Mean Square Relative Error).

score 0 · Answer 2 · answered Oct 16 '22 at 15:48

Your data and formula look suspiciously like resonant (or anomalous) X-ray diffraction data, with measurements of scattered intensity going across the Zn K-edge. Although you do not say this, the discussion here will assume that. You say you have 1800 measurements, presumably as a function of X-ray energy in eV. The resonant scattering factors (f1, f2) you show seem to be idealized and possibly "typical", and perhaps not specifically for the Zn K-edge -- at the very least the energy scale shown is not the same as your data.

You will want to treat the data and model the intensity as a function of X-ray energy. And you will want realistic values for f1 and f2 for the element of interest, and at the actual energy points for your data. I recommend using xraydb (full disclosure: I am the lead author) [pip install xraydb], so that you can do


import numpy as np
import xraydb
#edata, idata = function_to_extract_your_data()
# or perhaps testing with
edata = np.linspace(9500, 10500, 501)

f1 = xraydb.f1_chantler('Zn', edata) 
f2 = xraydb.f2_chantler('Zn', edata)

As written, your intensity function does not directly depend on energy, though it might at a later date, say to make that offset be linear in energy, not just a constant. You might write a function like:

def intensity(en, phi, beta, scale=1, slope=0, offset=0, f1=-1, f2=1):
    costerm = np.cos(phi) + beta*f1
    sinterm = np.sin(phi) + beta*f2
    return scale * (costerm**2 + sinterm**2) + slope*en + offset

with that you can start just trying out some values to get a feel for the function and how it compares to your data:

import matplotlib.pyplot as plt

beta = 0.025 # Wild Guess!

for phi in np.pi*np.arange(20)/10:
    plt.plot(edata, intensity(edata, phi, beta, f1=f1, f2=f2), label='%.1f'%phi)

plt.legend()
plt.show()

It kind of looks like your value for phi would be around 5.5 to 6 (or -0.8 to -0.3). You could also try different values of beta and plot that with your actual data.

With that model for intensity and a feel for what the range of parameters is, you could then try to fit your data. To do that, I would recommend using lmfit (full disclosure: I am the lead author) [pip install lmfit], where you can create a model from your intensity model function - this will use the names of the function arguments to name the fitting parameters.

from lmfit import Model

imodel = Model(intensity, independent_vars=['en', 'f1', 'f2'])

params = imodel.make_params(scale=1, offset=0, slope=0, beta=0.1, phi=5.5)

That independent_vars will tell Model to not make fitting Parameters for the function arguments f1 and f2 and to expect them to be passed into any evaluation or fit. The other function arguments (scale, phi, etc) will be expected to become fitting variables. You do have to create a "Parameters" object and must give initial values for all parameters. You can put bounds on these or fix them so they are not altered in the fit, as with

params['scale'].min = 0  # force scale to be positive
params['slope'].vary = False # slope will be fixed at 0.

You can then evaluate the model with

init_value = imodel.eval(params, en=edata, f1=f1, f2=f2)

And then eventually do a fit with

result = imodel.fit(idata, params, en=edata, f1=f1, f2=f2)
print(result.fit_report())

plt.plot(edata, idata, label='data')
plt.plot(edata, init_value, label='initial fit')
plt.plot(edata, result.best_fit, label='best fit')
plt.legend()
plt.show()

Finally, for analysis of X-ray resonant scattering be sure to consider including absorption corrections in that intensity calculation. As you go across the Zn K edge, the absorption depth of the sample may change dramatically if the Zn concentration is high.

Dear Professor Newville, Thank you for your comprehensive response. I am indeed writing a script in order to model DAFS data and extract f'/f", as part of my PhD at UCL. I have been reading your work on this. My overall goal is to investigate a variety of oxides, and ultimately make this technique more commonplace. I am inexperienced in python, though I have used lmfit as suggested. I suppose the next step would be to iterate through this somehow using the Kramers-Kronig transforms in order to obtain f'/f"? Any further assistance would be greatly appreciated. — Ayrtonb1, Oct 19 '22 at 14:11

Fitting a model with some known parameters to an experimental dataset in python, in order to optimise other parameters

2 Answers2