
I am working with the lmfit python package https://lmfit.github.io/lmfit-py/ for fitting data to a specified non-linear function within certain permitted ranges of variation for some parameters (which is mainly why I found lmfit attractive).

Generally, the two important lines of code for lmfit are:

def fcn2min(params, x_data, y_data):

result = minimize(fcn2min, params, args=(x_data, y_data))

My application is to perform a global fit across 36 graphs of data. The catch is that a few parameters are not being fitted (vary=False): they are known quantities that differ across the 36 graphs but remain constant within each graph. Currently I am trying to implement the following syntax in order to pass these known parameters along with their associated x_data and y_data points.

def fcn2min(params, x_data, y_data, known_params):

result = minimize(fcn2min, params, args=(x_data, y_data, known_params))

Here x_data, y_data, and known_params are arrays of the same length. x_data and y_data are each a single array concatenating all 36 graphs, and known_params is a three-column array with repeating entries for the parameters that are fixed within each graph.

At the moment the program is running pretty slowly (about 30 minutes per run). Also, the fitted curve is currently the same for all graphs, whereas I want it to fit each graph locally using its known parameters while fitting only the global parameters across all graphs.

I would like to ask: am I doing this properly? I see why minimize() would need a reference to the y_data being fitted, but why does fcn2min() need y_data as an input? Could my fitting program get confused about whether it is fitting the y_data or the known_params array? Is there a better way to do this through lmfit, or should I look for another numerical package?

1 Answer


It's a little hard to get all the details without an actual example code, but I can try to address a few of the topics you raise.

First, fcn2min() needs to be passed the y_data (and x_data, for that matter) arrays because, from the point of view of minimize() and the underlying solvers, the objective function simply returns the array to be minimized given the variable parameters -- the solver does not think about "data" at all. For fitting data, your objective function may very well be

return y_data - model_for_y(params, x_data)

but the fitting algorithm cares about how the parameter values alter the array to be minimized, not about how that array is calculated. It is called "minimize()", not "fit()".

Second, if I understand correctly, you have many (36) data sets that all have approximately the same model, and while some of the parameter values may be different for each data set, some others should be shared across all data sets.

With lmfit, the best way to do this is to do one global fit. Make one Parameters() object that holds a set of parameters for each of the 36 models. For example, if you were fitting 36 data sets, each with a peak-like function that has parameters "amp", "center", "sigma", and "offset", you might make parameters named amp01, center01, ..., sigma36, offset36. Yes, that can be a lot of parameters for one fit.

Then, in your objective function, loop over each data set, using only the appropriate parameters to model that data set, building up 36 separate residuals for the 36 data sets. Finally, concatenate these to give a very large array to be minimized.

If each of the 36*nparameters_per_dataset parameters were an independent variable, you probably wouldn't have gained much over running 36 individual fits. But if you can assert that some of the variables have the same value for all data sets, those fits are no longer independent. I think this is what you're looking to do.

Let's say you knew that "offset" was the same for all data sets. You could just make one offset Parameter and use it for all data sets -- not offset01 ... offset36, just a single offset.

To be more flexible, you could define offset01, ..., offset36, but use an lmfit constraint to tie their values:

params = Parameters()
params.add('offset01', value=0, vary=True)
params.add('offset02', expr='offset01')
params.add('offset03', expr='offset01')
...

Using such a definition, offset01 will vary in the fit, and offset02 and offset03 will take the same value as offset01 -- they are not independent but constrained. Constraint expressions can be more complicated, including multiple parameter names and many Python and numpy functions.

Hope that helps.

M Newville
  • You were right, I misunderstood minimize(). It requires a return of model - y_data rather than just return model. I altered my model function to def fcn2min(params, x_data, y_data=None): and then put an if-statement at the end: if y_data is None: return model. This let me use fcn2min both in minimize() and for plotting out the model with result.params in place of params. – user6581543 Jul 14 '16 at 20:17
  • As for my second question, I came up with a workaround where I paired each known_params entry with its respective x_data into one single list; then in fcn2min I use a for-loop to pick out the x_data and the appropriate known_params and assign them inside fcn2min. Every time I tried passing a list within a list to better organize the array, minimize() kept giving me errors for N > M. – user6581543 Jul 14 '16 at 20:24
  • It should be possible to pass in lists of lists or other data structures, but the value returned by the objective function should be a 1-D array of type Float64. – M Newville Jul 15 '16 at 02:37