fitting a group of sigmoids with different models with least_squares()?

Question

For a series of experiments measured at different time points, I'm trying to compare fits that fixed parameters for all experiments with fits that have an individual set of parameters per experiment, using an information criterion (like Akaike IC). Is that possible with scipy.optimize.least_squares?

Let's say that the measurements roughly follow sigmoid curves in time with parameters base (level before step), height (increase in level after step), stretch (duration of the step) and t50 (the time where the level is base + height/2). I model these in the code below, which returns a size [t, c] numpy ndarray measurements with t time points (experiments) and c values per time point:

import numpy as np;
import matplotlib.pyplot as plt;

# lifted, scaled, stretched and shifted sigmoid
def lssig ( t, base, height, stretch, t50 ):
return base + height / ( 1 + np.exp ( -1/stretch * ( t - t50 ) ) );

# number of curves per experiment
numcurves = 10;

# number of experiments (time pts)
num_experiments = 20;

# timepoints (with simulated variability in timing)
time_jitter = 2;
tstart  = np.linspace   ( -10, 30, num = num_experiments );
tstart += time_jitter * np.random.rand ( num_experiments );

# measurements + added noise (this is added noise, not the variability in the estimates)
measurements  = np.ndarray       ( shape = ( num_experiments, numcurves ) );
measure_noise = np.random.normal ( 0, .2,  ( num_experiments, numcurves ) );

# Variability of model parameters between curves:   
# par_spreading [0] -> variability in base (start value)
# par_spreading [1] -> variability in height (end - start)
# par_spreading [2] -> variability in stretch (duration of step)
# par_spreading [3] -> variability in t50 (middle of step)
par_spreading = np.ones ( 4 );

# parameters of the model:   
# every curve can have its own parameters, variability between curves 
# is given by the par_spreading value (see above) for that parameter    
height  = 8 * np.ones ( numcurves ) + par_spreading [0] * np.random.rand ( numcurves);
base    = 2 * np.ones ( numcurves ) + par_spreading [1] * np.random.rand ( numcurves);
stretch = 4 * np.ones ( numcurves ) + par_spreading [2] * np.random.rand ( numcurves);
t50     = 9 * np.ones ( numcurves ) + par_spreading [2] * np.random.rand ( numcurves);

# fill the measurement array
for t in range(num_experiments):
    for c in range (numcurves):
    
        measurements[ t, c ] = lssig ( tstart [t],
                                     base  [c], 
                                     height   [c],
                                     stretch   [c],
                                     t50 [c],
                                     );
measurements += measure_noise;
   
# plot curves per subject
plt.plot ( tstart, measurements );
plt.show ();

There is a very nice page with examples of using least_squares, where 1 sigmoid is fitted, and the parameters of the sigmoid are passed in an array theta. Using that recipe I can fit the curve of the 1st experiment at every time point measurements[:,0]:

from scipy.optimize import least_squares;

# point on the sigmoid, using array theta for parameters
def y ( theta, t ):
    return theta[0] + theta[1] / ( 1 + np.exp ( -1/theta[2] * ( t - theta[3] ) ) );

# objective function for least_squares: difference between noisy data & model
def fun(theta):
    return y(theta, tstart) - measurements[:,0];

# start with values for base, step, stretch and t50
theta0 = [8,2,4,9];
res1   = least_squares(fun, theta0);

# show the parameters & plot the functions
print      ( 'real: {}, estimated: {}'.format ( [ base[0], height[0], stretch[0], t50[0] ],             res1.x ) );
#
plt.plot   (  tstart, measurements[:,0],  tstart, y(res1.x,tstart) );
plt.legend ( ('noisy', 'lsq' ) );
plt.show   ( );

The reason that I used the 4 separate parameters in the code above is that base can either be a single scalar (same for all curves per experiment) or an array (each curve has an individual base value). Is it possible to use least_squares this way? Or do all the parameters need to be put in on 1D array?

It is possible to flatten the array of parameters (like with theta) but that becomes incredibly messy because in the model where every experiment has its own base parameter, the second parameter would be a value for base, but in the model where there is one base parameter for all experiments, the second value in the array would be a value for height, and so on.

Ideally the parameters would still be separated, so that their length could vary independently. Is there an administratively traceable / aesthetically pleasing way to do this?

Romain F · Accepted Answer · 2020-06-29T15:52:29.967

2

Maybe you could reduce a lot your question's length and increase readability by giving a minimal reproducible example! ;)

If I understood well your issue:

To avoid to handle a "messy" set of parameters, I often create two small functions such as param2minimizer and its opposite minimizer2param. These functions make the job of organizing all the parameters for the minimizer. The code is then much easier to understand.

For example, let's say you want to optimize any number of [base,height] couples

def param2minimizer(bases,heights):
    return np.concatenate((bases,heights))

def minimizer2param(m):
    L = len(m)
    if L%2: raise ValueError("Incompatible length")
    bases = m[0:L//2]
    heights = m[L//2:]
    return bases, heights

You can modify these two functions to match exactly your needs.

Regarding your question on the number of dimensions of the minimizing variables, you may have a look at the documentation of least_squares. I think it is a better habit to systematically give a vector (1D) and not a matrix (2D), even if a specific minimizer would be 2D permissive. Some minimizers are only 1D permissive, moreover the optimization algorithms are theoretically and often numerically described on a 1D vector of variables. It is safer to use 1D and it conforms more to the usage.

edited Jun 29 '20 at 15:52

answered Jun 29 '20 at 15:44

Romain F

397
6
14

Thanks, @RomainF! Yes part of the problem for me was writing down what I wanted. My data are a group of parameterised functions, with noise on the data as well as the parameters. For these data I want to compare fits with each parameter fixed for the whole group, or variable within the group. This is what causes the variable indexing in the 1D array. It looks like https://lmfit.github.io/lmfit-py lets you access function parameters in a struct, which may be the solution! – alle_meije Jun 30 '20 at 07:38
Problem with `lmfit` is that to fit multiple curves (_e.g._, a matrix where every column is a curve) you need to flatten them into one long concatenated curve. But it has additional features, _e.g._, to 'lock' parameters to be the same for all curves, nice to have that built in! – alle_meije Jun 30 '20 at 14:13
You can also have a look at [curve_fit](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html). It seems it can handle arrays, but I didn't see possibility to lock parameters together – Romain F Jun 30 '20 at 21:29
All these functions are mainly packaging around one core minimizer. I think you had a good start with `least_squares`. You can perform all calculations with 2D arrays as you need, and at the very last line of `fun` you finally `flatten` everything for the minimizer. – Romain F Jun 30 '20 at 21:44
Thanks, and I agree: they seem to be wrappers around the same nonlinear optimisation routine. As I'm quite new to this whole thing the easy interface of `lmfit`, especially with fixing parameters _etc_. seems to fit my needs best... – alle_meije Jul 02 '20 at 09:04

fitting a group of sigmoids with different models with least_squares()?

1 Answers1