
I am trying to fit three time-evolution curves with two rate constants, k1 and k2. The system of equations I am trying to solve is:

A_t = A_0 * exp(-k1*t)
B_t = [A_0*k1/(k2-k1)] * exp(-k1*t) - [A_0*k1/(k2-k1) - B_0] * exp(-k2*t)
C_t = [-A_0*k2/(k2-k1)] * exp(-k1*t) + [A_0*k1/(k2-k1) - B_0] * exp(-k2*t) + A_0 + B_0

where I want to fit the best values of k1 and k2 to my data values of A, B and C. Here A_t etc. is the population of A at time t, with A_0=0.4 and B_0=0.6.

To do this I am using the scipy.optimize.curve_fit function, setting up the equations as a coefficient matrix u and a basis vector w. In the following, I have manually entered A_0=0.4 and B_0=0.6 into the function (which relates to part 2 of my question at the bottom):

def func(t, k1, k2):
    # coefficient matrix: rows correspond to A, B and C; the columns
    # multiply exp(-k1*t), exp(-k2*t) and the constant term, respectively
    u = np.array([[0.4, 0, 0],
                  [0.4*k1/(k2-k1), -0.4*(k1/(k2-k1)) + 0.6, 0],
                  [0.4*(-k2/(k2-k1)), 0.4*k1/(k2-k1) - 0.6, 1]])

    # time-dependent basis functions shared by all three curves
    w = np.array([np.exp(-t*k1),
                  np.exp(-t*k2),
                  np.ones_like(t)])

    # np.dot(u, w) has shape (3, len(t)); flatten it so the output
    # matches the flattened data array given to curve_fit
    return np.dot(u, w).flatten()

To test this, I create a data set with k1=0.03 and k2=0.004:

t=np.arange(1000)*0.5
test=func(t,0.03,0.004).reshape((3,1000))
test+=np.random.normal(size=test.shape)*0.01

which produces the following plot:

[Plot of the simulated test data: populations of A, B and C versus time]

When I then try to fit values of k1 and k2, I get the following error:

popt,popc=optimize.curve_fit(func,t,test.flatten(),method='lm')

/usr/local/lib/python3.6/site-packages/ipykernel_launcher.py:4: RuntimeWarning: divide by zero encountered in double_scalars
  after removing the cwd from sys.path.
/usr/local/lib/python3.6/site-packages/ipykernel_launcher.py:5: RuntimeWarning: divide by zero encountered in double_scalars
  """
/usr/local/lib/python3.6/site-packages/scipy/optimize/minpack.py:785: OptimizeWarning: Covariance of the parameters could not be estimated
  category=OptimizeWarning)

I understand that there is a divide-by-zero error somewhere here, but I am not sure where it is or how to fix it. So, my questions are:

  1. How can I resolve this error in the curve_fit call?
  2. Is there a way to pass A_0 and B_0 into optimize.curve_fit, rather than hard-coding the concentrations as I have done above? My understanding is that only the independent variable t and the unknown parameters k1 and k2 can be passed to the function (one possible pattern is sketched after this list).
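
For reference, one common pattern (not taken from the answer below) is to bind the fixed concentrations in a closure, so that the function curve_fit sees takes only t, k1 and k2. A minimal sketch, assuming the same model as above, with the p0 values chosen purely for illustration:

import numpy as np
from scipy import optimize

def make_func(A0, B0):
    # model factory: A0 and B0 are fixed by the closure, so the returned
    # function has the (t, k1, k2) signature that curve_fit expects
    def func(t, k1, k2):
        u = np.array([[A0, 0, 0],
                      [A0*k1/(k2-k1), -A0*k1/(k2-k1) + B0, 0],
                      [-A0*k2/(k2-k1), A0*k1/(k2-k1) - B0, A0 + B0]])
        w = np.array([np.exp(-t*k1),
                      np.exp(-t*k2),
                      np.ones_like(t)])
        return np.dot(u, w).flatten()
    return func

# A0 and B0 are baked in; curve_fit only ever sees t, k1 and k2
# popt, pcov = optimize.curve_fit(make_func(0.4, 0.6), t, test.flatten(),
#                                 p0=[0.03, 0.004], method='lm')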

Thank you for any help that can be provided

  • A) The default initial parameter values for curve_fit() are all 1.0, this might not be optimum for this problem. I use scipy.optimize.differential_evolution to supply these, and can provide an example if it would be of some use. B) Inside the func() you can add a try/except block, and upon exception print some values to track down the divide-by-zero, I have done this and sometimes it helps and sometimes it does not, though it has the advantage of being easy to code. – James Phillips Oct 12 '18 at 15:02
  • Hi James, thanks for your comment. A) If you could provide an example, that would be really helpful. B) I will give this a try and get back to you! thanks again – acleveronlinename Oct 15 '18 at 10:29
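
Following up on suggestion B above, a minimal sketch of such a diagnostic (my own, assuming k1 and k2 arrive as numpy scalars, as the double_scalars warning indicates) also reveals the source of the error: curve_fit's default p0 is all ones, so k1 == k2 == 1.0 and k2 - k1 is zero:

import numpy as np

def func(t, k1, k2):
    try:
        # promote numpy's divide-by-zero warning to a catchable exception
        with np.errstate(divide='raise', invalid='raise'):
            u = np.array([[0.4, 0, 0],
                          [0.4*k1/(k2-k1), -0.4*k1/(k2-k1) + 0.6, 0],
                          [-0.4*k2/(k2-k1), 0.4*k1/(k2-k1) - 0.6, 1]])
    except (FloatingPointError, ZeroDivisionError):
        # with curve_fit's default p0 of all ones, k1 == k2 == 1.0,
        # so every k/(k2-k1) term divides by zero
        print('divide by zero at k1 =', k1, ', k2 =', k2)
        raise
    w = np.array([np.exp(-t*k1),
                  np.exp(-t*k2),
                  np.ones_like(t)])
    return np.dot(u, w).flatten()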

1 Answer


Per the comments, here is an example of using scipy's differential_evolution genetic algorithm module for estimating initial parameters. This module uses the Latin Hypercube algorithm to ensure a thorough search of parameter space, and that algorithm requires bounds within which to search. In this example, those bounds are taken from the data minimum and maximum values. The fitting completes with a call to curve_fit() without passing bounds from the genetic algorithm search, in case the best parameters are outside those bounds.

import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.optimize import differential_evolution
import warnings

xData = numpy.array([19.1647, 18.0189, 16.9550, 15.7683, 14.7044, 13.6269, 12.6040, 11.4309, 10.2987, 9.23465, 8.18440, 7.89789, 7.62498, 7.36571, 7.01106, 6.71094, 6.46548, 6.27436, 6.16543, 6.05569, 5.91904, 5.78247, 5.53661, 4.85425, 4.29468, 3.74888, 3.16206, 2.58882, 1.93371, 1.52426, 1.14211, 0.719035, 0.377708, 0.0226971, -0.223181, -0.537231, -0.878491, -1.27484, -1.45266, -1.57583, -1.61717])
yData = numpy.array([0.644557, 0.641059, 0.637555, 0.634059, 0.634135, 0.631825, 0.631899, 0.627209, 0.622516, 0.617818, 0.616103, 0.613736, 0.610175, 0.606613, 0.605445, 0.603676, 0.604887, 0.600127, 0.604909, 0.588207, 0.581056, 0.576292, 0.566761, 0.555472, 0.545367, 0.538842, 0.529336, 0.518635, 0.506747, 0.499018, 0.491885, 0.484754, 0.475230, 0.464514, 0.454387, 0.444861, 0.437128, 0.415076, 0.401363, 0.390034, 0.378698])


def func(t, n_0, L, offset): #exponential curve fitting function
    return n_0*numpy.exp(-L*t) + offset


# function for genetic algorithm to minimize (sum of squared error)
def sumOfSquaredError(parameterTuple):
    warnings.filterwarnings("ignore") # do not print warnings by genetic algorithm
    val = func(xData, *parameterTuple)
    return numpy.sum((yData - val) ** 2.0)


def generate_Initial_Parameters():
    # min and max used for bounds
    maxX = max(xData)
    minX = min(xData)
    maxY = max(yData)
    minY = min(yData)

    parameterBounds = []
    parameterBounds.append([minX, maxX]) # search bounds for n_0
    parameterBounds.append([minX, maxX]) # search bounds for L
    parameterBounds.append([0.0, maxY]) # search bounds for offset

    # "seed" the numpy random number generator for repeatable results
    result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3)
    return result.x

# by default (polish=True), differential_evolution finishes by polishing the
# best candidate with scipy.optimize.minimize (L-BFGS-B) within the bounds
geneticParameters = generate_Initial_Parameters()

# now call curve_fit without passing bounds from the genetic algorithm,
# just in case the best fit parameters are outside those bounds
fittedParameters, pcov = curve_fit(func, xData, yData, p0=geneticParameters)
print('Fitted parameters:', fittedParameters)
print()

modelPredictions = func(xData, *fittedParameters) 

absError = modelPredictions - yData

SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))

print()
print('RMSE:', RMSE)
print('R-squared:', Rsquared)

print()


##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)

    # first the raw data as a scatter plot
    axes.plot(xData, yData,  'D')

    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData))
    yModel = func(xModel, *fittedParameters)

    # now the model as a line plot
    axes.plot(xModel, yModel)

    axes.set_xlabel('X Data') # X axis data label
    axes.set_ylabel('Y Data') # Y axis data label

    plt.show()
    plt.close('all') # clean up after using pyplot

graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)
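
The same recipe carries over to the two-rate-constant model in the question. A minimal sketch, assuming func, t and test from the question are in scope, and using illustrative [0, 1] search bounds for the rate constants (an assumption, not derived from the data):

# objective for the question's problem: sum of squared error over the
# flattened A, B, C curves
def sumOfSquaredError_k(parameterTuple):
    warnings.filterwarnings("ignore") # suppress divide-by-zero warnings during the search
    val = func(t, *parameterTuple)
    return numpy.sum((test.flatten() - val) ** 2.0)

parameterBounds = [[0.0, 1.0], # search bounds for k1 (illustrative)
                   [0.0, 1.0]] # search bounds for k2 (illustrative)

result = differential_evolution(sumOfSquaredError_k, parameterBounds, seed=3)
popt, pcov = curve_fit(func, t, test.flatten(), p0=result.x)
print('k1, k2:', popt)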
– James Phillips
  • Thanks James, that's great! I also found that giving starting values manually, i.e. `popt,popc=optimize.curve_fit(func,t,test.flatten(),p0=[0.4,0.1],method='lm')`, helped with the initial guess – acleveronlinename Oct 16 '18 at 08:54
  • What if there is an error message saying, Traceback (most recent call last): File "my_script.py", line 1310, in geneticParameters = generate_Initial_Parameters() File "my_script.py", line 1307, in generate_Initial_Parameters result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3) RuntimeError: The map-like callable must be of the form f(func, iterable), returning a sequence of numbers the same length as 'iterable' – Rebel Jun 22 '20 at 22:26