
I have the following data points that I would like to curve fit:

import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit

t = np.array([15474.6, 15475.6, 15476.6, 15477.6, 15478.6, 15479.6, 15480.6,
              15481.6, 15482.6, 15483.6, 15484.6, 15485.6, 15486.6, 15487.6,
              15488.6, 15489.6, 15490.6, 15491.6, 15492.6, 15493.6, 15494.6,
              15495.6, 15496.6, 15497.6, 15498.6, 15499.6, 15500.6, 15501.6,
              15502.6, 15503.6, 15504.6, 15505.6, 15506.6, 15507.6, 15508.6,
              15509.6, 15510.6, 15511.6, 15512.6, 15513.6])

v = np.array([4.082, 4.133, 4.136, 4.138, 4.139, 4.14, 4.141, 4.142, 4.143,
              4.144, 4.144, 4.145, 4.145, 4.147, 4.146, 4.147, 4.148, 4.148,
              4.149, 4.149, 4.149, 4.15, 4.15, 4.15, 4.151, 4.151, 4.152,
              4.152, 4.152, 4.153, 4.153, 4.153, 4.153, 4.154, 4.154, 4.154,
              4.154, 4.154, 4.155, 4.155])

The exponential function that I want to fit to the data is:

v = a - b * exp(-alpha * t)

The Python function representing the above formula, and the associated curve fit to the data, are shown below:

def func(t, a, b, alpha):
    return a - b * np.exp(-alpha * t)


# scale vector to start at zero otherwise exponent is too large
t_scale = t - t[0]

# initial guess for curve fit coefficients
a0 = v[-1]
b0 = v[0]
alpha0 = 1/t_scale[-1]

# coefficients and curve fit for curve
popt4, pcov4 = curve_fit(func, t_scale, v, p0=(a0, b0, alpha0))

a, b, alpha = popt4
v_fit = func(t_scale, a, b, alpha)

ss_res = np.sum((v - v_fit) ** 2)       # residual sum of squares
ss_tot = np.sum((v - np.mean(v)) ** 2)  # total sum of squares
r2 = 1 - (ss_res / ss_tot)              # R squared fit, R^2

The data compared to the curve fit is plotted below. The parameters and R squared value are also provided.

[figure: data points compared to the fitted curve]

a0 = 4.1550   b0 = 4.0820   alpha0 = 0.0256
a = 4.1490    b = 0.0645    alpha = 0.9246
R² = 0.8473

Is it possible to get a better fit with the data using the approach outlined above or do I need to use a different form of the exponential equation?

I'm also not sure what to use for the initial values (a0, b0, alpha0). In the example, I chose points from the data but that may not be the best method. Any suggestions on what to use for the initial guess for the curve fit coefficients?

wigging
  • I would try 1. not fitting through the peak and 2. adding weights to some of the points that will drag the fit down. You can do this with the `sigma` kwarg for `curve_fit` – dan_g Apr 26 '18 at 19:05
  • @user3014097 I was not aware of the `sigma` keyword. How do I apply it to my example? Also, starting the curve fit at the peak (`t[1:]` and `v[1:]`) works well but how much will that affect the `popt` values? – wigging Apr 26 '18 at 19:15
  • The equation plot is flat on the right side, while the data shows a clear upward slope. I do not think this one equation will fit that part of the data well. – James Phillips Apr 26 '18 at 20:24
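The `sigma` approach suggested in the comments can be sketched as follows. `curve_fit` treats smaller `sigma` values as higher weight, so giving the outlying first point a large `sigma` effectively removes its influence. The weight values below are illustrative assumptions, not numbers from the thread:

```python
import numpy as np
from scipy.optimize import curve_fit

def func(t, a, b, alpha):
    return a - b * np.exp(-alpha * t)

# data from the question (t is evenly spaced, 1 apart)
t = 15474.6 + np.arange(40.0)
v = np.array([4.082, 4.133, 4.136, 4.138, 4.139, 4.14, 4.141, 4.142, 4.143,
              4.144, 4.144, 4.145, 4.145, 4.147, 4.146, 4.147, 4.148, 4.148,
              4.149, 4.149, 4.149, 4.15, 4.15, 4.15, 4.151, 4.151, 4.152,
              4.152, 4.152, 4.153, 4.153, 4.153, 4.153, 4.154, 4.154, 4.154,
              4.154, 4.154, 4.155, 4.155])

t_scale = t - t[0]

# smaller sigma = more weight; give the outlying first point a large
# uncertainty so it barely influences the fit (100.0 is an arbitrary choice)
sigma = np.ones_like(v)
sigma[0] = 100.0

# initial guesses taken from the data, as in the question
p0 = (v[-1], v[-1] - v[1], 1.0 / t_scale[-1])
popt, pcov = curve_fit(func, t_scale, v, p0=p0, sigma=sigma)

# R squared over the well-behaved points (the down-weighted outlier excluded)
v_fit = func(t_scale, *popt)
ss_res = np.sum((v[1:] - v_fit[1:]) ** 2)
ss_tot = np.sum((v[1:] - np.mean(v[1:])) ** 2)
r2 = 1 - ss_res / ss_tot
```

Because `absolute_sigma` defaults to `False`, only the relative sizes of the `sigma` entries matter here.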

3 Answers


To me this looks like something that is better fit by multiple components, rather than a single exponential.

def func(t, a, b, c, d, e):
    # sum of two exponential decays plus a constant offset
    return a*np.exp(-t/b) + c*np.exp(-t/d) + e


# scale vector to start at zero otherwise exponent is too large
t_scale = t - t[0]

# initial guess for curve fit coefficients
guess = [1, 1, 1, 1, 0]

# coefficients and curve fit for curve
popt, pcov = curve_fit(func, t_scale, v, p0=guess)

v_fit = func(t_scale, *popt)

[figure: data and the two-component exponential fit]

dan_g
  • The equation you use provides a much better fit. However, the solution seems to be sensitive to the initial guess. For example, the a, c, and e initial values can be expressed in terms of v[-1], v[0], and v[-1]. Is there a method that isn't so sensitive to the initial values? – wigging Apr 26 '18 at 22:07
  • To make the initial values similar to my original problem, it looks like the initial guess for `a`, `c`, and `e` in your example should be the value from `v[-1]`. Does this seem reasonable? – wigging Apr 27 '18 at 14:12
  • I'm not sure what your initial question is. Think of what the equation would be at t = 0. Essentially it should be the sum of the minimums of your two components plus your offset (E). As long as your guess is of the right magnitude you should be fine. Normally when I'm doing this kind of fitting I prefer to zero both time (which you did) and my baseline (so, in your scenario `v_scale = v - v.max()`), so that you're decaying to 0. Again, though, as long as you're in the ballpark for the guess you should be fine. If v varies widely I'd work on normalizing it before fitting – dan_g Apr 27 '18 at 14:47
  • Another solution would be to transform v to be linear, which would make the fitting step simpler. – dan_g Apr 27 '18 at 14:50
  • Are you referring to linear regression where you would take the log of the equation and fit to that form? Can you provide an example of this? Thanks for all the help. – wigging Apr 27 '18 at 15:17
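The linearization asked about in the comments can be sketched for the single-exponential form: given an estimate of the plateau a, the model v = a - b*exp(-alpha*t) rearranges to ln(a - v) = ln(b) - alpha*t, a straight line in t, so np.polyfit can supply initial guesses. The plateau estimate `a_est` below is an assumption, not a value from the thread:

```python
import numpy as np

# data from the question (t is evenly spaced, 1 apart)
t = 15474.6 + np.arange(40.0)
v = np.array([4.082, 4.133, 4.136, 4.138, 4.139, 4.14, 4.141, 4.142, 4.143,
              4.144, 4.144, 4.145, 4.145, 4.147, 4.146, 4.147, 4.148, 4.148,
              4.149, 4.149, 4.149, 4.15, 4.15, 4.15, 4.151, 4.151, 4.152,
              4.152, 4.152, 4.153, 4.153, 4.153, 4.153, 4.154, 4.154, 4.154,
              4.154, 4.154, 4.155, 4.155])

t_scale = t - t[0]

# assumed plateau: slightly above the largest observed value so that
# a_est - v stays positive everywhere (required for the log)
a_est = v.max() + 1e-3

# ln(a - v) = ln(b) - alpha * t is linear in t
y = np.log(a_est - v)
slope, intercept = np.polyfit(t_scale, y, 1)

alpha_guess = -slope         # decay rate for the curve fit
b_guess = np.exp(intercept)  # amplitude for the curve fit
```

These values can then be passed as `p0` to `curve_fit` in place of hand-picked guesses.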

If you remove the first data point, you will get a much better fit.

Using lmfit (https://lmfit.github.io/lmfit-py), which provides a higher-level, easier-to-use interface to curve fitting, your fitting script would look like this:

import matplotlib.pyplot as plt
import numpy as np
from lmfit import Model

t = np.array([15474.6, 15475.6, 15476.6, 15477.6, 15478.6, 15479.6, 15480.6,
              15481.6, 15482.6, 15483.6, 15484.6, 15485.6, 15486.6, 15487.6,
              15488.6, 15489.6, 15490.6, 15491.6, 15492.6, 15493.6, 15494.6,
              15495.6, 15496.6, 15497.6, 15498.6, 15499.6, 15500.6, 15501.6,
              15502.6, 15503.6, 15504.6, 15505.6, 15506.6, 15507.6, 15508.6,
              15509.6, 15510.6, 15511.6, 15512.6, 15513.6])

v = np.array([4.082, 4.133, 4.136, 4.138, 4.139, 4.14, 4.141, 4.142, 4.143,
              4.144, 4.144, 4.145, 4.145, 4.147, 4.146, 4.147, 4.148, 4.148,
              4.149, 4.149, 4.149, 4.15, 4.15, 4.15, 4.151, 4.151, 4.152,
              4.152, 4.152, 4.153, 4.153, 4.153, 4.153, 4.154, 4.154, 4.154,
              4.154, 4.154, 4.155, 4.155])

def func(t, a, b, alpha):
    return a + b * np.exp(-alpha * t)

# remove first data point, take offset from t
tx = t[1:] - t[0]
vx = v[1:]

# turn your model function into a Model
amodel = Model(func)
# create parameters with initial values.  Note that parameters
# are named from the arguments of your model function.
params = amodel.make_params(a=v[0], b=0, alpha=1.0/(t[-1]-t[0]))

# fit the data to the model with the parameters
result = amodel.fit(vx, params, t=tx)

# print the fit statistics and resulting parameters
print(result.fit_report())

# plot data and fit
plt.plot(t, v, 'o', label='data')
plt.plot(t, result.eval(result.params, t=(t-t[0])), '--', label='fit')
plt.legend()
plt.show()

This will print out the following results:

[[Model]]
    Model(func)
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 44
    # data points      = 39
    # variables        = 3
    chi-square         = 1.1389e-05
    reduced chi-square = 3.1635e-07
    Akaike info crit   = -580.811568
    Bayesian info crit = -575.820883
[[Variables]]
    a:      4.15668660 +/- 5.0662e-04 (0.01%) (init = 4.082)
    b:     -0.02312772 +/- 4.1930e-04 (1.81%) (init = 0)
    alpha:  0.06004740 +/- 0.00360126 (6.00%) (init = 0.02564103)
[[Correlations]] (unreported correlations are < 0.100)
    C(a, alpha) = -0.945
    C(a, b)     = -0.682
    C(b, alpha) =  0.465

and show this plot for the fit:

[figure: data and the lmfit fit, including the outlying first point]

M Newville
  • For purpose of comparison, would you please re-plot using the same graph scaling as the graph used by the OP in their question? – James Phillips Apr 26 '18 at 22:13
  • I don't understand why someone downvoted this answer. "Remove the first point" was the first thing I thought of when I saw the graph in the question. This answer saved me the trouble of finding out how well it would work--pretty well, it turns out! – Warren Weckesser Apr 26 '18 at 23:45
  • @JamesPhillips that's an excellent suggestion. I updated the plot to include the first point. That really illustrates how much of an outlier that point is. – M Newville Apr 27 '18 at 02:18
  • If there were more data between the first and second points it would be a different story, but that is not the case here. I would recommend taking additional data in this range if at all possible. Lacking that extra data, you have a strong case to make for removal of that point due to your regression analysis. Thank you for the updated plot. – James Phillips Apr 27 '18 at 03:05
  • By removing the initial point, how much would that affect the values for the variables `a`, `b`, and `alpha`? – wigging Apr 27 '18 at 14:17
  • @wigging To me the first point looks like an outlier. You can easily check the result of including it using `t[:]` and `v[:]` instead of `t[1:]` and `v[1:]`. Chi-square will increase by a factor of ~75x (a much worse fit), `a` will be about 4.1, `b` will be about -0.064, and `alpha` will be about 0.92. The fit will look about as bad as in your original question. As the other answers here point out, the data with the first point does not match your model well -- they propose other models that work better. Ignoring the first point, your data can match your model. – M Newville Apr 27 '18 at 16:49
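The with/without comparison described in the comment above can be reproduced with plain scipy as well (a sketch; the initial guesses are taken from the data, in the same spirit as the question):

```python
import numpy as np
from scipy.optimize import curve_fit

def func(t, a, b, alpha):
    return a - b * np.exp(-alpha * t)

def r_squared(y, y_fit):
    ss_res = np.sum((y - y_fit) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1 - ss_res / ss_tot

# data from the question (t is evenly spaced, 1 apart)
t = 15474.6 + np.arange(40.0)
v = np.array([4.082, 4.133, 4.136, 4.138, 4.139, 4.14, 4.141, 4.142, 4.143,
              4.144, 4.144, 4.145, 4.145, 4.147, 4.146, 4.147, 4.148, 4.148,
              4.149, 4.149, 4.149, 4.15, 4.15, 4.15, 4.151, 4.151, 4.152,
              4.152, 4.152, 4.153, 4.153, 4.153, 4.153, 4.154, 4.154, 4.154,
              4.154, 4.154, 4.155, 4.155])

# fit using all points
t_all = t - t[0]
p0 = (v[-1], v[-1] - v[0], 1.0 / t_all[-1])
popt_all, _ = curve_fit(func, t_all, v, p0=p0)
r2_all = r_squared(v, func(t_all, *popt_all))

# fit with the first point removed
tx = t[1:] - t[0]
vx = v[1:]
p0x = (vx[-1], vx[-1] - vx[0], 1.0 / tx[-1])
popt_x, _ = curve_fit(func, tx, vx, p0=p0x)
r2_x = r_squared(vx, func(tx, *popt_x))
```

Comparing `r2_all` against `r2_x` shows directly how much the single outlying point degrades the single-exponential fit.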

The best single 3-parameter equation I could find, R-squared = 0.9952, was an x-shifted power function:

y = pow((a + x), b) + Offset

with parameters:

a = -1.5474599569484271E+04
b =  6.3963649365056151E-03
Offset =  3.1303674911990789E+00

[figure: power-function model compared to the data]
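As a sanity check, this model and the reported parameters can be evaluated directly against the data (no fitting here, just reproducing the reported R-squared):

```python
import numpy as np

# data from the question (t is evenly spaced, 1 apart)
t = 15474.6 + np.arange(40.0)
v = np.array([4.082, 4.133, 4.136, 4.138, 4.139, 4.14, 4.141, 4.142, 4.143,
              4.144, 4.144, 4.145, 4.145, 4.147, 4.146, 4.147, 4.148, 4.148,
              4.149, 4.149, 4.149, 4.15, 4.15, 4.15, 4.151, 4.151, 4.152,
              4.152, 4.152, 4.153, 4.153, 4.153, 4.153, 4.154, 4.154, 4.154,
              4.154, 4.154, 4.155, 4.155])

# reported parameters for y = pow((a + x), b) + Offset
a = -1.5474599569484271e+04
b = 6.3963649365056151e-03
offset = 3.1303674911990789e+00

v_model = (a + t) ** b + offset

ss_res = np.sum((v - v_model) ** 2)
ss_tot = np.sum((v - np.mean(v)) ** 2)
r2 = 1 - ss_res / ss_tot  # approximately 0.995
```

Note that a + t is just barely positive at the first point (about 4.3e-4), which is how the x-shift lets the power function bend sharply enough to pass through the outlier.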

James Phillips