I am trying to fit multiple curve equations to my data to determine what kind of decay curve best represents my data. I am using the curve_fit
function within scipy
in python.
Here is my example data:
data = {'X':[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,12 ,13 ,14 ,15
],
'Y':[55, 55, 55, 54, 54, 54, 54, 53, 53, 50, 45, 37, 27, 16, 0
]}
df = pd.DataFrame(data)
I then want to try fitting a linear, logarithmic, second-degree polynomial, and exponential decay curve to my data points and then plotting. I am trying out the following code:
import pandas as pd
import numpy as np
from numpy import array, exp
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
# load the dataset
data = df.values
# choose the input and output variables
x, y = data[:, 0], data[:, 1]
def func1(x, a, b):
return a * x + b
def func2(x, a, b):
return a * np.log(x) + b
def func3(x, a, b, c):
return a*x**2+b*x+c
def func4(x, a, b, c):
return a*exp(b*x)+c
params, covs = curve_fit(func1, x, y)
params, _ = curve_fit(func1, x, y)
a, b = params[0], params[1]
yfit1 = a * x + b
print('Linear decay fit:')
print('y = %.5f * x + %.5f' % (a, b))
params, _ = curve_fit(func2, x, y)
a, b = params[0], params[1]
yfit2 = a * np.log(x) + b
print('Logarithmic decay fit:')
print('y = %.5f * ln(x)+ %.5f' % (a, b))
params, _ = curve_fit(func3, x, y)
a, b, c = params[0], params[1], params[2]
yfit3 = a*x**2+b*x+c
print('Polynomial decay fit:')
print('y = %.5f * x^2 + %.5f * x + %.5f' % (a, b, c))
params, _ = curve_fit(func4, x, y)
a, b, c = params[0], params[1], params[2]
yfit4 = a*exp(x*b)+c
print('Exponential decay fit:')
print('y = %.5f * exp(x*%.5f)+%.5f' % (a, b, c))
plt.plot(x, y, 'bo', label="y-original")
plt.plot(x, yfit1, label="y=a * x + b")
plt.plot(x, yfit2, label="y=a * np.log(x) + b")
plt.plot(x, yfit3, label="y=a*x^2 + bx + c")
plt.plot(x, yfit4, label="y=a*exp(x*b)+c")
plt.xlabel('x')
plt.ylabel('y')
plt.legend(loc='best', fancybox=True, shadow=True)
plt.grid(True)
plt.show()
This produces the following text and visual plotting output:
I had originally tried to find a way to show the r-squared value for each curve fit, but I found out that for non-linear curves, R squared is not suitable, and that instead I should be identifying the standard error for each curve fit to best judge which curve equation best describes the decay I am seeing in my datapoints. My question is how can I take my code and the fits of each equation I attempted and output a standard error reading for each curve fit so that I can best judge which equation most accurately represents my data? I am ultimately trying to make the judgement call of saying "I have discovered that as the x-axis value increases, the y-axis value decays ..." and fitting in that blank with "linearly", "logarithmically", second-degree quadratically", or "exponentially". Visually I see that the exponentially decay equation fits my data the best, but I do not have a quantitative basis to make that call, other than simply visually, and so that is why I am trying to find out how I can print the standard error, so that I can judge the best fit for an equation as corresponding to the equation with the lowest standard error. I have been unable to locate just how to derive this from my calculations from documentation.