2

I am trying to fit multiple curve equations to my data to determine what kind of decay curve best represents my data. I am using the curve_fit function within scipy in python.

Here is my example data:

data = {'X':[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,12 ,13 ,14 ,15
],
        'Y':[55, 55, 55, 54, 54, 54, 54, 53, 53, 50, 45, 37, 27, 16, 0
]}
df = pd.DataFrame(data)

I then want to try fitting a linear, logarithmic, second-degree polynomial, and exponential decay curve to my data points and then plotting. I am trying out the following code:

import pandas as pd
import numpy as np
from numpy import array, exp
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

# load the dataset
data = df.values
# choose the input and output variables
x, y = data[:, 0], data[:, 1]

def func1(x, a, b):
    return a * x + b

def func2(x, a, b):
    return a * np.log(x) + b

def func3(x, a, b, c):
    return a*x**2+b*x+c

def func4(x, a, b, c):
    return a*exp(b*x)+c

params, covs = curve_fit(func1, x, y)

params, _ = curve_fit(func1, x, y)
a, b = params[0], params[1]
yfit1 = a * x + b
print('Linear decay fit:')
print('y = %.5f * x + %.5f' % (a, b))

params, _  = curve_fit(func2, x, y)
a, b = params[0], params[1]
yfit2 = a * np.log(x) + b
print('Logarithmic decay fit:')
print('y = %.5f * ln(x)+ %.5f' % (a, b))

params, _ = curve_fit(func3, x, y)
a, b, c = params[0], params[1], params[2]
yfit3 = a*x**2+b*x+c
print('Polynomial decay fit:')
print('y = %.5f * x^2 + %.5f * x + %.5f' % (a, b, c))

params, _  = curve_fit(func4, x, y)
a, b, c = params[0], params[1], params[2]
yfit4 = a*exp(x*b)+c
print('Exponential decay fit:')
print('y = %.5f * exp(x*%.5f)+%.5f' % (a, b, c))

plt.plot(x, y, 'bo', label="y-original")
plt.plot(x, yfit1, label="y=a * x + b")
plt.plot(x, yfit2, label="y=a * np.log(x) + b")
plt.plot(x, yfit3, label="y=a*x^2 + bx + c")
plt.plot(x, yfit4, label="y=a*exp(x*b)+c")
plt.xlabel('x')
plt.ylabel('y')
plt.legend(loc='best', fancybox=True, shadow=True)
plt.grid(True)
plt.show()

This produces the following text and visual plotting output: enter image description here

I had originally tried to find a way to show the r-squared value for each curve fit, but I found out that for non-linear curves, R squared is not suitable, and that instead I should be identifying the standard error for each curve fit to best judge which curve equation best describes the decay I am seeing in my datapoints. My question is how can I take my code and the fits of each equation I attempted and output a standard error reading for each curve fit so that I can best judge which equation most accurately represents my data? I am ultimately trying to make the judgement call of saying "I have discovered that as the x-axis value increases, the y-axis value decays ..." and fitting in that blank with "linearly", "logarithmically", second-degree quadratically", or "exponentially". Visually I see that the exponentially decay equation fits my data the best, but I do not have a quantitative basis to make that call, other than simply visually, and so that is why I am trying to find out how I can print the standard error, so that I can judge the best fit for an equation as corresponding to the equation with the lowest standard error. I have been unable to locate just how to derive this from my calculations from documentation.

  • A simple check for the [goodness of fit](https://en.wikipedia.org/wiki/Goodness_of_fit) is the [reduced chi-square](https://en.wikipedia.org/wiki/Reduced_chi-squared_statistic) – mikuszefski Mar 11 '22 at 08:08

1 Answers1

0

You can use Mean Absolute Scaled Error (MASE) for comparing the goodness of fit. Define a function for MASE as follows:

import numpy as np

def mase(actual : np.ndarray, predicted : np.ndarray):

    forecast_error = np.mean(np.abs(actual - predicted))
    naive_forecast = np.mean(np.abs(np.diff(actual)))
    mase = forecast_error / naive_forecast

    return mase

Then simply compare all of the predicted values with the actual values (both must be NumPy arrays to use the function above). You can then select the model/equation with the lowest MASE value. For example:

>>> actual = np.array([1,2,3])
>>> predicted1 = np.array([1.1, 2.2, 3.3])
>>> mase(actual, predicted1)
0.20000000000000004
>>>
>>> predicted2 = np.array([1.1, 2.1, 3.1])
>>> mase(actual, predicted2)
0.10000000000000009

Here we can safely say that predicted2 has a lower MASE value and thus is favorable.

Muhammad Mohsin Khan
  • 1,444
  • 7
  • 16
  • 23