
I have data with a clear exponential dependency. I tried to fit a curve through it with two different, very simple models. The first one is a straightforward exponential fit. For the second one, I log-transformed the y values and then used a linear regression. To eventually plot the curve, I exponentiated the result (applied e to the power of the fitted values).

However, when I plot both resulting regression lines, they look quite different. Their r^2 values also differ noticeably.

Can somebody explain to me why the fits are so different? I honestly thought both models should produce the same curve.

[Plot: the data points with the exponential fit (red dashed) and the log-linear fit, which visibly diverge]

Below is the code that I used to generate the curves.

import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit
from sklearn.metrics import r2_score


def exp(x, k):
    # Exponential model: y = exp(k * x).
    return np.exp(k * x)


def lin(x, m):
    # Linear model through the origin: log(y) = m * x.
    return m * x


x = np.array([0.03553744809541667, 0.07393361944488888, 0.11713398354352941, 0.1574279442442857, 0.20574484316400002,
              0.24638269718399997, 0.28022173237600007, 0.33088392763600005, 0.37608523866, 0.4235348808,
              0.4698941935266667,
              0.5049780023645001, 0.53193232248, 0.59661874698, 0.64686695376, 0.6765964062965002, 0.7195010072795001,
              0.7624056082625001, 0.8053102092455002, 0.8696671107200001])

y = np.array([1.0, 0.9180040755624065, 0.7580780029008654, 0.662359339541471, 0.556415757973503, 0.4575163368602455,
              0.3982995279500034, 0.3309496816813175, 0.25142343840921577, 0.21526738042912116, 0.19490849614884595,
              0.12714651046365663, 0.12714651046365663, 0.1015770731180174, 0.0728982261567812, 0.04180399979351543,
              0.04180399979351543, 0.04180399979351543, 0.04180399979351543, 0.04180399979351543])

k_exp, = curve_fit(exp, x, y)[0]          # least squares on the original y scale
m_lin, = curve_fit(lin, x, np.log(y))[0]  # least squares on the log(y) scale
x_ticks = np.linspace(x.min(), x.max(), 100)

print("Exponential fit", r2_score(y, exp(x, k_exp)))     # 0.964
print("Log linear fit", r2_score(y, np.exp(m_lin * x)))  # 0.939

plt.scatter(x, y, c="k", s=5)
plt.plot(x_ticks, exp(x_ticks, k_exp), "r--", label="Exponential fit")
plt.plot(x_ticks, np.exp(m_lin * x_ticks), label="Log-linear fit")
plt.legend()

plt.show()
– nhaus
2 Answers


One model is:

exp(k * x) + err = y

the other is:

m * x + err = log(y)

or, equivalently:

exp(m*x + err) = y

So, as you can see, the error enters the two models differently: additively on the original scale in the first, and multiplicatively (after exponentiation) in the second. The distribution of the errors is therefore different, and so is the fit.
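
To see this concretely, here is a minimal sketch with hypothetical synthetic data (not the question's), where y carries multiplicative noise; each fit minimizes its own residuals and returns a different estimate of k:

import numpy as np
from scipy.optimize import curve_fit

# Hypothetical synthetic data: true k = -4, multiplicative (log-normal) noise.
rng = np.random.default_rng(0)
x = np.linspace(0.05, 0.9, 20)
y = np.exp(-4.0 * x) * np.exp(rng.normal(0.0, 0.2, x.size))

# Fit 1: minimizes sum((exp(k*x) - y)**2), i.e. additive error on y.
k_exp, = curve_fit(lambda x, k: np.exp(k * x), x, y)[0]

# Fit 2: minimizes sum((m*x - log(y))**2), i.e. additive error on log(y).
m_lin, = curve_fit(lambda x, m: m * x, x, np.log(y))[0]

print(k_exp, m_lin)  # two different estimates of the same underlying k

With multiplicative noise the log-linear fit matches the true error structure; with additive noise on y, the direct exponential fit would.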

– Marcin
  • Thanks, this makes a lot of sense. Is the way I calculate the goodness of fit (R^2) still right, or do I have to adjust how I calculate the expected y values? – nhaus Nov 01 '20 at 15:28
  • Not sure about r^2, but for the log-linear fit you have to correct the expected values; I don't remember what this correction is called. – Marcin Nov 01 '20 at 15:34
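
For reference, the correction alluded to in the comment above is presumably the retransformation-bias correction for models fitted on the log scale (its nonparametric form is known as Duan's smearing estimator). A minimal sketch, assuming normally distributed log-scale residuals with constant variance:

import numpy as np
from scipy.optimize import curve_fit

# Stand-in data with a log-linear structure; use the question's x, y in practice.
rng = np.random.default_rng(1)
x = np.linspace(0.05, 0.9, 20)
y = np.exp(-4.0 * x + rng.normal(0.0, 0.2, x.size))

m_lin, = curve_fit(lambda x, m: m * x, x, np.log(y))[0]

resid = np.log(y) - m_lin * x   # residuals on the log scale
sigma2 = resid.var()            # their variance
# Under log-normal errors, E[y|x] = exp(m*x + sigma^2/2), not the naive exp(m*x):
y_hat = np.exp(m_lin * x + sigma2 / 2.0)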

The minimization problems of an exponential fit and a log-linear fit are different. If you are fitting different things, you should expect different results.

In the exponential fit, the differences

exp(x, k_exp) - y
array([-0.11232018, -0.13754469, -0.08285192, -0.07245726, -0.05473322,
       -0.01973325, -0.00746918, -0.00117105,  0.03198186,  0.02645663,
        0.01201946,  0.05681884,  0.04092329,  0.03372492,  0.04142677,
        0.06167547,  0.04781162,  0.03580521,  0.02540738,  0.01236329])

are minimized in the least-squares sense:

sum((exp(x, k_exp) - y)**2)
0.06488526426576267

In the log-linear fit, the differences

m_lin * x - np.log(y)
array([-0.14034862, -0.20643379, -0.18563015, -0.20978567, -0.22631195,
       -0.19110049, -0.18613326, -0.20097633, -0.10466277, -0.13679878,
       -0.22053568,  0.06809742, -0.03835371, -0.06929866,  0.06400883,
        0.5026701 ,  0.33322626,  0.16378243, -0.00566141, -0.25982716])

are minimized in the least-squares sense:

sum((m_lin * x - np.log(y))**2)
0.8549505409763158

Looking at the log-linear fit as though it were an exponential fit, the differences are

exp(x, m_lin) - y
array([-0.13094479, -0.17122601, -0.12843302, -0.12534621, -0.11269128,
       -0.07958516, -0.06764601, -0.06025541, -0.0249844 , -0.02752286,
       -0.03857453,  0.00895996, -0.00478421, -0.00680079,  0.0048187 ,
        0.02730342,  0.01653194,  0.00743936, -0.000236  , -0.00956539])

There are two differences:

  • On the original (non-log) scale, the log-linear fit shows a higher sum of squares, sum((exp(x, m_lin) - y)**2) = 0.11011945823779898, than the exponential fit (0.06488526426576267), and
  • the errors exp(x, m_lin) - y of the log-linear fit on the original scale are farther from zero for small values of x.

The values of y

array([1.        , 0.91800408, 0.758078  , 0.66235934, 0.55641576,
       0.45751634, 0.39829953, 0.33094968, 0.25142344, 0.21526738,
       0.1949085 , 0.12714651, 0.12714651, 0.10157707, 0.07289823,
       0.041804  , 0.041804  , 0.041804  , 0.041804  , 0.041804  ])

are small over the whole range of x values, whereas the values of np.log(y)

array([ 0.        , -0.08555345, -0.27696899, -0.41194706, -0.5862395 ,
       -0.78194269, -0.92055097, -1.10578893, -1.38061676, -1.53587439,
       -1.63522508, -2.06241523, -2.06241523, -2.28693743, -2.61869097,
       -3.17476326, -3.17476326, -3.17476326, -3.17476326, -3.17476326])

are much larger in absolute value for the higher values of x

array([0.03553745, 0.07393362, 0.11713398, 0.15742794, 0.20574484,
       0.2463827 , 0.28022173, 0.33088393, 0.37608524, 0.42353488,
       0.46989419, 0.504978  , 0.53193232, 0.59661875, 0.64686695,
       0.67659641, 0.71950101, 0.76240561, 0.80531021, 0.86966711])

that lie closer to 1.

In this case, on the exponential (original) scale you are fitting, on average, considerably smaller absolute values than on the log-linear scale.
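
For reference, the sums of squares quoted above can be reproduced in a few lines (x and y are the arrays from the question):

import numpy as np
from scipy.optimize import curve_fit

# x, y: the arrays from the question.
k_exp, = curve_fit(lambda x, k: np.exp(k * x), x, y)[0]
m_lin, = curve_fit(lambda x, m: m * x, x, np.log(y))[0]

print(np.sum((np.exp(k_exp * x) - y) ** 2))   # ~0.0649: exponential fit, original scale
print(np.sum((m_lin * x - np.log(y)) ** 2))   # ~0.8550: log-linear fit, log scale
print(np.sum((np.exp(m_lin * x) - y) ** 2))   # ~0.1101: log-linear fit, original scale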

– Heikki