
I've noticed that when I fit a power-law model in R, it almost always under-predicts compared to one fitted in Python. Here is an R example:

# make data
x <- seq(1, 10, 1)
y <- c(1, 2.1, 3.5, 4.5, 4.8, 6, 9, 14, 20, 24.75)

# fit a linear model on the log-log scale and overlay the back-transformed curve
model.pl <- lm(log(y) ~ log(x))
plot(x, y)
x_axis <- seq(1, 10, 1)
lines(x_axis, exp(predict(model.pl, data.frame(x = x_axis))), col = 'red', lty = 2, lwd = 2)
summary(model.pl)

print(exp(model.pl$coefficients[1])) # 0.8093913
print(model.pl$coefficients[2])      # 1.325091
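
Back-transforming the coefficients gives the power-law form of this fit explicitly: since the model is log(y) = c + b*log(x), the fitted curve is y = exp(c) * x^b. A minimal sketch, continuing from the fit above:

# back-transform: log(y) = c + b*log(x)  is equivalent to  y = exp(c) * x^b
a <- unname(exp(coef(model.pl)[1]))  # ~0.809
b <- unname(coef(model.pl)[2])       # ~1.325
# so the fitted power law is approximately y = 0.809 * x^1.325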

And in Python using SciPy:

import pandas as pd
from scipy.optimize import curve_fit

lst = list(range(1, 11, 1))
lst2 = [1, 2.1, 3.5, 4.5, 4.8, 6, 9, 14, 20, 24.75]

df = pd.DataFrame(list(zip(lst, lst2)), columns=['x', 'y'])

# power-law model fitted directly on the raw data
def objective(x, a, b):
    return a * x ** b

x, y = df["x"], df["y"]

popt, _ = curve_fit(objective, x, y)

a, b = popt
print(a)  # 0.10104993
print(b)  # 2.38573123

So I end up with quite different parameters, which translates to different models when plotted below. Is this to be expected? I assumed they would fit essentially identical models:

[Plot: Python model (left), R model (right)]
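
One way to quantify the mismatch is to score each parameter pair under both least-squares criteria. This is a minimal sketch using the rounded parameter values from the fits above: `lm` on the log-log scale minimises squared error in log units, while `curve_fit` minimises it in raw units, so each pair should win under its own objective.

x <- seq(1, 10, 1)
y <- c(1, 2.1, 3.5, 4.5, 4.8, 6, 9, 14, 20, 24.75)

# raw-scale SSE: the objective curve_fit minimises
sse_raw <- function(a, b) sum((y - a * x^b)^2)
# log-scale SSE: the objective lm(log(y) ~ log(x)) minimises
sse_log <- function(a, b) sum((log(y) - (log(a) + b * log(x)))^2)

sse_raw(0.8094, 1.3251)  # lm pair: larger raw-scale error
sse_raw(0.1010, 2.3857)  # curve_fit pair: smaller raw-scale error
sse_log(0.8094, 1.3251)  # lm pair: smaller log-scale error
sse_log(0.1010, 2.3857)  # curve_fit pair: larger log-scale error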

As per phipsgabler's suggestion, I modified the call to drop the intercept, but this does not seem to change the model all that much:

# no intercept: log(y) = b * log(x), i.e. y = x^b with a fixed at 1
model.pl <- lm(log(y) ~ log(x) - 1)
x_axis <- seq(1, 10, 1)
lines(x_axis, exp(predict(model.pl, data.frame(x = x_axis))), col = 'red', lty = 2, lwd = 2)
summary(model.pl)
print(model.pl$coefficients[1]) # the only coefficient now: the exponent b
  • I think R includes an intercept by default. Could you try to compare with `log(y) ~ log(x) - 1`? – phipsgabler Aug 25 '22 at 08:23
  • This doesn't change the trajectory of the model that much. Also, I'm not sure how I would back-transform the model so I can write out the formula without the log transforms. – hmnoidk Aug 25 '22 at 08:30
  • https://stats.stackexchange.com/questions/254689/linear-regression-with-log-transformed-data-large-error/254706#254706 – Roland Aug 25 '22 at 08:31
  • https://stats.stackexchange.com/questions/254876/regression-models-for-log-transformed-data-without-multiplicative-error/255265#255265 – Roland Aug 25 '22 at 08:33
  • Ah okay, your second thread seems to have helped me, Roland. So basically, the `curve_fit` function in Python includes the error term, if I'm understanding correctly? – hmnoidk Aug 25 '22 at 11:13
  • A regression model always includes an error term. The point is that one of your models includes a multiplicative and the other an additive error term. – Roland Aug 25 '22 at 11:17
  • How can I extract the residuals to do RMSE from the NLS version of the model? Or are these assumed to be the same as those of the model that was fed to the NLS version? – hmnoidk Aug 25 '22 at 12:24 [see the sketch below]
  • Either fit an OLS log-log model with Python or fit a non-linear regression model with R. You have no chance of getting the same results if you use two statistical models with different assumptions. So, you first need to make a choice regarding statistical methodology (see the links above) before you work on the implementation. – Roland Aug 25 '22 at 13:45
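
Following Roland's advice, the `curve_fit` model can be reproduced in R by fitting the non-linear (additive-error) model directly with `nls()`. This is a minimal sketch, assuming the starting values below are good enough for convergence; the coefficients should come out close to SciPy's, and the residuals for an RMSE come straight from the fitted object:

x <- seq(1, 10, 1)
y <- c(1, 2.1, 3.5, 4.5, 4.8, 6, 9, 14, 20, 24.75)

# same model and objective as curve_fit: minimise sum((y - a*x^b)^2)
model.nls <- nls(y ~ a * x^b, start = list(a = 1, b = 1))
coef(model.nls)  # expect roughly a = 0.101, b = 2.386

# residuals and RMSE of the non-linear fit
res <- residuals(model.nls)
sqrt(mean(res^2))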
