I have the data below and want to fit an exponential regression model using lm() on the log of the second variable (V2).

When I evaluate the model, I get two different R^2 values: one from the summary of the model itself, and another when I regress V2 on the back-transformed predictions with a second lm() call and summarize that fit. Why do I get this difference?

data <- structure(list(V1 = c(0.79, 0.61, 0.83, 0.86, 0.84, 0.78, 0.8,
    0.81, 0.77, 0.83, 0.8, 0.86, 0.31, 0.8, 0.85, 0.77, 0.77, 0.86,
    0.66, 0.81, 0.84, 0.68, 0.81, 0.81, 0.75, 0.64, 0.83, 0.52, 0.85,
    0.5), V2 = c(832.69, 411.64, 1150.85, 1236, 751.09, 723.46, 1056.16,
    904.22, 361.76, 695.04, 948.45, 812.51, 75.52, 700.64, 1193.39,
    523.02, 1713.68, 1183.73, 320.96, 678.42, 825.22, 159.17, 891.43,
    177.52, 863.89, 217.45, 552.3, 223.9, 564.05, 99.26)), row.names = c(41L,
    25L, 74L, 40L, 130L, 118L, 109L, 83L, 77L, 16L, 49L, 86L, 23L,
    13L, 45L, 3L, 15L, 37L, 31L, 14L, 5L, 85L, 103L, 36L, 126L, 38L,
    30L, 54L, 95L, 81L), class = "data.frame")

# Fit a log-linear model: log(V2) as a linear function of V1
fit <- lm(formula = log(data$V2) ~ data$V1)
fit
plot(data)
# Back-transform the fitted values and draw them in order of V1
ord <- order(data$V1)
lines(data$V1[ord], exp(fitted(fit))[ord], col = "red")
points(data$V1[ord], exp(fitted(fit))[ord], col = "red")
summary(fit)

Adjusted R-squared: 0.64

# Predictions back-transformed to the original scale of V2
data$V2predicted <- exp(fitted(fit))
points(data$V1, data$V2predicted, col = "blue")


summary(lm(data$V2 ~ data$V2predicted))

Adjusted R-squared: 0.4166

This is not about the difference between multiple R^2 and adjusted R^2, but about why I get different R^2 from the model call and from lm().

Am I doing something wrong?

mace

1 Answer

You get different R^2 values because the response variables of the two fits are on different scales. In your first fit, lm(formula = log(data$V2) ~ data$V1), the response is on the log scale. In the second, you back-transform the predictions with exp() and regress the untransformed V2 on them, so the response is on the original scale.

R-squared is the sum of squares explained by the model (MSS below) as a fraction of the total sum of squares (TSS, which is MSS plus the residual sum of squares RSS).
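Written out, as a sketch assuming an ordinary least-squares fit that includes an intercept, with y_i the observed response, \hat{y}_i the fitted values, and \bar{y} the mean of the response (which in that case equals the mean of the fitted values):

```latex
\mathrm{MSS} = \sum_i (\hat{y}_i - \bar{y})^2, \qquad
\mathrm{RSS} = \sum_i (y_i - \hat{y}_i)^2, \qquad
\mathrm{TSS} = \mathrm{MSS} + \mathrm{RSS}

R^2 = \frac{\mathrm{MSS}}{\mathrm{TSS}} = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}
```

Every term depends on the scale of y, so replacing y by log(y) changes RSS and TSS, and with them R^2.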

Define a function that computes R-squared from a fitted model:

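calculate_rsq <- function(fit) {
  fitted_values <- fit$fitted.values
  # Model sum of squares: spread of the fitted values around their mean
  MSS <- sum((fitted_values - mean(fitted_values))^2)
  # Residual sum of squares: what the model leaves unexplained
  RSS <- sum(fit$residuals^2)
  # Total sum of squares (valid for a least-squares fit with an intercept)
  TSS <- MSS + RSS

  rsq <- 1 - RSS / TSS

  c(RSS = RSS, TSS = TSS, rsq = rsq)
}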
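
As a quick sanity check of the decomposition the function relies on, here is a minimal sketch on R's built-in cars dataset (chosen only for illustration; it is not the data from the question, and the name fit_cars is hypothetical). For a least-squares fit with an intercept, MSS + RSS should match the total sum of squares computed directly from the response, and 1 - RSS/TSS should match what summary() reports.

```r
# Illustrative check on built-in data, not the question's data
fit_cars <- lm(dist ~ speed, data = cars)

# Same quantities that calculate_rsq() computes
MSS <- sum((fitted(fit_cars) - mean(fitted(fit_cars)))^2)
RSS <- sum(residuals(fit_cars)^2)

# Total sum of squares computed directly from the response
TSS_direct <- sum((cars$dist - mean(cars$dist))^2)

# Both comparisons should print TRUE
print(all.equal(MSS + RSS, TSS_direct))
print(all.equal(1 - RSS / (MSS + RSS), summary(fit_cars)$r.squared))
```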

Check the two models you have:

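# Model 1: response on the log scale
fit_log <- lm(formula = log(data$V2) ~ data$V1)
# Back-transform the fitted values to the original scale of V2
data$V2predicted <- exp(fitted(fit_log))
# Model 2: original-scale response regressed on the back-transformed predictions
fit_exp <- lm(data$V2 ~ data$V2predicted)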

And you see why the R-square is different:

calculate_rsq(fit_log)
       RSS        TSS        rsq 
 6.1929518 17.8160346  0.6523945 
calculate_rsq(fit_exp)
         RSS          TSS          rsq 
2.549842e+06 4.526867e+06 4.367315e-01 

So if you fit the model using the log of your response, report the R^2 from that fit (0.652 here, on the log scale).
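
To see what the second R^2 actually measures, here is a minimal sketch on simulated data (the data frame d, the seed, and the coefficients are assumptions for illustration, not the question's data). Regressing the original response on the back-transformed predictions is a simple linear regression, so its R^2 is just the squared correlation between the two, which is a different quantity from the R^2 of the log-scale fit.

```r
# Simulated, hypothetical data with an exponential trend
set.seed(1)
d <- data.frame(x = runif(30, 0.3, 0.9))
d$y <- exp(2 + 5 * d$x + rnorm(30, sd = 0.5))

# Log-scale fit and its R^2
fit_log <- lm(log(y) ~ x, data = d)
r2_log <- summary(fit_log)$r.squared

# Back-transform predictions, then regress y on them as in the question
d$pred <- exp(fitted(fit_log))
r2_refit <- summary(lm(y ~ pred, data = d))$r.squared

# The refit R^2 equals the squared correlation of y and pred
print(all.equal(r2_refit, cor(d$y, d$pred)^2))

# The two R^2 values describe different responses: log(y) versus y
print(c(log_scale = r2_log, original_scale = r2_refit))
```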

StupidWolf