0

I have 2 linear models I have run in R

model_1_regression <- lm(model_1$ff4f_actual_excess_return_month1 ~ model_1$Rm.Rf + 
                         model_1$SMB + 
                         model_1$HML + 
                         model_1$MOM, 
                         na.action=na.exclude)

and

model_1_mom_1_regression <- lm(model_1_mom_1$ff4f_actual_excess_return_month1 ~ model_1_mom_1$Rm.Rf + 
                               model_1_mom_1$SMB + 
                               model_1_mom_1$HML + 
                               model_1_mom_1$MOM +
                               model_1_mom_1$mom_to_add,
                               na.action=na.omit)

and I would like to run a likelihood ratio test to see if the additional factor added is significant. How can I do this, and how can I interpret the results shown?

Many thanks

Sandipan Dey
  • 21,482
  • 2
  • 51
  • 63
OSW
  • 13
  • 1
  • 1
  • 3
  • This could be useful: https://www.rdocumentation.org/packages/lmtest/versions/0.9-36/topics/lrtest or this: https://stats.stackexchange.com/questions/6505/likelihood-ratio-test-in-r – AntoniosK Aug 11 '18 at 19:37

2 Answers2

2

To compare nested models, you can use different criteria including p-value from LRT or ANOVA, Adjusted-R2, AIC, BIC and so on. LRT and ANOVA would yield the same outcome in terms of detecting a difference. Briefly, obtaining a p-value below the chosen significance level from these two tests indicates sufficient evidence in favor of rejecting the null hypothesis claiming the two models are equivalent. I recommend you search for additional information and the theory.

Your code includes different data (model_1 & model_1_mom_1) and NA treatments (na.exclude & na.omit), which makes me question whether your models are actually nested or not. Please make sure that you are fitting the models using the same data by supplying the same data set and NA treatment option to both functions. Then you can use anova:

# refactored your code and supplied the same data & na.action

reduced_model <- lm(formula = ff4f_actual_excess_return_month1 ~ 
                      Rm.Rf + 
                      SMB + 
                      HML + 
                      MOM,
                    data = df,
                    na.action=na.omit)

full_model <- lm(formula = ff4f_actual_excess_return_month1 ~ 
                   Rm.Rf + 
                   SMB + 
                   HML + 
                   MOM +
                   mom_to_add,
                 data = df,
                 na.action=na.omit)

# based on f-test
anova(reduced_model, full_model)

# based on chi-square test
anova(reduced_model, full_model, test = "LRT")

Note that there are other functions in R for running LRT such as lrtest but it computes the test statistic in a slightly different way. You can adopt one of them depending on your context. Please refer here for more information:
https://stats.stackexchange.com/questions/155474/r-why-does-lrtest-not-match-anovatest-lrt

OzanStats
  • 2,756
  • 1
  • 13
  • 26
0

Here is how the log-likelihood ratio test can be implemented with R for nested models:

set.seed(123)
# generated dummy data
n <- 1000
df <- data.frame(
    Rm.Rf = runif(n), 
    SMB = rnorm(n),
    HML = runif(n),
    MOM = rnorm(n),
    mom_to_add = runif(n)
)
df$ff4f_actual_excess_return_month1 <- 2*df$Rm.Rf - 36*df$SMB + 5*df$HML + 20*df$MOM + 0.5*df$mom_to_add + rnorm(n)

nested <- lm(formula = ff4f_actual_excess_return_month1 ~ 
                       Rm.Rf + SMB + HML + MOM,
                       data = df,
                       na.action=na.omit)
complex <- lm(formula = ff4f_actual_excess_return_month1 ~ 
                        Rm.Rf + SMB + HML + MOM + mom_to_add,
                 data = df,
                 na.action=na.omit)

teststat <- -2 * (as.numeric(logLik(nested))-as.numeric(logLik(complex)))
teststat
# [1] 5.656315
p.val <- pchisq(teststat, df = 1, lower.tail = FALSE)     # teststat ~ χ2(1) by Wilks theorem
p.val
# [1] 0.01739263

We can plot the results obtained above and notice that the null hypothesis H0 can be rejected, so that we can conclude that the complex model explains the data better.

alpha <- 0.05
x <- seq(0, 6, 0.01)
plot(x, dchisq(x, df=1), type='l', lwd=2, xlab='', ylab='')
abline(v = qchisq(1-alpha, df=1), col='red', lwd=2, lty=2)
points(teststat, 0, col='green', pch=19, cex=2)
legend('topright', c("χ2(1)", "α=0.05", "tstat"), 
        col = c('black', 'red', 'green'), 
        pch = c(NA, NA, 19), lwd = c(2, 2, NA), lty = c(1, 2, NA))

enter image description here

Sandipan Dey
  • 21,482
  • 2
  • 51
  • 63