
I am running a simple multivariate regression on a panel/time-series dataset in two ways, with lm() and with the underlying formula $(X'X)^{-1} X'Y$, expecting to get the same coefficient values from both. However, I get completely different estimates.

Here is the R code:

  return = matrix(ret.ff.zoo, ncol = 50)  # y vector
  data = cbind(df$EQ, df$EFF, df$SIZE, df$MOM, df$MSCR, df$SY, df$UMP)   # x vector

  #First method     
  BETA = solve(crossprod(data)) %*% crossprod(data, return)

  #Second method
  OLS <- lm(return ~ data)

I am not sure why the estimates differ between the two methods.

Any help is appreciated! Thank you.

Mayou
  • Can't quite tell from your code (it would help if it were reproducible...), but do both models have an intercept? Also, you don't want to do this "by hand", except possibly for checking your understanding, as there's potential for major numerical issues. Use `lm`. – Aaron left Stack Overflow Sep 04 '13 at 18:24
  • Which one fits your data? Also, (apologies if you knew this) the `crossprod` function is not the vector crossproduct, so is it doing the function you want? – Carl Witthoft Sep 04 '13 at 18:25
  • `crossprod(X)` does X'X, so it does what it is intended to do in this context. – Mayou Sep 04 '13 at 18:26
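To illustrate the two points raised in the comments above, here is a minimal sketch on simulated data (the names `X` and `y` and the dimensions are assumptions, not the asker's data). It checks that `crossprod(X)` is indeed X'X, and compares the explicit-inverse formula with a QR-based least-squares solve, which is the kind of decomposition `lm` relies on and is generally the safer choice numerically.

```r
set.seed(42)
X <- matrix(rnorm(200 * 3), ncol = 3)  # hypothetical regressors
y <- rnorm(200)                        # hypothetical response

# crossprod(X) is X'X, and crossprod(X, y) is X'y
same_xtx <- isTRUE(all.equal(crossprod(X), t(X) %*% X))

# Normal equations with an explicit inverse (fine for checking, fragile in general)
beta_inv <- solve(crossprod(X)) %*% crossprod(X, y)

# Least-squares solution via a QR decomposition
beta_qr <- qr.solve(X, y)

# On well-conditioned data the two agree up to floating-point tolerance
same_beta <- isTRUE(all.equal(as.vector(beta_inv), as.vector(beta_qr)))
print(c(same_xtx = same_xtx, same_beta = same_beta))
```

On badly conditioned regressors the two can drift apart, which is the reason for the advice to prefer `lm` outside of sanity checks.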

1 Answer


Your example isn't reproducible, but if you try it with some dummy data, the matrix formula and lm produce the same results when you take out the intercept:

set.seed(1)

x <- matrix(rnorm(1000),ncol=5)
y <- rnorm(200)

solve(t(x) %*% x) %*% t(x) %*% y
              [,1]
[1,] -0.0826496646
[2,] -0.0165735273
[3,] -0.0009412659
[4,]  0.0070475728
[5,] -0.0642452777
lm(y ~ x + 0)

Call:
lm(formula = y ~ x + 0)

Coefficients:
        x1          x2          x3          x4          x5  
-0.0826497  -0.0165735  -0.0009413   0.0070476  -0.0642453  
eddi
Thomas
  • Your results are different because you use 2 different models (`lm` includes an intercept by default). – Joshua Ulrich Sep 04 '13 at 18:28
  • @Thomas I was not your downvote, but the example you posted clearly shows that the coefs from the two methods do not match. – Mayou Sep 04 '13 at 18:28
  • @JoshuaUlrich I just realized the stupid mistake I made: I should have included a vector of 1's in my matrix to take care of the intercept! Thanks a lot! – Mayou Sep 04 '13 at 18:29
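
The fix described in the last comment can be sketched as follows, on simulated data standing in for the asker's (`X`, `Y` and their sizes are made up; `Y` has several columns to mimic the matrix response in the question). Prepending a column of ones to the regressor matrix makes the matrix formula estimate the same model as `lm`'s default, which includes an intercept.

```r
set.seed(1)
X <- matrix(rnorm(200 * 5), ncol = 5)   # hypothetical regressors
Y <- matrix(rnorm(200 * 3), ncol = 3)   # hypothetical multi-column response

X1 <- cbind(1, X)                        # add the intercept column of ones
beta_hand <- solve(crossprod(X1), crossprod(X1, Y))

fit <- lm(Y ~ X)                         # intercept included by default
beta_lm <- coef(fit)                     # 6 x 3 matrix: intercept row + 5 slopes

# Compare values only; lm attaches row and column names
coefs_match <- isTRUE(all.equal(unname(beta_hand), unname(beta_lm)))
print(coefs_match)
```

Going the other way, as in the answer, `lm(Y ~ X + 0)` drops the intercept so that it matches the formula applied to `X` without the column of ones.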