I have a vector Y containing future returns and a vector X containing current returns. The last element of Y is NA, because the last current return is also the very end of the available series.

X <- c(0.1, 0.3, 0.2, 0.5)
Y <- c(0.3, 0.2, 0.5, NA)
Other <- c(5500, 222, 523, 3677)

lm(Y ~ X + Other)

I want to make sure that the last element of each series is not included in the regression. I read the na.action documentation but I'm not clear if this is the default behaviour.

For cor(), is this the correct solution to exclude X[4] and Y[4] from the calculation?

cor(X, Y, use = "pairwise.complete.obs")
Braiam
Robert Kubrick

1 Answer

The factory-fresh default for lm is to disregard observations containing NA values. Since this could be overridden using global options, you might want to explicitly set na.action to na.omit:

> summary(lm(Y ~ X + Other, na.action=na.omit))

Call:
lm(formula = Y ~ X + Other, na.action = na.omit)

[snip]

  (1 observation deleted due to missingness)
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
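
If you would rather check this programmatically than read the summary output, something along these lines should work. It is only a sketch using the data from the question; the variable name `fit` is my own choice.

```r
# Data from the question
X <- c(0.1, 0.3, 0.2, 0.5)
Y <- c(0.3, 0.2, 0.5, NA)
Other <- c(5500, 222, 523, 3677)

# The session-wide setting that lm falls back on when na.action is not given
getOption("na.action")

# Fit with the NA handling stated explicitly
fit <- lm(Y ~ X + Other, na.action = na.omit)

nobs(fit)        # number of observations actually used in the fit
na.action(fit)   # which rows were dropped because of NA
```

Here `nobs(fit)` should report 3 and `na.action(fit)` should point at row 4, which confirms the last observation was left out.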

As to your second question, cor(X,Y,use='pairwise.complete.obs') is correct. Since there are only two variables, cor(X,Y,use='complete.obs') would also produce the expected result.
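
A quick way to convince yourself is to compare both options against the correlation computed by hand on the first three elements. This is just an illustrative sketch; the third column `Z` is made up purely to show where the two options start to differ once more than two variables are involved.

```r
X <- c(0.1, 0.3, 0.2, 0.5)
Y <- c(0.3, 0.2, 0.5, NA)

r_pair   <- cor(X, Y, use = "pairwise.complete.obs")
r_comp   <- cor(X, Y, use = "complete.obs")
r_manual <- cor(X[1:3], Y[1:3])   # drop the 4th pair by hand

all.equal(r_pair, r_comp)     # same result for two vectors
all.equal(r_pair, r_manual)   # and both match the manual calculation

# Hypothetical third variable with an NA in a different position
M <- cbind(X, Y, Z = c(NA, 1, 2, 4))

# complete.obs keeps only rows with no NA in any column (rows 2 and 3 here)
cor(M, use = "complete.obs")

# pairwise.complete.obs keeps, for each pair of columns, every row where
# both of those two columns are present (rows 1 to 3 for X and Y)
cor(M, use = "pairwise.complete.obs")
```

In the matrix case the X-Y entry differs between the two calls, because `complete.obs` also discards row 1 on account of the NA in `Z`.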

NPE
You might want to clarify the reason behind your final sentence: with only two vectors being correlated, `pairwise.complete.obs` and `complete.obs` are equivalent. With more vectors (i.e. taking the correlations of all the columns in a matrix), they wouldn't be ... – Ben Bolker Dec 09 '11 at 16:20