
I have a dataset with which I am conducting multivariate regressions with missing data, similar to this:

```r
dat <- data.frame(y = c(1, 2, 3, 4, 5, 4, 3, 4, NA, 10),
                  x = c(6, 5, NA, 3, 9, 2, 6, 1, 7, 10),
                  z = c(4, 6, 4, 3, 1, 2, 1, 5, 10, NA))
```

I want to use pairwise deletion, whereby an observation with an NA in one of the independent variables is still used for the parts of the regression where its values are available. I have tried the following `na.action` options, but I get exactly the same output each time. Is there any way to run OLS regressions (or another kind) with pairwise rather than listwise deletion?

```r
summary(lm(y ~ x + z, data = dat))
summary(lm(y ~ x + z, data = dat, na.action = "na.omit"))
summary(lm(y ~ x + z, data = dat, na.action = "na.exclude"))
```
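For reference, here is a small sketch of why these calls agree. Both `na.omit` (the default value of `options("na.action")`) and `na.exclude` drop incomplete rows before fitting, so the coefficients are identical. They differ only in whether residuals and fitted values are padded back to the original length. The object names `fit_omit` and `fit_excl` are just illustrative.

```r
dat <- data.frame(y = c(1, 2, 3, 4, 5, 4, 3, 4, NA, 10),
                  x = c(6, 5, NA, 3, 9, 2, 6, 1, 7, 10),
                  z = c(4, 6, 4, 3, 1, 2, 1, 5, 10, NA))

# Both fits use only the 7 complete rows (rows 3, 9 and 10 each contain an NA)
fit_omit <- lm(y ~ x + z, data = dat, na.action = na.omit)
fit_excl <- lm(y ~ x + z, data = dat, na.action = na.exclude)

# Same listwise-deleted sample, so the estimates match
all.equal(coef(fit_omit), coef(fit_excl))

# The only difference: na.exclude pads residuals with NA for the dropped rows
length(residuals(fit_omit))  # 7
length(residuals(fit_excl))  # 10
```

So neither option changes which rows enter the estimation; both are forms of listwise deletion.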

On a side note, my understanding is that listwise deletion uses only complete observations, while pairwise deletion uses every case in which both variables of a given pair are observed. If I have misunderstood this, please let me know. In essence, my problem is that I have a moderate amount of missing data in many variables, so my N drops from 450k to 170k observations. I am reluctant to use multiple imputation by chained equations (MICE) because this is a lot of data, it is multilevel data, and the mice package only has a function that can do 2-level MICE with one variable at a time.
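To make the distinction concrete, here is a rough sketch of what a pairwise-deletion OLS fit could look like in base R. It is an illustration under stated assumptions, not a recommended estimator: each covariance is computed from the rows where that pair of variables is observed (`cov(..., use = "pairwise.complete.obs")`), and the slopes are then solved from the resulting covariance matrix. The names `S`, `m`, `preds` and `pairwise_coef` are made up for this example.

```r
dat <- data.frame(y = c(1, 2, 3, 4, 5, 4, 3, 4, NA, 10),
                  x = c(6, 5, NA, 3, 9, 2, 6, 1, 7, 10),
                  z = c(4, 6, 4, 3, 1, 2, 1, 5, 10, NA))

# Each entry of S uses every row in which that particular pair is observed
S <- cov(dat, use = "pairwise.complete.obs")
# Each mean uses every observed value of that variable
m <- colMeans(dat, na.rm = TRUE)

preds <- c("x", "z")

# Slopes from the normal equations in covariance form: S_xx %*% b = S_xy
slopes <- as.vector(solve(S[preds, preds], S[preds, "y"]))
names(slopes) <- preds

# Intercept chosen so the fitted plane passes through the variable means
intercept <- m[["y"]] - sum(slopes * m[preds])
pairwise_coef <- c("(Intercept)" = intercept, slopes)

# Compare with the listwise (complete-case) estimates from lm()
pairwise_coef
coef(lm(y ~ x + z, data = dat))
```

Two caveats with this approach: a pairwise covariance matrix is not guaranteed to be positive semi-definite, and because each entry is based on a different number of rows there is no single N from which to compute standard errors in the usual way.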

Marco Pastor Mayo
  • Try `dat[complete.cases(dat), ]` and compare `summary(lm(y ~ x + z, data = dat[complete.cases(dat), ]))` to `summary(lm(y ~ x + z, data = dat))` – `lm` just uses complete cases. – jay.sf Dec 15 '19 at 15:34
  • @jay.sf I have tried this, as you suggested, and it gives exactly the same output as the other 3 versions of the function. If `lm()` only uses complete cases, is there any function that allows pairwise deletion with OLS? If there is a way to do pairwise deletion with `lmer`, that would also be great. – Marco Pastor Mayo Dec 15 '19 at 15:40
  • You'll want to look at the [`regtools`](https://cran.r-project.org/web/packages/regtools/index.html) package; the function `lmac()` can do that (though look at the help file for its syntax -- it doesn't use the usual formula approach, you have to give it a matrix like `as.matrix(dat[ , c("x", "z", "y")])`). See [here](http://heather.cs.ucdavis.edu/Missing.pdf) for a little discussion – duckmayr Dec 15 '19 at 15:42
