I have a dataset with which I am conducting multivariate regressions with missing data, similar to this:
dat <- data.frame(y = c(1, 2, 3, 4, 5, 4, 3, 4, NA, 10),
                  x = c(6, 5, NA, 3, 9, 2, 6, 1, 7, 10),
                  z = c(4, 6, 4, 3, 1, 2, 1, 5, 10, NA))
I want to do pairwise deletion, whereby if there is an NA for one of the independent variables, the function still uses that observation to regress on the values that are available. I have tried the following options with na.action, but I get exactly the same output each time. Is there any way to do OLS regressions (or another kind) with pairwise as opposed to listwise deletion?
summary(lm(y ~ x + z, data = dat))
summary(lm(y ~ x + z, data = dat, na.action = "na.omit"))
summary(lm(y ~ x + z, data = dat, na.action = "na.exclude"))
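A quick check on the toy data seems to confirm that all three calls fit on the same complete rows. This is only a sketch: the object names (fit_default, fit_omit, fit_exclude) are mine, and I am assuming the default na.action option has not been changed from na.omit. As far as I can tell, na.exclude only changes how residuals and fitted values are padded, not which rows are used.

```r
dat <- data.frame(y = c(1, 2, 3, 4, 5, 4, 3, 4, NA, 10),
                  x = c(6, 5, NA, 3, 9, 2, 6, 1, 7, 10),
                  z = c(4, 6, 4, 3, 1, 2, 1, 5, 10, NA))

# Fit the same model under the three settings tried above
fit_default <- lm(y ~ x + z, data = dat)
fit_omit    <- lm(y ~ x + z, data = dat, na.action = na.omit)
fit_exclude <- lm(y ~ x + z, data = dat, na.action = na.exclude)

# Number of rows actually used in each fit (listwise deletion in all cases)
nobs(fit_default)
nobs(fit_omit)
nobs(fit_exclude)

# Coefficients are identical between na.omit and na.exclude
all.equal(coef(fit_omit), coef(fit_exclude))

# The visible difference: na.exclude pads residuals with NA for dropped rows
length(residuals(fit_omit))
length(residuals(fit_exclude))
```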
On a side note, my understanding is that listwise deletion uses only complete observations, while pairwise deletion uses, for each pair of variables, every case in which both values are observed. If I have misunderstood this, please let me know. In essence, my problem is that I have a moderate amount of missing data on many variables, so my N drops from 450k to 170k observations. I am reluctant to use multiple imputation by chained equations (MICE) because this is a lot of data, it is multilevel data, and the mice package only has a function that can do 2-level MICE with one variable at a time.
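To make the distinction concrete, here is a rough sketch of what I understand pairwise deletion to mean on the toy data. It counts the rows available under each scheme and then derives OLS slopes from a pairwise-complete covariance matrix. The variable names (observed, preds, slopes, intercept) are mine, and using the available-case means for the intercept is my own assumption rather than an established recipe. It gives no standard errors, and a pairwise covariance matrix is not guaranteed to be positive definite, so I am not sure it is a sound approach.

```r
dat <- data.frame(y = c(1, 2, 3, 4, 5, 4, 3, 4, NA, 10),
                  x = c(6, 5, NA, 3, 9, 2, 6, 1, 7, 10),
                  z = c(4, 6, 4, 3, 1, 2, 1, 5, 10, NA))

# Listwise deletion: only rows with no NA in any variable
n_listwise <- sum(complete.cases(dat))

# Pairwise deletion: for each pair of variables, rows where both are observed
observed   <- (!is.na(dat)) + 0   # 1 = observed, 0 = missing
n_pairwise <- crossprod(observed) # entry [i, j] = rows with both i and j observed

# Covariance matrix where each entry uses all rows available for that pair
S <- cov(dat, use = "pairwise.complete.obs")

# OLS slopes from the normal equations: solve S_xx %*% b = S_xy
preds  <- c("x", "z")
slopes <- solve(S[preds, preds], S[preds, "y"])

# Intercept from available-case means (assumption: each mean uses all
# non-missing values of its own variable)
m         <- colMeans(dat, na.rm = TRUE)
intercept <- m[["y"]] - sum(slopes * m[preds])

n_listwise
n_pairwise
slopes
intercept
```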