How can I run Multivariate Imputation by Chained Equations with mice() for this dataset, using rows 1:10, but predicting only for row #11?

library(mice)
library(car)

df = mtcars[c(1:10), c(3:5)]
df[c(1:3), c(1)] = NA
df[c(4:7), c(2)] = NA
df[c(8:10), c(3)] = NA

df[nrow(df) + 1, names(df)] <- NA
                   disp  hp drat
Mazda RX4            NA 110 3.90
Mazda RX4 Wag        NA 110 3.90
Datsun 710           NA  93 3.85
Hornet 4 Drive    258.0  NA 3.08
Hornet Sportabout 360.0  NA 3.15
Valiant           225.0  NA 2.76
Duster 360        360.0  NA 3.21
Merc 240D         146.7  62   NA
Merc 230          140.8  95   NA
Merc 280          167.6 123   NA
11                   NA  NA   NA
imp = mice(df, m = 10, seed = 52545, print = FALSE)

This code runs without errors, but mice() imputes every NA. I don't want to spend resources computing those; I only need the imputations for row #11.

slamballais
sandoronodi
  • Questions about how to code in R are off topic here. This should be on topic on Stack Overflow, so if you wait, we will try to migrate it there. – gung - Reinstate Monica Mar 14 '17 at 14:34
  • @gung sorry for the off-topic post, I suspected as much. Is there anything I can do to speed up the migration? – sandoronodi Mar 14 '17 at 14:47
  • No problem. To speed up the migration, you can click the faint gray "flag" below the tags & ask the moderators to migrate it for you. – gung - Reinstate Monica Mar 14 '17 at 14:49
  • If your only concern is computational resources, then what you are trying to achieve doesn't make sense, because the most computationally intensive part of multiple imputation is the model-building phase: for each column with missing values, a predictive model is built from the remaining columns using all available data. Replacing the missing values based on those models is trivial and uses a negligible amount of resources compared to the overall imputation procedure. – Ahmadov Mar 15 '17 at 01:42
  • @Ahmadov thanks for your comment. At the moment I would be happy with ANY kind of computational improvement. – sandoronodi Mar 15 '17 at 08:32

1 Answer

mice() tries to impute every NA in the data; that is, it assumes that all missing observations should be imputed.

Therefore, one thing you can do is replace the NA data points that you don't want imputed with other values.

The downside is that those placeholder values are then treated as observed data, so you can expect them to show up among the values imputed for the cells that are still missing.

library(mice)
library(car)

# -999 is an arbitrary numeric placeholder; a string such as "not na"
# would coerce the numeric columns to character
df = mtcars[c(1:10), c(3:5)]
df[c(1:3), c(1)] = -999
df[c(4:7), c(2)] = -999
df[c(8:10), c(3)] = -999

# append an all-NA row 11, which is now the only row mice() has to impute
df[nrow(df) + 1, names(df)] <- NA
                    disp   hp    drat
Mazda RX4         -999.0  110    3.90
Mazda RX4 Wag     -999.0  110    3.90
Datsun 710        -999.0   93    3.85
Hornet 4 Drive     258.0 -999    3.08
Hornet Sportabout  360.0 -999    3.15
Valiant            225.0 -999    2.76
Duster 360         360.0 -999    3.21
Merc 240D          146.7   62 -999.00
Merc 230           140.8   95 -999.00
Merc 280           167.6  123 -999.00
11                    NA   NA      NA

imp = mice(df, m = 10, seed = 52545, print = FALSE)
G.D.
  • Thanks for your answer! I think altering values like this can't be a solution here. Doing so would alter data-integrity. – sandoronodi Apr 18 '17 at 14:58
  • It will indeed, I've just experienced it :) As a really, really nasty patch, you can iterate the process: put NA back in place of the altered values and run mice() again until you are rid of them. But I feel bad even writing this. Please update here if you find a solution, it would be really useful. Thanks! – G.D. Apr 19 '17 at 08:51
  • Please see the second vignette at http://www.gerkovink.com/miceVignettes/ You can simply tell mice not to impute certain variables (or not to use them as predictors). – Tom Jul 11 '19 at 06:20
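
One possible way to restrict imputation to row 11 without overwriting any data, offered as a sketch under assumptions: recent versions of mice accept a `where` argument, a logical matrix with the same dimensions as the data that marks the cells for which imputations should be created. The data frame below is a hypothetical variant of the one in the question (rows 4 to 10 are kept complete), because mice still needs some fully observed rows to fit each imputation model. Cells not flagged in `where` are expected to stay NA.

```r
library(mice)

# Hypothetical variant of the question's data: rows 4-10 stay complete,
# rows 1-3 each have a single NA, and row 11 is entirely missing.
df <- mtcars[1:10, 3:5]
df[1, "disp"] <- NA
df[2, "hp"]   <- NA
df[3, "drat"] <- NA
df[nrow(df) + 1, names(df)] <- NA

# Logical matrix with the same shape as df: TRUE = create imputations here.
# Start from all FALSE, then switch on only the missing cells of row 11.
where <- matrix(FALSE, nrow = nrow(df), ncol = ncol(df),
                dimnames = dimnames(df))
where[11, ] <- is.na(df[11, ])

imp <- mice(df, m = 10, where = where, seed = 52545, printFlag = FALSE)

# First completed data set: row 11 should be filled in, while the NAs
# in rows 1-3 should be left as they are.
completed <- complete(imp, 1)
print(completed)
```

Whether this saves much time is a separate matter: as noted in the comments, fitting the models is the expensive step, and that still happens for every column that has a cell flagged in `where`.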