
Suppose I run one of the R packages for imputing missing values, Amelia or mice (or similar), on a large data frame -- let's say 100000 rows and 50 columns -- to get imputations for one particular column that has some (let's say 200) NAs in it.

Is there a way to save the derived imputation algorithm so that when I get new data with 1000 new rows, I can simply apply the algorithm to that new data?

The goal is to impute any NAs in the new data set using the same algorithm that was derived from the base data.

Thank you in advance -- if this isn't clear, I'm happy to answer any questions.

bioniclime
  • Hi there, please provide a minimal, reproducible, representative example along with the desired end result. Use `dput()` for data and specify all non-base packages with library calls. Do not embed pictures for data or code; use indented code blocks. – NelsonGon Jan 21 '19 at 04:19
  • Why can't you just do the imputation each time, it's two lines of code? There's an `m`(mice) argument that can act like reproducibility. – NelsonGon Jan 21 '19 at 04:21
  • @NelsonGon Sorry I wasn't clear. Yes, I certainly could do the imputation all over again, but in the "new data" case, I want the "compiled" imputation to be super-fast. Almost like I want a "predict" statement for the imputation... – bioniclime Jan 21 '19 at 16:27

1 Answer


caret comes close to what you want. This assumes the new data has the same variables (column names) as the data the imputation was fitted on. In my experience, though, imputations from caret and mice differ in accuracy.

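    library(caret)
    # Base data with NAs (B has 600 values and is recycled to 1800 rows)
    mydata <- data.frame(A = c(rep(NA, 900), rep(3, 900)),
                         B = c(rep(NA, 200), rep(3, 400)))
    # Fit the imputation: preProcess stores each column's median
    prep <- preProcess(mydata, method = "medianImpute")
    # Apply it to the base data
    df_new <- predict(prep, mydata)
    df_new
    # New data must use the same column names (A and B) as the base data
    mydata1 <- data.frame(A = c(NA, 5, 7), B = c(2, NA, 4))
    df_new2 <- predict(prep, mydata1)
    df_new2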
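If the aim is to reuse the fitted imputation later without keeping the base data in memory, one option is to save the `preProcess` object to disk and reload it when new rows arrive. The following is only a minimal sketch: the data frames and the `tempfile()` path are made-up placeholders, and it assumes that for `medianImpute` the object stores the column medians rather than the training data (which, as far as I know, is the case).

```r
library(caret)

# Hypothetical base data with some missing values
base_data <- data.frame(A = c(1, 2, NA, 4, 5),
                        B = c(10, NA, 30, 40, 50))

# Fit once: for medianImpute the object keeps the per-column medians
prep <- preProcess(base_data, method = "medianImpute")

# Save the fitted object (tempfile() is a placeholder for a real path)
prep_path <- tempfile(fileext = ".rds")
saveRDS(prep, prep_path)

# Later, possibly in a fresh session: reload and apply to new rows
prep_loaded <- readRDS(prep_path)
new_data <- data.frame(A = c(NA, 7), B = c(20, NA))
imputed <- predict(prep_loaded, new_data)
imputed
```

caret still has to be loaded in the later session so that `predict()` dispatches to the `preProcess` method, but the base data itself is no longer needed.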
NelsonGon
  • Yes, this is pretty much what I'm looking for. I wonder, however, if there's a way to "save" the imputation algorithm in a compiled-like process, rather than having to load "mydata" into memory? Any ideas? – bioniclime Jan 21 '19 at 16:34
  • Sorry, could you elaborate on how a compiled-like process would work? – NelsonGon Jan 21 '19 at 16:38
  • I can't see how. Most of these imputation packages rely on saving the imputed values in memory. Maybe allocate the object temporarily in the environment? – NelsonGon Jan 21 '19 at 16:45