
I have a data frame containing three variables:

origdata <- data.frame(
  age = c(22, 45, 50, 80, 55, 45, 60, 24,   18, 15),
  bmi = c(22, 24, 26, 27, 28, 30, 27, 25.5, 18, 25),
  hyp = c(1,  2,  4,  3,  1,  2,  1,  5,    4,  5) )

I created MCAR (missing completely at random) data:

alpha <- 0.1  # probability that any single value is set to NA

# MCAR for attribute (1) age:
mcar <- runif(10, min = 0, max = 1)  
age.mcar <- ifelse(mcar < alpha, NA, origdata$age)  

# MCAR for attribute (2) bmi: 
mcar <- runif(10, min = 0, max = 1) 
bmi.mcar <- ifelse(mcar < alpha, NA, origdata$bmi)  

# MCAR for attribute (3) hyp: 
mcar <- runif(10, min = 0, max = 1) 
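hyp.mcar <- ifelse(mcar < alpha, NA, origdata$hyp)

# Combine the incomplete columns into the data frame passed to mice()
df <- data.frame(age = age.mcar, bmi = bmi.mcar, hyp = hyp.mcar)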
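The three per-column steps above can be collapsed into a single `lapply()` over the columns of `origdata`. This is a minimal sketch, assuming the same `alpha` threshold; `make_mcar` is a hypothetical helper name, `origdata` is redefined only so the block runs on its own, and `set.seed()` is added just to make the missingness pattern reproducible.

```r
# Hypothetical helper: set each value to NA independently with probability alpha.
# The NA pattern ignores the data values, so the missingness is MCAR.
make_mcar <- function(data, alpha = 0.1) {
  as.data.frame(lapply(data, function(col) {
    u <- runif(length(col), min = 0, max = 1)  # one uniform draw per row
    ifelse(u < alpha, NA, col)
  }))
}

origdata <- data.frame(
  age = c(22, 45, 50, 80, 55, 45, 60, 24,   18, 15),
  bmi = c(22, 24, 26, 27, 28, 30, 27, 25.5, 18, 25),
  hyp = c(1,  2,  4,  3,  1,  2,  1,  5,    4,  5)
)

set.seed(123)                       # reproducible missingness pattern
df <- make_mcar(origdata, alpha = 0.1)
colSums(is.na(df))                  # number of NAs generated per column
```

Because every column goes through the same function, adding a fourth variable to `origdata` needs no extra code.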

After that, I used the mice package to impute the missing values as follows:

install.packages("mice")
library("mice")
imp <- mice(df, m = 10)           # m = 10 imputed datasets (iterations are set separately by maxit)
fill1 <- complete(imp, 1)         # completed dataset 1
fill2 <- complete(imp, 2)         # completed dataset 2
allfill <- complete(imp, "long")  # all 10 completed datasets stacked in long format

My question: I want to compute the RMSE for each of the 10 datasets individually using a loop. This is my RMSE equation:

RMSE <- sqrt((sum((origdata - fill)^2)) / sum(is.na(df)))
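Written as a formula, this RMSE divides the total squared error by the number of missing cells. Let $M$ be the set of cells that were set to NA, $x_{ij}$ the original value and $\hat{x}_{ij}$ the value in the completed dataset. Since `complete()` copies observed cells unchanged, they contribute zero error, so the sum over all cells equals the sum over $M$:

```latex
\mathrm{RMSE}
  = \sqrt{\frac{\sum_{i,j} \left(x_{ij} - \hat{x}_{ij}\right)^2}{|M|}}
  = \sqrt{\frac{1}{|M|} \sum_{(i,j) \in M} \left(x_{ij} - \hat{x}_{ij}\right)^2}
```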

That is, I want a loop that computes the RMSE for each imputed dataset individually:
RMSE1 (for dataset #1)
RMSE2 (for dataset #2)
...
RMSE10 (for dataset #10)

I would also like to know which dataset is best for imputing the NAs.
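If a per-dataset score is still wanted, computing the error explicitly over the missing cells makes the question's RMSE equation independent of whether observed cells match exactly, and `which.min()` then gives the index of the lowest-RMSE dataset. This is a minimal sketch: `rmse_missing` is a hypothetical helper, and the `filled` list holds made-up toy imputations standing in for what `complete(imp, i)` would return. Keep in mind the caveat raised in the comments: with multiple imputation the usual practice is to analyse every completed dataset and pool the results, not to keep a single "best" one.

```r
# Hypothetical helper: RMSE computed only over the cells that were missing.
#   orig   - complete original data (data frame)
#   filled - one completed (imputed) data frame with the same shape
#   miss   - logical matrix, TRUE where the value had been set to NA
rmse_missing <- function(orig, filled, miss) {
  err <- as.matrix(orig)[miss] - as.matrix(filled)[miss]
  sqrt(mean(err^2))  # mean over missing cells = sum / number of missing cells
}

# Toy example: made-up imputations, for illustration only
orig <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
miss <- matrix(c(FALSE, TRUE, FALSE,    # column a: row 2 was missing
                 TRUE, FALSE, FALSE),   # column b: row 1 was missing
               ncol = 2)
filled <- list(
  data.frame(a = c(1, 2.5, 3), b = c(3, 5, 6)),    # errors: -0.5, 1
  data.frame(a = c(1, 2.0, 3), b = c(4.5, 5, 6))   # errors: 0, -0.5
)

rmse <- sapply(filled, rmse_missing, orig = orig, miss = miss)
rmse             # one RMSE per completed dataset
which.min(rmse)  # index of the dataset with the smallest RMSE
```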

slamballais
zhyan
  • Why would the RMSE be informative about which run of imputation is "best"? Seems as though that would be a classic case of "begging the question". I think you need statistical advice more than you need programming assistance. – IRTFM Dec 22 '15 at 20:56
  • Typically if one is doing multiple imputation, model-averaging is used after running analyses on each imputed dataset. One typically does not choose a "best" imputation in the manner you have described. – alexwhitworth Dec 22 '15 at 21:06
  • OK, so you think RMSE is not the best tool for checking which imputation is best. Can you advise me on other tools for determining which imputation is best? – zhyan Dec 22 '15 at 21:08
  • @Alex, did you mean something like this: `fit <- with(imp2, lm(ch1.mcar~age.mcar+bmi.mcar)); pool(fit); summary(pool(fit))` – zhyan Dec 22 '15 at 21:18
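The workflow in the last comment is how mice is normally used: fit the analysis model on each completed dataset with `with()`, then combine the estimates with `pool()`, which applies Rubin's rules. This is a minimal sketch using the `nhanes` example data shipped with mice, chosen because its `age`, `bmi`, `hyp` and `chl` columns mirror the question; the model `chl ~ age + bmi` is only an illustrative assumption.

```r
library(mice)

# Impute the built-in nhanes data 10 times; seed only for reproducibility
imp <- mice(nhanes, m = 10, printFlag = FALSE, seed = 1)

# Fit the same regression on every completed dataset ...
fit <- with(imp, lm(chl ~ age + bmi))

# ... then combine the 10 sets of estimates into one result
pooled <- pool(fit)
summary(pooled)  # pooled estimates, standard errors and p-values
```

Here no single imputation is singled out: the between-imputation variability is carried into the pooled standard errors.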

1 Answer


Here is the loop in R:

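m <- imp$m  # number of imputed datasets

RMSE <- rep(NA, m)
for (i in seq_len(m)) {
  fill <- complete(imp, i)  # i-th completed dataset
  # squared error summed over all cells, divided by the number of imputed cells
  RMSE[i] <- sqrt(sum((origdata - fill)^2) / sum(is.na(df)))
}
RMSE  # RMSE[1] is RMSE1, ..., RMSE[m] is RMSE10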
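A loop-free variant of the same computation: `complete(imp, "all")` returns every completed dataset as a list, so `sapply()` yields all `m` RMSE values at once. This is a minimal, self-contained sketch; the fixed NA positions below are an arbitrary illustrative choice (not the question's random MCAR draw) so that every column is guaranteed to have gaps, and `seed = 1` is only for reproducibility.

```r
library(mice)

# Original complete data, as in the question
origdata <- data.frame(
  age = c(22, 45, 50, 80, 55, 45, 60, 24,   18, 15),
  bmi = c(22, 24, 26, 27, 28, 30, 27, 25.5, 18, 25),
  hyp = c(1,  2,  4,  3,  1,  2,  1,  5,    4,  5)
)

# Illustrative missingness at fixed positions, so the example always has gaps
df <- origdata
df$age[c(3, 8)] <- NA
df$bmi[c(2, 9)] <- NA
df$hyp[5] <- NA

imp <- mice(df, m = 10, printFlag = FALSE, seed = 1)

# One completed data frame per imputation -> one RMSE per imputation
RMSE <- sapply(complete(imp, "all"), function(fill) {
  sqrt(sum((origdata - fill)^2) / sum(is.na(df)))
})
RMSE
```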
slamballais
zhyan