I have a few questions that I couldn't find answered in the documentation, unless I'm missing something or don't understand the imputation process/logic.

Basically, the most important one is this: since the imputed values sometimes differ between imputations, I'd like to take the average if the variable is numeric, or the mode if it is categorical.

All the examples that I see use `complete(miced_model, 1)`. If I'm running the mice model with 5 or 10 different imputations, I don't see the point in just picking one. I'd like the average of all of them.

Can anyone show me how to do this?

set.seed(2016)
library(mice)
nhanes  # this is the dataset (it ships with mice)
nhanes[5, 1] = NA  # setting up some categorical examples
nhanes[1, 1] = NA
nhanes$age = as.factor(nhanes$age)
imputed_values = mice(nhanes, m = 5, method = 'rf', maxit = 3)
new_nhanes = complete(imputed_values, 'long')  # or 'repeated'? or what?

# new_nhanes_fixed = ...  # wanted: new data frame with averaged imputed values rather than just the arbitrary 1st imputation

THANKS!!

runningbirds
  • You want the `mean` or `median` by `.id` (you should also calculate the `sd` or `IQR`). That's a standard aggregate-by-group question and has been answered *ad nauseam*. – Roland Dec 19 '16 at 07:28
  • Not sure if this is helpful, but have you looked at the simputation package? – lawyeR Dec 19 '16 at 12:49
  • This is not how you should do MI. Normally, you're supposed to analyse each data set separately and then pool the estimates of these analyses according to the rules proposed by Rubin (1987) or similar, using the functions implemented in several R packages (`mice`, `mitools`, `mitml`). If you average imputations, you lose the benefits of conducting MI in the first place. In fact, averaging imputations tends to be worse than just using a single imputation! – SimonG Jan 13 '17 at 17:28

2 Answers

You should look at SimonG's comment.

You are completely on the wrong track. The whole point of multiple imputation is that you end up with several different imputed datasets, and you perform your analysis on each of them.
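To make that concrete, here is the pooling step as it is commonly written for a single scalar estimate (Rubin's rules). The notation is chosen here for illustration and does not come from the thread: $\hat{Q}_i$ is the estimate from the $i$-th imputed dataset, $U_i$ its squared standard error, and $m$ the number of imputations.

```latex
\begin{align}
\bar{Q} &= \frac{1}{m} \sum_{i=1}^{m} \hat{Q}_i
  && \text{(pooled estimate)} \\
\bar{U} &= \frac{1}{m} \sum_{i=1}^{m} U_i
  && \text{(within-imputation variance)} \\
B &= \frac{1}{m-1} \sum_{i=1}^{m} \left( \hat{Q}_i - \bar{Q} \right)^2
  && \text{(between-imputation variance)} \\
T &= \bar{U} + \left( 1 + \frac{1}{m} \right) B
  && \text{(total variance)}
\end{align}
```

Averaging the imputed values into one dataset before the analysis leaves no $B$ term to estimate, so the uncertainty due to the missing data would not show up in the standard errors.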

If you don't need multiple imputation, you can use single imputation methods directly (for example, the `kNN` or `irmi` functions from the VIM package).
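A minimal sketch of the single-imputation route, assuming the VIM package is installed and using the unmodified `nhanes` data from mice. The choice of `k = 5` and the variable name `single` are illustrative only:

```r
library(VIM)

# Take the example data from mice without attaching the whole package.
nhanes <- mice::nhanes

# k-nearest-neighbour imputation: each missing value is filled from
# its k closest complete neighbours (k = 5 is an arbitrary choice here).
# imp_var = FALSE drops the extra indicator columns that mark imputed cells.
single <- kNN(nhanes, k = 5, imp_var = FALSE)

# Count the remaining missing values per column.
colSums(is.na(single))
```

This gives exactly one completed data frame, so there is nothing to average or pool afterwards.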

Steffen Moritz

It sounds like you want to pool the results of your analysis; that is, you run your analysis on every imputed data set and then combine the estimates. Read more about pooling here: https://www.r-bloggers.com/imputing-missing-data-with-r-mice-package/
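As a rough sketch of that workflow with mice: the model `bmi ~ age + chl` is an arbitrary example rather than anything the asker specified, and the default imputation method is used instead of `'rf'` so that no extra package is needed.

```r
library(mice)

# Create m = 5 imputed versions of the built-in nhanes data.
# printFlag = FALSE only silences the iteration log.
imp <- mice(nhanes, m = 5, maxit = 3, seed = 2016, printFlag = FALSE)

# Fit the same model separately on each imputed data set.
fit <- with(imp, lm(bmi ~ age + chl))

# Combine the five sets of estimates using Rubin's rules.
pooled <- pool(fit)
summary(pooled)
```

The pooled summary reports one estimate and standard error per model term, with the standard errors reflecting the variation between the imputed data sets.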

wissem