averaging imputation of missing values

Question

I have a few questions that I couldn't find answered in the documentation, unless I'm missing something or don't understand the imputation process/logic.

Most importantly: since the imputed values sometimes differ between the imputed data sets, I'd like to take the average if the variable is numeric, or the mode if it is categorical.

All the examples I see use "complete(miced_model, 1)". If I'm running the mice model with 5 or 10 imputations, I don't see the point in picking just one. I'd like the average of all of them.

Can anyone show me how to do this?

library(mice)
set.seed(2016)
nhanes # this is the dataset
nhanes[5, 1] = NA  # setting up some categorical examples
nhanes[1, 1] = NA
nhanes$age = as.factor(nhanes$age)
imputed_values = mice(nhanes, m = 5, method = 'rf', maxit = 3)
new_nhanes = complete(imputed_values, 'long') # or 'repeated'? or what?

# new_nhanes_fixed = ???  # new data frame with averaged values imputed rather than just the arbitrary 1st imputation?

THANKS!!

You want the `mean` or `median` by `.id` (you should also calculate the `sd` or `IQR`). That's a standard aggregate by group question and has been answered *ad nauseam*. — Roland, Dec 19 '16 at 07:28
Not sure if this is helpful, but have you looked at the simputation package? — lawyeR, Dec 19 '16 at 12:49
This is not how you should do MI. Normally, you're supposed to analyse each data set separately, and then you pool the estimates of these analyses according to the rules proposed by Rubin (1987) or similar using the functions implemented in several R packages (`mice`, `mitools`, `mitml`). If you average imputations, you lose the benefits of conducting MI in the first place. In fact, averaging imputations tends to be worse than just using a single imputation! — SimonG, Jan 13 '17 at 17:28
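
For reference, the group-wise aggregation that Roland's comment describes could be sketched roughly as below. This is only an illustration under assumptions: it uses the default imputation methods instead of 'rf', `stat_mode` is a hypothetical helper (base R has no built-in mode for this purpose), and SimonG's caveat still applies, so the result should not be treated as a proper multiple-imputation analysis.

```r
library(mice)

# Setup as in the question, but with mice's default methods (an assumption,
# so that no random-forest backend is required).
dat <- nhanes
dat[c(1, 5), "age"] <- NA
dat$age <- as.factor(dat$age)

imp  <- mice(dat, m = 5, maxit = 3, seed = 2016, printFlag = FALSE)
long <- complete(imp, "long")  # stacked data sets with .imp and .id columns

# Hypothetical helper: most frequent value; ties go to the first value seen.
stat_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Group by original row (.id), keeping the original row order.
id   <- factor(long$.id, levels = unique(long$.id))
vars <- setdiff(names(long), c(".imp", ".id"))

# Mean across imputations for numeric columns, mode for everything else.
averaged <- as.data.frame(lapply(long[vars], function(x) {
  if (is.numeric(x)) {
    unname(vapply(split(x, id), mean, numeric(1)))
  } else {
    m <- unname(vapply(split(as.character(x), id), stat_mode, character(1)))
    if (is.factor(x)) factor(m, levels = levels(x)) else m
  }
}))
```

Observed cells are identical in every imputed data set, so they come through unchanged; only the originally missing cells are averaged. Numerically coded categories such as `hyp` get a mean here, which may not be a valid category value.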

score 0 · Answer 1 · answered Apr 18 '17 at 23:30

You should look at SimonG's comment.

You are completely on the wrong track. The whole point of multiple imputation is that you have several different imputed data sets, on each of which you would perform your analysis.

If you don't need multiple imputation, you can use single imputation methods directly (for example, the kNN or irmi functions from the VIM package).
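
A minimal sketch of the single-imputation route with `VIM::kNN`, assuming the `nhanes` data from the question (taken from the `mice` package). The choice of `k = 5` is just the function's default, and `imp_var = FALSE` is used here only to suppress the extra indicator columns.

```r
library(VIM)

nhanes <- mice::nhanes  # example data from the question

# One completed data set via k-nearest-neighbour imputation.
# imp_var = FALSE drops the logical "<name>_imp" indicator columns.
single <- kNN(nhanes, k = 5, imp_var = FALSE)

colSums(is.na(single))  # count of remaining missing values per column
```

This yields exactly one completed data frame, so there is nothing to average or pool afterwards.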

score 0 · Answer 2 · answered Jul 14 '17 at 18:21

It sounds like you want to pool the results of your analysis: you run the analysis on every imputed data set and then combine the estimates. Read more on pooling here: https://www.r-bloggers.com/imputing-missing-data-with-r-mice-package/
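
As a rough sketch of that workflow with `mice`: fit the model on each imputed data set with `with()`, then combine the estimates with `pool()`. The regression `bmi ~ age + hyp` is an arbitrary placeholder model chosen for illustration, not something from the question.

```r
library(mice)

# Five imputed data sets from the unmodified nhanes data (default methods).
imp <- mice(nhanes, m = 5, maxit = 3, seed = 2016, printFlag = FALSE)

# Fit the same (placeholder) model on each imputed data set.
fit <- with(imp, lm(bmi ~ age + hyp))

# Combine the per-data-set estimates using Rubin's rules.
pooled <- pool(fit)
summary(pooled)
```

The pooled standard errors account for the variation between imputations, which is the information lost when the imputed values themselves are averaged.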

wissem
