6

I have a question regarding the aggregation of imputed data as created by the R-package 'mice'.

As far as I understand, the 'complete' command of 'mice' extracts the imputed values of a single imputation, e.g., the first one. However, when running a total of ten imputations, I am not sure which imputed values to extract. Does anyone know how to extract the (aggregated) imputed data across all imputations?

Since I would like to enter the data into MS Excel and perform further calculations in another software tool, such a command would be very helpful.

Thank you for your comments. A simple example (from 'mice' itself) can be found below:

R> library("mice")
R> nhanes
R> imp <- mice(nhanes, seed = 23109) #create imputation
R> complete(imp) #returns only the first completed data set (action = 1 is the default)

How can I aggregate the five imputed data sets and extract the imputed values to Excel?

Metrics
user4624133

3 Answers

5

I had a similar issue. I used the code below, which is good enough for numeric variables. For other variable types, I thought about randomly choosing one of the imputed results, because averaging can distort them.

My suggested code (for numeric variables):

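# m = 5 imputations by predictive mean matching; 'data' is your data frame
tempData <- mice(data, m = 5, maxit = 50, method = "pmm", seed = 500)
# stack all completed data sets; columns 1 and 2 are .imp and .id
completedData <- complete(tempData, "long")
# mean of each variable (here columns 3 to 6) per original row
a <- aggregate(completedData[, 3:6], by = list(completedData$.id), FUN = mean)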
  1. You still have to join the results back to the original data.
  2. I think 'Hmisc' is a better package.
  3. If you have already found a nicer, more elegant, or built-in solution, please share it with us.
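
One way to sidestep the join in point 1 is to average the completed data sets cell by cell, so the result keeps the original row order. The following is only a minimal sketch under some assumptions: it uses the `nhanes` example data from the question (all columns numeric), and the output file name is hypothetical. Observed cells are identical in every completed data set, so only the originally missing cells are really averaged.

```r
library(mice)

# nhanes ships with mice; m and seed follow the answer above
imp <- mice(nhanes, m = 5, seed = 500, printFlag = FALSE)

# one completed data frame per imputation
completed <- lapply(seq_len(imp$m), function(i) complete(imp, i))

# element-wise mean across the m data sets (valid for numeric columns only)
avg <- as.data.frame(Reduce(`+`, lapply(completed, data.matrix)) / imp$m)

# hypothetical file name; a CSV file opens directly in Excel
write.csv(avg, "nhanes_averaged.csv", row.names = TRUE)
```

Averaged values like these are convenient for export, but they understate the uncertainty that multiple imputation is meant to preserve.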
Yaron
1

You should use complete(imp,action="long") to get the values for each imputation. If you look at ?complete, you will find:

complete(x, action = 1, include = FALSE)

Arguments

x   
An object of class mids as created by the function mice().

action  
If action is a scalar between 1 and x$m, the function returns the data with imputation number action filled in. Thus, action=1 returns the first completed data set, action=2 returns the second completed data set, and so on. The value of action can also be one of the following strings: 'long', 'broad', 'repeated'. See 'Details' for the interpretation.

include 
Flag to indicate whether the original data with the missing values should be included. This requires that action is specified as 'long', 'broad' or 'repeated'.

So, by default, complete() returns the first completed data set. The action argument can also be one of the strings 'long', 'broad', or 'repeated'. With 'long', all completed data sets are stacked in long format. You can also set include = TRUE if you want the original data, with its missing values, included as well.
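
As a rough sketch of how this could feed into Excel, the long format can be written to a CSV file. The data set and seed are taken from the question, and the file name is only an assumption for illustration.

```r
library(mice)

imp <- mice(nhanes, seed = 23109, printFlag = FALSE)  # m = 5 by default

# stack the original data (.imp == 0) and all m completed data sets
long <- complete(imp, action = "long", include = TRUE)

# hypothetical file name; the .imp column lets you filter by imputation in Excel
write.csv(long, "nhanes_imputed_long.csv", row.names = FALSE)
```

Each row carries the imputation number in `.imp` and the original row identifier in `.id`, so any single completed data set can be recovered by filtering on `.imp`.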

Metrics
  • Thank you for your response. Sorry for being unclear, but I am looking for imputed values across all m imputations, since I don't know which of the m imputations is best. Is there a way to extract, e.g., an average? – user4624133 Mar 03 '15 at 09:08
  • 1
    As I mentioned in the answer, if you type `complete(imp,action="long")`, it will give you the imputed values for all imputations. – Metrics Mar 03 '15 at 12:41
  • @user4624133 The imputed values are stored in the `imp$imp` object. You could average them, but that would destroy the whole idea of multiple imputation. If you want a single imputed dataset, then set `m = 1` (not recommended), but better than trying to chase the "best value", which does not exist. – Stef van Buuren Jul 19 '23 at 20:53
0

OK, but you still have to choose one imputed dataset for further analyses... I think the best option is to fit the model on every imputed dataset (with() does this directly on the mids object) and pool the results afterwards:

fit <- with(data = imp, expr = lm(bmi ~ hyp + chl))
pool(fit)

But I also assume it's not forbidden to use just one of the imputed datasets ;)