
I have a dataset like this:

structure(list(age = c(20, 21, 30, NA, NA, NA, 50, 61, 60, 63, 
NA, NA, NA), sex = c(NA, 0, NA, 1, NA, 1, 0, NA, NA, NA, NA, 
0, 1), diabetes = c(NA, NA, 1, 1, NA, 1, NA, 1, 1, 1, 0, 0, NA
), hypertension = c(1, NA, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1), 
    hypercholesterolemia = c(1, 1, NA, 1, 0, 0, NA, 1, NA, 1, 
    0, 0, 0)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-13L))

Could you please tell me how I can perform MICE imputation? I want to impute all the missing values. I tried following tutorials on the Internet, but I either get errors or not everything gets imputed. Just code with an example is enough; I will adjust the settings later.

jay.sf
user19745561
  • Hi! I'm not sure if I got it... You mean you want the user to input one value to each NA and then replace them - one at a time? – Lucas Feb 21 '23 at 23:42

2 Answers


As a starting point, here is an example. It sets only the most important arguments of the mice function and leaves everything else at its default: 'm', the number of imputed datasets to generate; 'maxit', the number of iterations to run for each imputed dataset; and 'method', the imputation method, which here is predictive mean matching ('pmm'). For a complete explanation of these options, see ?mice. You can then decide how to adjust them.

Importing your data:

df <- structure(list(
  age = c(20, 21, 30, NA, NA, NA, 50, 61, 60, 63, NA, NA, NA),
  sex = c(NA, 0, NA, 1, NA, 1, 0, NA, NA, NA, NA, 0, 1),
  diabetes = c(NA, NA, 1, 1, NA, 1, NA, 1, 1, 1, 0, 0, NA),
  hypertension = c(1, NA, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1),
  hypercholesterolemia = c(1, 1, NA, 1, 0, 0, NA, 1, NA, 1, 0, 0, 0)
), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -13L))

Run the imputation with the mice() function, for example:

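library(mice)

imp <- mice(df,
            m = 10,             # number of imputed datasets
            maxit = 10,         # iterations per imputed dataset
            method = "pmm",     # predictive mean matching
            printFlag = FALSE)  # do not show the imputation process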

# Printing the imp object gives a summary of the imputation.
imp

The imputed datasets can be extracted with the complete() function:

miceOutput <- complete(imp, action='long') # generate all completed data sets in long format
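
It is worth checking whether anything is still missing after imputation, because mice() can skip a variable that it flags as constant or collinear, and it records such decisions in imp$loggedEvents rather than raising an error. The sketch below is one way to check. It assumes the question's data rebuilt as a plain data.frame, and the seed value is an arbitrary choice that is not part of the original answer.

```r
library(mice)

# The question's data, rebuilt as a plain data.frame (assumption: the tibble class is not needed)
df <- data.frame(
  age = c(20, 21, 30, NA, NA, NA, 50, 61, 60, 63, NA, NA, NA),
  sex = c(NA, 0, NA, 1, NA, 1, 0, NA, NA, NA, NA, 0, 1),
  diabetes = c(NA, NA, 1, 1, NA, 1, NA, 1, 1, 1, 0, 0, NA),
  hypertension = c(1, NA, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1),
  hypercholesterolemia = c(1, 1, NA, 1, 0, 0, NA, 1, NA, 1, 0, 0, 0)
)

# seed = 123 is arbitrary; it only makes the run reproducible
imp <- mice(df, m = 10, maxit = 10, method = "pmm", seed = 123, printFlag = FALSE)

first <- complete(imp, action = 1)       # first completed dataset only
long  <- complete(imp, action = "long")  # all 10 stacked, with .imp and .id columns

# Count the missing values left in each column of the first completed dataset
remaining_na <- colSums(is.na(first))
print(remaining_na)

# NULL if nothing was logged; otherwise one row per variable that mice() dropped or adjusted
print(imp$loggedEvents)
```

If a column still shows missing values, the logged events usually name the reason. Passing remove.collinear = FALSE to mice() is a commonly suggested workaround for the collinearity case, but treat that as something to verify against ?mice for your installed version.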

The imputed datasets can then be used in mice to run pooled analyses, or stored for later use. Hope this helps.
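
A pooled analysis could look roughly like the sketch below: with() fits the same model in every imputed dataset and pool() combines the estimates. The linear model and its formula are purely illustrative assumptions, as is the seed; substitute the model you actually need.

```r
library(mice)

# The question's data, rebuilt as a plain data.frame (assumption: the tibble class is not needed)
df <- data.frame(
  age = c(20, 21, 30, NA, NA, NA, 50, 61, 60, 63, NA, NA, NA),
  sex = c(NA, 0, NA, 1, NA, 1, 0, NA, NA, NA, NA, 0, 1),
  diabetes = c(NA, NA, 1, 1, NA, 1, NA, 1, 1, 1, 0, 0, NA),
  hypertension = c(1, NA, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1),
  hypercholesterolemia = c(1, 1, NA, 1, 0, 0, NA, 1, NA, 1, 0, 0, 0)
)

# seed = 123 is arbitrary; it only makes the run reproducible
imp <- mice(df, m = 10, maxit = 10, method = "pmm", seed = 123, printFlag = FALSE)

# Fit the same (illustrative) linear model in each of the 10 imputed datasets
fit <- with(imp, lm(age ~ hypertension + hypercholesterolemia))

# Combine the 10 sets of estimates into one set of pooled estimates
pooled <- pool(fit)
summary(pooled)
```

With only 13 rows the estimates themselves mean little; the point is the with() and pool() workflow.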

S-SHAAF
  • @Lucas, yes, m = 10 means that you will have 10 completed datasets, which are identified by the '.imp' column. The action argument controls which imputed dataset(s) are extracted. With action = 1, only the first imputed dataset is extracted; with action = 3, the third one; and so on. – S-SHAAF Feb 22 '23 at 01:30
  • Great answer +1. It should be emphasized even more that using `action='long'` and pooling the analyses is the only correct way. Using `m` >= 20 is often recommended. Please read: https://stackoverflow.com/a/66059183/6574038 – jay.sf Feb 22 '23 at 04:54

Try this:

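library(miceRanger)
# Fit the random forest models and impute the missing values
mr <- miceRanger(dataframe, valueSelector = "meanMatch", verbose = FALSE, returnModels = TRUE)
# completeData() returns a list of completed datasets, one per imputation
df_final <- completeData(mr)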

You can use valueSelector = "value" too.

More details in: https://cran.r-project.org/web/packages/miceRanger/vignettes/miceAlgorithm.html and https://cran.r-project.org/web/packages/miceRanger/miceRanger.pdf

Lucas