I'm trying to extract a single completed data frame after running an imputation with aregImpute and impute.transcan. However, I cannot get back the variables that were kept out of the imputation model. Can somebody tell me how to do it?

Using the following reproducible example, how can I get a data frame that also contains the variables that were not imputed, such as Species and id?

data("iris")
library(missForest)
library(tidyverse)
library(Hmisc)

# example
iris.missing <- iris %>%
  group_by(Species) %>%
  prodNA(noNA = 0.1) %>%
  ungroup() %>%
  mutate(id = row_number())
    
imputation_model <- aregImpute(~ Sepal.Length + Sepal.Width + Petal.Width,
                               n.impute = 3, data = iris.missing,
                               pr = FALSE, type = 'pmm')
    
data_imp <- impute.transcan(imputation_model,
                            imputation = 1,
                            data = iris.missing,
                            list.out = TRUE,
                            pr = FALSE,
                            check = FALSE)

datos_imp <- bind_rows(data_imp)
slamballais

1 Answer

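There is no function that does this for you. impute.transcan only returns the variables that were in the imputation model, so you have to write them back into the original data by hand. Here is an example of how to do it for one dataset: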

new_data <- iris.missing
new_data[, names(data_imp)] <- bind_rows(data_imp)
head(new_data)
# # A tibble: 6 x 6
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species    id
#          <dbl> <impute>           <dbl> <impute>    <fct>   <int>
# 1          5.1 3.5                  1.4 0.2         setosa      1
# 2          4.9 3.0                  1.4 0.2         NA          2
# 3          4.7 3.2                  1.3 0.2         setosa      3
# 4          4.6 3.2                  1.5 0.2         setosa      4
# 5          5   3.6                  1.4 0.2         setosa      5
# 6          5.4 4.4                  1.7 0.4         setosa      6
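The step that does the work above is the assignment by column name: only the columns named in the imputed list are overwritten, and everything else (such as Species and id) is left as it was. Here is a minimal base R sketch of that pattern on its own. The objects `original`, `imputed` and `completed` are hypothetical stand-ins made up for illustration, not output from Hmisc.

```r
# Toy stand-in for iris.missing (hypothetical data, not produced by Hmisc)
original <- data.frame(
  id    = 1:4,
  group = c("a", "a", "b", "b"),
  x     = c(1, NA, 3, NA),
  y     = c(NA, 2, 3, 4)
)

# Plays the role of data_imp: a named list holding only the imputed variables
imputed <- list(
  x = c(1, 2, 3, 2.5),
  y = c(2.5, 2, 3, 4)
)

# Overwrite only the columns present in `imputed`; id and group are untouched
completed <- original
completed[names(imputed)] <- imputed

# Optional: coerce the filled-in columns to plain numeric vectors
completed[names(imputed)] <- lapply(completed[names(imputed)], as.numeric)

print(completed)
```

With real aregImpute output the filled-in columns carry the impute class (shown as <impute> in the tibble above). I would expect the as.numeric step to turn them into ordinary numeric columns, but treat that as an assumption and check it on your own data.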

And here is the same for all three imputations, returning a list of completed datasets:

imps <- lapply(seq_len(3), function(i) {
  data_imp <- impute.transcan(imputation_model,
                              imputation = i,
                              data = iris.missing,
                              list.out = TRUE,
                              pr = FALSE,
                              check = FALSE)
  iris.missing[, names(data_imp)] <- bind_rows(data_imp)
  iris.missing
})

More commonly, people use the mice package for imputation, which is much easier to combine with subsequent functions. With mice, the completed datasets are retrieved with complete:

imp <- mice::mice(iris.missing)

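# one dataset at a time
# (tidyverse is attached but mice is not, so call mice::complete explicitly)
imp_1 <- mice::complete(imp)
imp_2 <- mice::complete(imp, 2)
imp_3 <- mice::complete(imp, 3)

# all datasets at once; complete() needs the mids object as its first argument
imps <- lapply(seq_len(imp$m), function(i) mice::complete(imp, i))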
slamballais