
I would like to impute missing data with the mice package. My dataset contains the columns "A" to "G", but I only want to impute the values in columns C and D.

This article (https://www.r-bloggers.com/2016/06/handling-missing-data-with-mice-package-a-simple-approach/) explains how to exclude variables from being used as predictors or from being imputed, but I would like to use mice the other way round: I want to specify which variables ARE imputed, so that only C and D are imputed.

Is this possible?

Thank you!

zx8754
MDStat
  • Maybe keep only the columns that need imputing? – zx8754 Sep 08 '21 at 14:03
  • But I want to use all the information in columns A to G after data imputation as well... Or is this the wrong way of using imputation? As far as I understand, I start with my data with columns A to G, then use mice, and in the end use the "complete" function to append the imputed data. Or would it work to create a temporary copy of my data, drop all columns which I do not want to impute so that only C and D remain, and then use the complete function like data <- complete(tempdata)? – MDStat Sep 09 '21 at 04:09
  • After imputation you still have your original A & G columns, just replace after imputation? – zx8754 Sep 09 '21 at 06:40
  • Thank you for your answer: I want to use columns A, B, C, D, E, F, G for all my subsequent analyses, but only apply the imputation to columns C + D. In the end I want to keep the original data of A, B, E, F, G combined with the imputed values in columns C + D. I asked this question because I think it is easier to define which columns I want to impute rather than which columns I DO NOT want to impute. Of course I could exclude columns A, B, E, F, G, but I thought this could be done more easily... I hope that explains my problem. – MDStat Sep 09 '21 at 07:12
  • Related post? https://stackoverflow.com/q/42161230/680068 – zx8754 Sep 09 '21 at 07:41
  • I think you are going against the idea of multiple imputation, by design you want to have multiple datasets with no missing values. – zx8754 Sep 09 '21 at 07:42

1 Answer


Answer

Just invert the logic: in the method vector, set the entry of every variable that is not one of your variables of interest to "":

meth[!names(meth) %in% c("C", "D")] <- ""
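
Applied to the columns from the question, that one-liner can be sketched in base R as follows. The hand-written vector is only a hypothetical stand-in for init$method, and "pmm" for every column is an assumption; in practice mice picks the defaults during the dry run. The name impute_cols is also made up for illustration.

```r
# Hypothetical stand-in for init$method: assume the dry run chose "pmm" for A to G
meth <- c(A = "pmm", B = "pmm", C = "pmm", D = "pmm",
          E = "pmm", F = "pmm", G = "pmm")

# Columns that should be imputed (illustrative name, not part of mice)
impute_cols <- c("C", "D")

# Blank out the method of every column that is NOT in impute_cols
meth[!names(meth) %in% impute_cols] <- ""

print(meth)
```

Only C and D keep a method; A, B, E, F and G end up with "" and are therefore skipped by mice.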

Example: Only impute Petal.Length and Petal.Width

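library(mice)

# Generate missing values in iris (prop = 0.1)
data <- ampute(iris, prop = 0.1)$amp
# Dry run (maxit = 0) to obtain the default method vector
init <- mice(data, maxit = 0)
meth <- init$method
# Blank out the method for every column except the two to impute
meth[!names(meth) %in% c("Petal.Length", "Petal.Width")] <- ""
mice(data, method = meth)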

Rationale

You can supply a vector to the method argument of mice::mice. This vector holds, for each column, the imputation method to use for that column. The linked article first does a dry run (init <- mice(data, maxit = 0)), whose output contains a preset method vector (init$method). For my example, it looks like this:

Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
       "pmm"        "pmm"        "pmm"        "pmm"        "pmm"

You can avoid variables being imputed by setting the method to "". This is one way to exclude variables. As I show with my example, you can invert that logic, thus ending up with only the variables you want to include.
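
Regarding the question in the comments about keeping all columns afterwards: here is a minimal sketch, under assumptions of my own, showing that complete() returns every column, with only the selected ones filled in. The missing values are placed by hand in the numeric iris columns (not with ampute), and Sepal.Width is dropped as a predictor purely as a precaution, because it has missing values but is not imputed. The seeds and m = 5 are arbitrary.

```r
library(mice)

# Toy data (assumption): numeric iris columns with hand-placed missing values
set.seed(1)
data <- iris[, 1:4]
data$Petal.Length[sample(nrow(data), 15)] <- NA
data$Petal.Width[sample(nrow(data), 15)]  <- NA
data$Sepal.Width[sample(nrow(data), 15)]  <- NA  # incomplete, but NOT to be imputed

# Dry run to get the default method vector and predictor matrix
init <- mice(data, maxit = 0)
meth <- init$method
pred <- init$predictorMatrix

# Impute only the two petal columns
meth[!names(meth) %in% c("Petal.Length", "Petal.Width")] <- ""

# Precaution: do not use the unimputed, incomplete column as a predictor
pred[, "Sepal.Width"] <- 0

imp <- mice(data, method = meth, predictorMatrix = pred,
            m = 5, seed = 123, printFlag = FALSE)

# First completed dataset: all four columns are returned
completed <- complete(imp, 1)
colSums(is.na(completed))
```

The petal columns should come back without missing values, while Sepal.Length and Sepal.Width are returned exactly as they went in, including the NAs in Sepal.Width.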

slamballais