Your "categorical" variable appears to be stored as character. You may want to coerce character columns into factors before imputing; otherwise mice() will ignore the variable. Do:
DATA[sapply(DATA, is.character)] <- lapply(DATA[sapply(DATA, is.character)], as.factor)
str(DATA)
# 'data.frame': 1000 obs. of 4 variables:
# $ x1: Factor w/ 5 levels "a","b","c","d",..: 2 2 NA NA 3 3 4 NA NA 4 ...
# $ x2: num 0.932 0.87 0.886 0.925 0.984 ...
# $ x3: num 0.292 0.734 0.764 0.943 0.806 ...
# $ x4: Factor w/ 4 levels "t","u","v","w": 1 3 1 3 4 3 1 4 3 2 ...
head(DATA)
# x1 x2 x3 x4
# 1 b 0.9315629 0.2916144 t
# 2 b 0.8695138 0.7338165 v
# 3 <NA> 0.8863894 0.7642693 t
# 4 <NA> 0.9248280 0.9427943 v
# 5 c 0.9844646 0.8062173 w
# 6 c 0.6200558 0.7354498 v
Also, it might be a better idea to use a proportional odds model ("polr") for ordered categorical data instead of predictive mean matching ("pmm").
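library(mice)
# Impute x1 with a proportional odds model; x2 to x4 are complete, so their method is "".
IMP <- mice(DATA, m = 5, maxit = 50, method = c("polr", "", "", ""), seed = 500)
# complete() returns only the first of the five completed data sets by default.
DATAIMPUTE <- complete(IMP)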
head(DATAIMPUTE)
# x1 x2 x3 x4
# 1 b 0.9315629 0.2916144 t
# 2 b 0.8695138 0.7338165 v
# 3 a 0.8863894 0.7642693 t
# 4 a 0.9248280 0.9427943 v
# 5 c 0.9844646 0.8062173 w
# 6 c 0.6200558 0.7354498 v
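The proportional odds model only makes sense if the levels of x1 have a natural order. A minimal sketch of declaring that order before imputing follows; the ordering a < b < c < d < e and the smaller toy data frame TOY are assumptions made for illustration, not something the question states. make.method() shows which method mice would choose by default for each column.

```r
library(mice)

# Toy data mirroring the structure of DATA (smaller, for illustration only).
set.seed(74)
TOY <- data.frame(
  x1 = sample(c(letters[1:5], NA), 200, replace = TRUE),
  x2 = runif(200),
  x3 = runif(200),
  x4 = sample(letters[20:23], 200, replace = TRUE),
  stringsAsFactors = TRUE
)

# Assumption: the levels a < b < c < d < e are genuinely ordered.
TOY$x1 <- factor(TOY$x1, levels = letters[1:5], ordered = TRUE)

# Default method per column: "polr" for ordered factors, "polyreg" for
# unordered factors with more than two levels, "" for complete columns.
METH <- make.method(TOY)
print(METH)

# Impute using those defaults (few iterations to keep the sketch fast).
IMP_TOY <- mice(TOY, m = 5, maxit = 5, method = METH, seed = 500,
                printFlag = FALSE)
```

If the levels are not ordered, leaving x1 as a plain factor lets mice fall back to "polyreg" (multinomial regression) instead.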
Important note: You seem to misunderstand the method if you think the complete() function gives you a valid imputed data set. By default it uses action=1 and returns only the first of the m completed data sets, which is no multiple imputation at all! You probably should consult a statistician and read the documentation more thoroughly. There's a nice answer around that briefly summarizes the most important point.
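For illustration, a minimal sketch of the usual workflow: fit the analysis model on every completed data set with with() and combine the results with pool(). The analysis model lm(x2 ~ x1 + x3) and the smaller toy data are assumptions made up for this example; substitute your own model.

```r
library(mice)

# Toy data with the same structure as DATA (smaller, for illustration only).
set.seed(74)
TOY <- data.frame(
  x1 = sample(c(letters[1:5], NA), 200, replace = TRUE),
  x2 = runif(200),
  x3 = runif(200),
  x4 = sample(letters[20:23], 200, replace = TRUE),
  stringsAsFactors = TRUE
)

# Create m = 5 imputed data sets (few iterations to keep the sketch fast).
IMP_TOY <- mice(TOY, m = 5, maxit = 5, method = c("polr", "", "", ""),
                seed = 500, printFlag = FALSE)

# Fit the (hypothetical) analysis model on each of the 5 completed data sets.
FIT <- with(IMP_TOY, lm(x2 ~ x1 + x3))

# Combine the 5 sets of estimates using Rubin's rules.
POOLED <- pool(FIT)
print(summary(POOLED))

# To inspect all completed data sets at once, stack them in long format;
# the .imp column identifies the imputation each row belongs to.
ALL <- complete(IMP_TOY, action = "long")
```

The pooled standard errors reflect the uncertainty due to the missing values, which an analysis of a single completed data set would understate.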
Data:
set.seed(74)
DATA <- data.frame(x1 = sample(c(letters[1:5], NA), 1000, replace = TRUE),
                   x2 = runif(1000),
                   x3 = runif(1000),
                   x4 = sample(letters[20:23], 1000, replace = TRUE))