I have clustered data with 21% missing values in the cluster variable, which was derived from a 'date' variable with similar missing data. I'm trying to impute the missing data in the cluster variable without imputing anything else. No other variables have missing data (edit: about 20 variables total in the dataframe). All variables are numeric, logical, factor, or date format. I'm using PMM for the imputation model because the variable is continuous but not normally distributed, and I don't want to introduce any values outside the range of the current values.
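To illustrate why PMM keeps imputations inside the observed range, here is a minimal base-R sketch on made-up data. It is a simplified variant (one fixed regression fit, no parameter draw), and the helper `pmm_simple` is hypothetical; it is not what `mice` does internally and it is not the multilevel `2lonly.pmm` method.

```r
# Simplified predictive mean matching on simulated data (illustration only).
set.seed(1)
n <- 200
x <- rnorm(n)
# Bounded outcome in 1-99, loosely mimicking the range described in the question
y <- round(pmin(pmax(50 + 20 * x + rnorm(n, sd = 10), 1), 99))
miss <- sample(n, 40)          # rows to blank out
y_obs <- y
y_obs[miss] <- NA

# Hypothetical helper: impute each missing y with an observed y taken from
# one of the `donors` cases whose predicted value is closest.
pmm_simple <- function(y, x, donors = 5) {
  dat <- data.frame(y = y, x = x)
  obs <- !is.na(dat$y)
  fit <- lm(y ~ x, data = dat[obs, ])
  pred <- predict(fit, newdata = dat)
  imputed <- dat$y
  for (i in which(!obs)) {
    d <- abs(pred[obs] - pred[i])            # distance in predicted-mean space
    pool <- order(d)[seq_len(donors)]        # indices of the closest donors
    pick <- pool[sample.int(length(pool), 1)]
    imputed[i] <- dat$y[obs][pick]           # copy an observed value
  }
  imputed
}

y_imp <- pmm_simple(y_obs, x)
range(y_obs, na.rm = TRUE)
range(y_imp)
```

Because every imputed value is copied from an observed donor, the imputed range cannot exceed the observed range in this sketch.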
When I run mice with `maxit = 0`, I don't see any logged errors. When I increase `maxit` to anything other than 0, there are still no errors, but when I view the imputed data sets, all the values in the variable with missing values (`cluster`) are set to `NA`, and all the values in all the non-imputed variables (regardless of their predictor value in the matrix) are set to `NaN`.
I've looked through these resources, which include some tutorials, but couldn't find a solution:
https://stefvanbuuren.name/fimd/ch-multilevel.html
https://bookdown.org/mwheymans/bookmi/multiple-imputation-models-for-multilevel-data.html
https://www.gerkovink.com/miceVignettes/
https://www.nerler.com/teaching/fgme2019/MICourse_Slides.pdf
https://nerler.github.io/EP16_Multiple_Imputation/slide/
I've read that collinearity can lead to NA imputed values, but there are no collinearity errors. I tried adding `ridge=0.001` and/or `threshold=1.1` to make the model more robust, without success. I wondered if having `date` set as a cluster was a problem, so I tried setting `date = 1` (using it as a predictor); that did not give me any collinearity errors (or any other errors, for that matter) but produced the same result of `NA` and `NaN`. I've also tried setting `date = 0` in the predictor matrix. That does not cause any problems with the dry run, but in the model with 5 iterations I get this error, so I don't think that is a solution:
Error in .imputation.level2(y = y, ry = ry, x = x, type = type, wy = wy, :
No class variable
I'm not sure what the class variable is. Is it the cluster variable?
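For context, the `mice` documentation describes a predictor-matrix code of `-2` as marking the class (grouping) variable for the two-level methods. The base-R sketch below shows one way to list which variables carry that code in the row of the variable being imputed. The matrices are built by hand for illustration, the helper `class_vars` is hypothetical, and zeroing the diagonal is an assumption that a variable's own entry is not used as its predictor.

```r
# Hand-built predictor matrix for illustration (not the real data).
vars <- c("id", "x", "y", "date", "cluster")
pm_ok <- matrix(0, nrow = length(vars), ncol = length(vars),
                dimnames = list(vars, vars))
pm_ok[, c("x", "y")] <- 1            # ordinary predictors
pm_ok[, c("date", "cluster")] <- -2  # coded as class variables
diag(pm_ok) <- 0                     # assumption: own entry is ignored

# Hypothetical helper: names of the columns coded -2 in the target's row.
class_vars <- function(pm, target) {
  row <- pm[target, ]
  names(row)[row == -2]
}

class_vars(pm_ok, "cluster")
# "date"

# Same matrix, but with date switched off for the cluster row
pm_zero <- pm_ok
pm_zero["cluster", "date"] <- 0
class_vars(pm_zero, "cluster")
# character(0)
```

In the second matrix no column in the `cluster` row is coded `-2`, which looks consistent with the "No class variable" message, though I have not confirmed that this is the exact check `mice` performs.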
My code:
md.pattern(data)
     id x y date_cr cluster
1154  1 1 1       1       1   0
304   1 1 1       0       0   2
      0 0 0     304     304 608
#set predictor matrix
pm = make.predictorMatrix(data)
#date is collinear with cluster - not sure if it matters whether I code it as -2 or 0.
pm[,c("date","cluster")] = -2
pm[, c("id")] = 0
#all variables set =1 have no missing values
pm[,c("x","y","z")] = 1
#set imputation method
impmethod = character(ncol(data))
names(impmethod) = colnames(data)
impmethod["cluster"] = "2lonly.pmm"
#Dry run gives no errors or logged events
> mi = mice(data, m=5, predictorMatrix = pm, method = impmethod,
maxit=0, printFlag = TRUE, seed=1)
#Data is imputed correctly (range in data is 1-99)
> mi$imp$cluster
     1  2  3  4  5
782 79 38 34 63 41
783 45 58 20 85 22
784  8 54 51 12 61
785 67 97 66 43 41
786 32 84  8 14 31
> mi$chainMean
, , Chain 5
x
y
#I assume it's OK that these values are blank because there are 0 iterations? Or maybe this result suggests a problem?
#However, increasing the number of iterations to anything > 0 causes a failure to impute, without logging any problems:
> mi5 = mice(data, m=5, predictorMatrix = pm, method = impmethod,
maxit=5, printFlag = FALSE, seed=1)
> mi5$imp$cluster
     1  2  3  4  5
782 NA NA NA NA NA
783 NA NA NA NA NA
784 NA NA NA NA NA
785 NA NA NA NA NA
> mi5$chainMean
, , Chain 5
    1   2   3   4   5
x NaN NaN NaN NaN NaN
y NaN NaN NaN NaN NaN
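The `md.pattern` output above shows that the 304 rows missing `cluster` are also missing `date_cr`. A check that may be worth running is how completely the two missingness patterns overlap, since rows without a value in the grouping variable carry no group membership. The sketch below uses a made-up data frame (`toy` is hypothetical); whether this overlap explains the `NA` imputations is a conjecture, not something verified here.

```r
# Made-up data: date and cluster are missing on the same rows.
toy <- data.frame(
  id = 1:8,
  x = c(2.1, 3.4, 1.8, 4.0, 2.9, 3.3, 1.5, 2.7),
  date = as.Date("2020-01-01") + c(0, 0, 1, 1, 2, NA, NA, NA),
  cluster = c(1, 1, 2, 2, 3, NA, NA, NA)
)

# Missing values per column
na_counts <- colSums(is.na(toy))
na_counts

# Share of rows missing cluster that are also missing date
both_missing <- is.na(toy$cluster) & is.na(toy$date)
overlap <- sum(both_missing) / sum(is.na(toy$cluster))
overlap
```

An `overlap` of 1 means every row that needs a `cluster` value also lacks `date`, so `date` cannot place those rows in any group.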
edit:
R version 4.2.2 (2022-10-31)
mice_3.16.0