
I have clustered data with 21% missing values in the cluster variable, which was derived from a 'date' variable with a similar amount of missing data. I'm trying to impute the missing values in the cluster variable without imputing anything else. No other variables have missing data (edit: about 20 variables total in the dataframe). All variables are numeric, logical, factor, or date. I'm using PMM for the imputation model because the variable is continuous but not normally distributed, and I don't want to introduce any values outside the range of the observed values.

When I run mice with maxit = 0, I don't see any logged errors. When I increase maxit to anything above 0, there are still no errors, but when I view the imputed data sets, all the values in the variable with missing values (cluster) are set to NA, and all the values in all the non-imputed variables (regardless of their predictor value in the matrix) are set to NaN.

I've looked through these resources, which include some tutorials, but couldn't find a solution:

https://stefvanbuuren.name/fimd/ch-multilevel.html   
https://bookdown.org/mwheymans/bookmi/multiple-imputation-models-for-multilevel-data.html  
https://www.gerkovink.com/miceVignettes/    
https://www.nerler.com/teaching/fgme2019/MICourse_Slides.pdf   
https://nerler.github.io/EP16_Multiple_Imputation/slide/

I've read that collinearity can lead to NA imputed values, but there are no collinearity errors. I tried adding ridge=0.001 and/or threshold=1.1 to make the model more robust, without success. I wondered if having date set as a cluster was a problem, so I tried setting date = 1 (using it as a predictor). That did not give me any collinearity errors (or any other errors, for that matter) but produced the same result of NA and NaN. I've also tried setting date = 0 in the predictor matrix. That does not cause any problems with the dry run, but in the model with 5 iterations I get this error, so I don't think that is a solution.

    Error in .imputation.level2(y = y, ry = ry, x = x, type = type, wy = wy,  : 
    No class variable

I'm not sure what the class variable is. Is it the cluster variable?

My code:

    md.pattern(data)
         id x y date_cr cluster    
    1154  1 1 1       1       1   0
    304   1 1 1       0       0   2
          0 0 0     304     304 608

    # set predictor matrix

    pm = make.predictorMatrix(data)
    # date has collinearity with cluster - not sure if it matters if I code it as -2 or 0.
    pm[,c("date","cluster")] = -2
    pm[, c("id")] = 0
    # all variables set = 1 have no missing values
    pm[,c("x","y","z")] = 1

    # set imputation method
    impmethod = character(ncol(data))
    names(impmethod) = colnames(data)
    impmethod["cluster"] = "2lonly.pmm"

    # Dry run gives no errors or logged events

    > mi = mice(data, m=5, predictorMatrix = pm, method = impmethod,
                maxit=0, printFlag = TRUE, seed=1)

    # Data is imputed correctly (range in data is 1-99)

    > mi$imp$cluster
          1  2  3  4  5
    782  79 38 34 63 41
    783  45 58 20 85 22
    784   8 54 51 12 61
    785  67 97 66 43 41
    786  32 84  8 14 31
    > mi$chainMean
    , , Chain 5
    x
    y

    # I assume it's OK that these values are blank because there are 0 iterations? Or maybe this result suggests a problem?

    # However, increasing the number of iterations to anything > 0 causes a failure to impute, without logging any problems:

    > mi5 = mice(data, m=5, predictorMatrix = pm, method = impmethod,
                maxit=5, printFlag = FALSE, seed=1)

    > mi5$imp$cluster
          1  2  3  4  5
    782  NA NA NA NA NA
    783  NA NA NA NA NA
    784  NA NA NA NA NA
    785  NA NA NA NA NA
    > mi5$chainMean
    , , Chain 5

      1   2   3   4   5
    x NaN NaN NaN NaN NaN
    y NaN NaN NaN NaN NaN
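
To see what my predictor-matrix assignments do to the row used for cluster, here is a minimal base-R sketch. It does not call mice. The toy matrix is a hand-built stand-in for `make.predictorMatrix(data)` (all 1s with a 0 diagonal), and the six column names are a hypothetical subset of my roughly 20 variables. My understanding from the mice documentation is that -2 in a target's row marks the class variable; I have not confirmed how mice treats the diagonal entry afterwards.

```r
# Toy stand-in for make.predictorMatrix(data): all 1s with a 0 diagonal.
# Column names are a hypothetical subset of the real variables.
vars <- c("id", "x", "y", "z", "date", "cluster")
pm <- matrix(1, nrow = length(vars), ncol = length(vars),
             dimnames = list(vars, vars))
diag(pm) <- 0

# Same column-wise assignments as in the question.
# Note: assigning a whole column also overwrites that column's diagonal entry.
pm[, c("date", "cluster")] <- -2
pm[, "id"] <- 0
pm[, c("x", "y", "z")] <- 1

# Row that would be used when 'cluster' is the imputation target
cluster_row <- pm["cluster", ]
print(cluster_row)

# Columns coded -2 in that row
class_cols <- names(cluster_row)[cluster_row == -2]
print(class_cols)
```

In this toy version the cluster row carries -2 for both date and cluster itself, so the target appears to be flagged as its own class variable, alongside a date column that has missing values.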

edit:

R version 4.2.2 (2022-10-31)

mice_3.16.0
