I'm struggling with an imputation using mice. The main objective is to impute the NAs (by group, if possible). As the sample is a bit too large to simply post here, it can be downloaded: https://drive.google.com/open?id=1InGJ_M7r5jwQZZRdXBO1MEbKB48gafbP
My questions are:
How big of an issue is correlated data in general? What can I do to still impute the data? The data is part of an empirical research question and I don't yet know which variables to include, thus it'd be best to keep as many as possible for the time being.
What methods would be more suitable than "cart" and "pmm"? I'd rather not simply impute the mean/median.
Can I somehow impute the data by "ID"?
Any tips for debugging?
Here is my code:
#Start
require(mice)
require(Hmisc)
# setwd(...)
# test.df<-read.csv(...)
str(test.df)
Check for correlation. The first two columns contain the identifier and the year, so there is no need to look into them:
test.df.rcorr<-rcorr(as.matrix(test.df[,-c(1:2)]))
test.df.coeff<-test.df.rcorr$r
# Plot only; assigning the result would overwrite the coefficient matrix
corrplot::corrplot(test.df.coeff)
As the plot shows, some variables are strongly correlated. For a simple test, I omit all columns with strong correlation.
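Instead of reading the strongly correlated columns off the plot, they could also be listed programmatically. The following is only a sketch on a toy data frame: the column names and the 0.9 cutoff are assumptions for illustration, not values taken from the real data.

```r
# Toy stand-in for test.df[,-c(1:2)]; x1..x3 are hypothetical columns
set.seed(1)
toy <- data.frame(x1 = rnorm(50))
toy$x2 <- toy$x1 * 2 + rnorm(50, sd = 0.01)  # nearly collinear with x1
toy$x3 <- rnorm(50)

r <- cor(toy, use = "pairwise.complete.obs")
r[upper.tri(r, diag = TRUE)] <- NA           # keep each pair only once
high <- which(abs(r) > 0.9, arr.ind = TRUE)  # 0.9 is an arbitrary cutoff

# One row per strongly correlated pair
high_pairs <- data.frame(var1 = rownames(r)[high[, "row"]],
                         var2 = colnames(r)[high[, "col"]],
                         r    = r[high])
high_pairs
```

On the real data, `toy` would be replaced by `test.df[,-c(1:2)]`, and the cutoff adjusted to taste.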
#Simple example
test.df2<-test.df[,-c(4,7,10,11)]
test.df2
sum(is.na(test.df2))
Now, let's impute test.df2 without specifying the method:
imputation.df2<-mice(test.df2, m=1, seed=123456)
imputation.df2$method
test.df2.imp<-mice::complete(imputation.df2)
Warning message:
Number of logged events: 1
sum(is.na(test.df2.imp))
As can be seen, all the NAs are imputed. And the method used is "pmm" only.
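The "Number of logged events" warning can be inspected through the `loggedEvents` component of the object returned by mice(). The sketch below uses the `nhanes` example data shipped with mice plus a hypothetical constant column `k`, which is assumed to trigger one logged event; it is not the data from this question.

```r
library(mice)

# nhanes ships with mice; k is a hypothetical constant column
dat <- nhanes
dat$k <- 1

imp <- mice(dat, m = 1, seed = 123456, printFlag = FALSE)

# One row per event; the "out" column names the variable that was dropped
imp$loggedEvents
```

For the run above, the same information should be available via `imputation.df2$loggedEvents`.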
Using the full data set, I get the following error message almost immediately:
imputation.df<-mice(test.df,m=1,seed = 66666)
iter imp variable
1 1 x1Error in solve.default(xtx + diag(pen)) :
system is computationally singular: reciprocal condition number = 1.49712e-16
Is this merely due to the correlation in the data?
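solve() reports a computationally singular system when one predictor is an exact or near-exact linear combination of others, which is a stronger condition than pairwise correlation. One way to check for this, sketched on toy data where the hypothetical column `x3` is constructed as `x1 + x2`, is to compare the rank of the data matrix with its number of columns:

```r
# Toy data; x3 is deliberately an exact linear combination of x1 and x2
set.seed(2)
toy <- data.frame(x1 = rnorm(30), x2 = rnorm(30))
toy$x3 <- toy$x1 + toy$x2

# Use complete rows only, since qr() does not accept NAs
X <- as.matrix(toy[complete.cases(toy), ])

rank_X <- qr(X)$rank
rank_X < ncol(X)   # TRUE signals linear dependence among the columns
```

Note that no single pair in such data needs to show an extreme correlation, so a correlation plot alone may not reveal the problem.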
Finally, my code for imputation by ID, which runs a little longer before showing this error:
test123<- lapply(split(test.df, test.df$ID), function(x) mice::complete(mice(x, m = 1 ,seed = 987654)))
Error in edit.setup(data, setup, ...) : nothing left to impute
In addition: There were 19 warnings (use warnings() to see them)
Called from: edit.setup(data, setup, ...)
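For debugging the per-ID run, it may help to wrap each group so that a single failing group does not abort the whole lapply(). This is a sketch only: `impute_group` is a hypothetical helper, the assumption that failures come from groups with no NAs or too few usable columns is unverified, and the demo uses mice's `nhanes` data with a made-up `ID` column.

```r
library(mice)

# Hypothetical helper: impute one group, but never stop the whole run
impute_group <- function(x, seed = 987654) {
  if (!anyNA(x)) return(x)  # nothing to impute in this group
  tryCatch(
    mice::complete(mice(x, m = 1, seed = seed, printFlag = FALSE)),
    error = function(e) {
      message("Group not imputed: ", conditionMessage(e))
      x  # return the group unchanged so remaining NAs stay visible
    }
  )
}

# Demo on nhanes with a made-up grouping column
dat <- nhanes
dat$ID <- rep(c("a", "b"), length.out = nrow(dat))
res <- lapply(split(dat, dat$ID), impute_group)
```

The messages then show which groups fail and why, and `do.call(rbind, res)` puts the groups back together.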
I know this is a long question, and I'm grateful for every little tip or hint!
Thanks a bunch!