I am trying to do multiple imputation in Python.
My motivation comes from the mice package in R, and I am looking for something equivalent in Python. I found the IterativeImputer of sklearn.
Following the documentation and some posts on SO, I am able to produce multiple imputed sets. However, the imputed values are then drawn from a distribution by setting sample_posterior=True. This is not what I am looking for. I would like each imputed value to be an actual observed value rather than a draw from a distribution, i.e. as in R, a value drawn from the observations that fall in the same leaf of a decision tree (see page 94 of https://cran.r-project.org/web/packages/mice/mice.pdf). Is there a way to change the "prediction" of a decision tree within the IterativeImputer to drawing a random observation from the same leaf?
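To make the idea concrete, here is a rough sketch of the behaviour I have in mind. It is not an existing sklearn feature: LeafDonorRegressor is a hypothetical wrapper name, and the toy data, min_samples_leaf=5 and max_iter=5 are arbitrary assumptions. The wrapper fits a DecisionTreeRegressor, remembers which observed targets ended up in each leaf, and "predicts" by drawing one of those observed values at random. It is passed to IterativeImputer with sample_posterior left at its default of False.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.tree import DecisionTreeRegressor
from sklearn.utils import check_random_state


class LeafDonorRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical helper (not part of sklearn): predict by drawing a
    random observed target from the leaf that a sample falls into."""

    def __init__(self, min_samples_leaf=5, random_state=None):
        self.min_samples_leaf = min_samples_leaf
        self.random_state = random_state

    def fit(self, X, y):
        y = np.asarray(y)
        self.tree_ = DecisionTreeRegressor(
            min_samples_leaf=self.min_samples_leaf,
            random_state=self.random_state,
        ).fit(X, y)
        # Group the observed targets by the leaf their sample landed in.
        leaves = self.tree_.apply(X)
        self.donors_ = {leaf: y[leaves == leaf] for leaf in np.unique(leaves)}
        self.rng_ = check_random_state(self.random_state)
        return self

    def predict(self, X):
        # Instead of the leaf mean, return one observed donor value per row.
        leaves = self.tree_.apply(X)
        return np.array([self.rng_.choice(self.donors_[leaf]) for leaf in leaves])


# Toy data (assumption): three correlated columns, about 10% missing at random.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=200)
mask = rng.random(X.shape) < 0.1
X_missing = X.copy()
X_missing[mask] = np.nan

# One imputed data set per seed; sample_posterior stays False.
imputed_sets = [
    IterativeImputer(
        estimator=LeafDonorRegressor(random_state=m),
        max_iter=5,
        random_state=m,
    ).fit_transform(X_missing)
    for m in range(3)
]
```

Because the draws are random, the imputer's convergence check is unlikely to trigger, so a ConvergenceWarning after max_iter rounds is to be expected with this sketch. I am not sure whether this matches what mice does in every detail, which is part of why I am asking.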
Documentation: https://scikit-learn.org/stable/modules/impute.html
Posts on SO: "IterativeImputer - sample_posterior" and "Imputing missing values using sklearn IterativeImputer class for MICE"