Multiple Imputation in Python with Decision Trees

Question

I am trying to do multiple imputation in Python. My motivation comes from the mice package in R, but I am looking for an equivalent in Python. I found scikit-learn's IterativeImputer. Following the documentation and some posts on SO, I am able to produce multiple imputed datasets. However, with sample_posterior=True the imputed values are drawn from a distribution, which is not what I am looking for. I would like each imputed value to be an actual observed value rather than a draw from a distribution, i.e., as in R, drawn from the observations that fall in the same leaf of a decision tree (see page 94 of https://cran.r-project.org/web/packages/mice/mice.pdf). Is there a way to change the "prediction" of a decision tree within IterativeImputer so that it draws a random observation from the same leaf?

Documentation: https://scikit-learn.org/stable/modules/impute.html

Posts on SO: "IterativeImputer - sample_posterior" and "Imputing missing values using sklearn IterativeImputer class for MICE"

score 0 · Answer 1 · answered Jul 13 '22 at 18:34

miceforest does what you are looking for. It implements predictive mean matching by default, so imputed values are pulled from real observed samples in the data.

However, miceforest uses lightgbm as a backend. This may or may not be what you want.
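If staying within scikit-learn is preferred, the leaf-sampling idea from the question can be sketched with a custom estimator passed to IterativeImputer. This is only a minimal sketch under assumptions: `LeafSamplingTree`, `leaf_donors_`, and `rng_` are hypothetical names introduced here and are not part of scikit-learn, and the class only approximates the tree-based method in mice (it does not bootstrap the data before fitting, for example). It relies on IterativeImputer calling `predict` on the estimator when `sample_posterior=False`.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.tree import DecisionTreeRegressor
from sklearn.utils import check_random_state


class LeafSamplingTree(DecisionTreeRegressor):
    """Hypothetical helper: 'predicts' by drawing an observed target
    value at random from the leaf that each row falls into."""

    def fit(self, X, y, sample_weight=None, check_input=True):
        super().fit(X, y, sample_weight=sample_weight, check_input=check_input)
        y = np.asarray(y, dtype=float)
        leaves = self.apply(X)  # leaf index of every training row
        # Store the observed targets (donors) that ended up in each leaf.
        self.leaf_donors_ = {leaf: y[leaves == leaf] for leaf in np.unique(leaves)}
        self.rng_ = check_random_state(self.random_state)
        return self

    def predict(self, X, check_input=True):
        leaves = self.apply(X, check_input=check_input)
        # One random donor per row, taken from that row's leaf.
        return np.array([self.rng_.choice(self.leaf_donors_[leaf]) for leaf in leaves])


# Toy data (made up for illustration): column 2 depends on column 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=200)
mask = rng.random(200) < 0.2
X_missing = X.copy()
X_missing[mask, 2] = np.nan

imputed_sets = []
with warnings.catch_warnings():
    # Random draws never settle, so the early-stopping check will not pass.
    warnings.simplefilter("ignore", ConvergenceWarning)
    for seed in range(3):  # one imputed dataset per seed
        imputer = IterativeImputer(
            estimator=LeafSamplingTree(min_samples_leaf=5, random_state=seed),
            max_iter=5,
            random_state=seed,
        )
        imputed_sets.append(imputer.fit_transform(X_missing))
```

Each array in `imputed_sets` is one completed dataset, and every imputed entry is a value that was actually observed in that column. Varying `random_state` is what makes the datasets differ. `min_samples_leaf` controls how many donors each leaf holds.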

Suspicious_Gardener
