
rpart can handle NA values in the predictors through surrogate splits. With usesurrogate = 2 in rpart.control, an observation whose primary split variable is missing is routed by the surrogate splits, or in the majority direction if all surrogates are missing as well. Is there a way to get an imputed version of the dataset from the rpart object?

num <- c(5, 8, 10, 12, NA)
cat1 <- factor(c("aa", "bb", NA, "cc", "cc"))
cat2 <- c("banana", "apple", "pear", "grape", NA)
some_dat <- data.frame(num = num, cat1 = cat1, cat2 = cat2)


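library(rpart)

# Note: with only 5 rows, the default minsplit = 20 means no split is attempted.
tree_fit <- rpart(num ~ ., data = some_dat, method = "anova",
                  control = rpart.control(cp = 0, maxdepth = 5, usesurrogate = 2))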
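To make the question more concrete, here is a minimal sketch of where a fitted rpart object keeps its surrogate information. It uses hypothetical synthetic data (the names dat, y, x1, and x2 are illustrative assumptions, not part of the example above), because a tree needs more rows than the toy data frame has before it will split. It does not produce an imputed dataset; it only shows what the object exposes.

```r
library(rpart)

set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)      # noisy copy of x1, so a plausible surrogate
y  <- 2 * x1 + rnorm(n, sd = 0.5)
dat <- data.frame(y = y, x1 = x1, x2 = x2)
dat$x1[sample(n, 20)] <- NA        # make 20 values of x1 missing

fit <- rpart(y ~ ., data = dat, method = "anova",
             control = rpart.control(usesurrogate = 2))

# Per-node summary: split variable, node size, number of surrogates stored
print(fit$frame[, c("var", "n", "nsurrogate")])

# Primary, competitor, and surrogate splits all live in this matrix
print(head(fit$splits))

# Rows with a missing x1 are still assigned to a leaf and get a prediction
leaf <- fit$where
pred <- predict(fit, newdata = dat)
```

After fitting, the NA values in dat$x1 are unchanged: rpart uses surrogates to route those rows through the tree, but it does not write filled-in values back to the data.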

Mine
  • Can you clarify what the desired output would look like? – May 27 '21 at 23:34
  • @Baroque An imputed data frame obtained from the rpart object. My concern, though, is that what rpart does doesn't really fit the definition of imputation: when the best split variable is NA, it follows the surrogate splits for that variable instead. So my question might not make sense. Still, I was wondering whether an imputed version of the dataset could be obtained from the rpart object. By imputed I mean a dataset with the missing values filled in by some process such as surrogate encoding. – Mine May 28 '21 at 10:56

0 Answers