I am trying to make sure that all of my factor features are fully represented (that is, with every possible factor level) both in my tree object and in the test set I use for prediction.

for (j in seq_along(predictors)) {
    if (is.factor(Test[, j])) {
        ct[[names(predictors)[j]]] <- union(ct$xlevels[[names(predictors)[j]]], levels(Test[, names(predictors)[j]]))
    }
}

However, for the object ct (a ctree from the party package), I can't work out how to access the features' factor levels. I get this error:

Error in ct$xlevels : $ operator not defined for this S4 class
user3424107
  • party uses S4 classes, which you cannot index with `$`; you should read `?'BinaryTree-class'` – rawr Nov 07 '15 at 14:03
  • The new S3 implementation of `ctree` in `partykit` may be easier to use for your purpose; it also comes with more documentation. – Achim Zeileis Nov 08 '15 at 23:36

1 Answer

I have had this problem countless times, and today I came up with a little hack that should remove the need to fix discrepancies in factor levels.

Just fit the model on the whole dataset (train + test), giving zero weight to the test observations. This way the ctree model will not drop any factor levels.

library(party)     # provides ctree(); assuming party, as in the question
library(magrittr)  # provides %>%
a <- ctree(Y ~ ., data = DF[train.IDs, ]) %>% predict(newdata = DF) # would trigger an error if the data passed to predict() did not match the training data levels
b <- ctree(Y ~ ., data = DF, weights = as.numeric(seq_len(nrow(DF)) %in% train.IDs)) %>% predict(newdata = DF) # passing the IDs as 0-1 weights instead of subsetting the data solves it
mean(a == b) # check that the predictions are equal; should be 1
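
If refitting on the combined data is not convenient, a different route (not the weights hack above) is to harmonise the factor levels of the two data frames before fitting, which is roughly what the loop in the question is aiming for. The following is only a minimal base R sketch: the toy data frames `Train` and `Test` and the column names `f` and `x` are made up for illustration, and it assumes both data frames share the same column names.

```r
# Hypothetical toy data: level "c" occurs only in the test set.
Train <- data.frame(f = factor(c("a", "b", "a")), x = c(1, 2, 3))
Test  <- data.frame(f = factor(c("b", "c")),      x = c(4, 5))

# Give every factor column the union of the levels seen in either set.
for (v in names(Train)) {
  if (is.factor(Train[[v]])) {
    all_levels <- union(levels(Train[[v]]), levels(Test[[v]]))
    Train[[v]] <- factor(Train[[v]], levels = all_levels)
    Test[[v]]  <- factor(Test[[v]],  levels = all_levels)
  }
}

levels(Train$f)  # both data frames now report the same level set
levels(Test$f)
```

The observed values are unchanged; only the set of declared levels is widened, so a model fitted on `Train` knows about every level that can appear in `Test`.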

Tell me if it works as expected!

Bakaburg