How to best avoid unique value error w/ R package 'iml'

Question

Unfortunately, I cannot share my data-set, and my problem is data-set related, so this is a bad starting place, I guess. But maybe someone has a hint anyway?

I'm starting with a data-set called input.rf which I am splitting into train and test as follows:

train.index <- createDataPartition(input.rf[,dependent], p=p.train, list=FALSE)
input.train <- input.rf[train.index,]
input.test <- input.rf[-train.index,]

I've trained a random forest model rf.model using caret on the train data-set:

rf.model <- train(
  input.train[, names(input.rf) != dependent, drop=FALSE],
  input.train[, names(input.rf) == dependent],
  method="rf", ntree=ntree, metric=metric,
  trControl=cntrl, tuneGrid=tunegrid,
  nodesize=floor(nrow(input.train)*(nodesize/100)),
  importance=TRUE
)

I then wanted to use iml as follows with my train and test data-sets excluding the dependent column from the model:

iml.model.train <- Predictor$new(rf.model, data=input.train[which(names(input.train) != dependent)])
iml.model.test <- Predictor$new(rf.model, data=input.test[which(names(input.test) != dependent)])

If I then try to look at the effects like so ...

iml.effects.train <- FeatureEffects$new(iml.model.train, method="ale")
iml.effects.test <- FeatureEffects$new(iml.model.test, method="ale")

... I get the following error: Error in initialize(...) : feature has only one unique value

This happens because splitting into train and test can produce subsets in which some features have only one unique value. I could probably check whether this has happened and remove the affected features from both the train and test data-sets, but I'm wondering if there is a more elegant solution that delivers results in the same format for train and test. Any ideas?
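
For reference, the check-and-remove idea could be sketched roughly as follows. This is only a minimal sketch on toy data: varying_features, toy.train and toy.test are hypothetical names standing in for a helper and for the predictor columns of input.train and input.test. It also assumes that FeatureEffects$new accepts a features argument to restrict which features are computed, so the data passed to Predictor$new would not need to change.

```r
# Hypothetical helper: names of the columns that have more than one distinct value.
varying_features <- function(df) {
  names(df)[vapply(df, function(col) length(unique(col)) > 1, logical(1))]
}

# Toy stand-ins for the predictor columns of input.train / input.test.
toy.train <- data.frame(a = c(1, 2, 3), b = c(0, 0, 0), c = c("x", "y", "x"))
toy.test  <- data.frame(a = c(4, 5),    b = c(0, 1),    c = c("y", "x"))

# Keep only the features that vary in BOTH subsets, so that the
# train and test results cover the same set of features.
keep <- intersect(varying_features(toy.train), varying_features(toy.test))
print(keep)

# With the objects from above, the shared set could then be passed on
# (assumption: FeatureEffects$new has a `features` argument):
# iml.effects.train <- FeatureEffects$new(iml.model.train, features=keep, method="ale")
# iml.effects.test  <- FeatureEffects$new(iml.model.test,  features=keep, method="ale")
```

Restricting the features at the FeatureEffects step, rather than dropping columns from the data, would leave the model input untouched, since rf.model still expects every column it was trained on.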

Thanks, Mark
