Unfortunately I cannot share my data-set, and my problem is data-set related, so this is a bad starting point, I guess... but maybe someone has a hint anyway?
I'm starting with a data-set called input.rf which I am splitting into train and test as follows:
train.index <- createDataPartition(input.rf[,dependent], p=p.train, list=FALSE)
input.train <- input.rf[train.index,]
input.test <- input.rf[-train.index,]
I've trained a random forest model rf.model using caret on the train data-set:
rf.model <- train(input.train[, names(input.rf) != dependent, drop=FALSE],
                  input.train[, names(input.rf) == dependent],
                  method="rf", ntree=ntree, metric=metric,
                  trControl=cntrl, tuneGrid=tunegrid,
                  nodesize=floor(nrow(input.train)*(nodesize/100)),
                  importance=TRUE)
I then wanted to use iml as follows with my train and test data-sets, with the dependent column excluded from the data passed to the predictor:
iml.model.train <- Predictor$new(rf.model, data=input.train[which(names(input.train) != dependent)])
iml.model.test <- Predictor$new(rf.model, data=input.test[which(names(input.test) != dependent)])
If I then try to look at the effects like so ...
iml.effects.train <- FeatureEffects$new(iml.model.train, method="ale")
iml.effects.test <- FeatureEffects$new(iml.model.test, method="ale")
... I get the following error: Error in initialize(...) : feature has only one unique value
This is because splitting into train and test can produce subsets in which some features have only one unique value. While I could probably check whether this has happened and remove the affected features from my train and test data-sets, respectively, I'm wondering if there is a more elegant solution that delivers results in the same format for train and test. Any ideas?
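For reference, the check I have in mind would look roughly like the sketch below. The helper varying_in_both and the toy data frames are made up for illustration (they are not part of caret or iml); the idea is to keep only the features that vary in both subsets, so that train and test are handled with the same feature list. I'm assuming the features argument of FeatureEffects$new can be used to restrict the computation to that list, which I have not verified on my real data.

```r
# Hypothetical helper (not from caret or iml): names of the columns that
# have more than one unique value in BOTH data frames.
# Note: unique() counts NA as a value of its own.
varying_in_both <- function(train, test) {
  n_unique <- function(df) vapply(df, function(x) length(unique(x)), integer(1))
  common <- intersect(names(train), names(test))
  common[n_unique(train[common]) > 1 & n_unique(test[common]) > 1]
}

# Toy data standing in for input.train / input.test without the dependent column.
toy.train <- data.frame(a = c(1, 2, 3), b = c(5, 5, 5), c = c("x", "y", "x"))
toy.test  <- data.frame(a = c(4, 4, 4), b = c(5, 6, 7), c = c("y", "x", "x"))

# "a" is constant in test and "b" is constant in train, so only "c" survives.
keep <- varying_in_both(toy.train, toy.test)
print(keep)

# With the real objects, the same vector would be passed to both calls
# (assumption: `features` limits which effects are computed):
# keep <- varying_in_both(input.train[names(input.train) != dependent],
#                         input.test[names(input.test) != dependent])
# iml.effects.train <- FeatureEffects$new(iml.model.train, features=keep, method="ale")
# iml.effects.test  <- FeatureEffects$new(iml.model.test,  features=keep, method="ale")
```

This would give results for the same set of features in train and test, but it silently drops features that are constant in only one of the two subsets, which is why I'm hoping for something cleaner.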
Thanks, Mark