I am trying to use the iml package to build Shapley plots. According to the package documentation and this post, if the model is not from one of the supported packages, such as caret, a custom prediction function has to be defined. When I try to create a Shapley plot for a ranger model used for regression, I get an error.
The code I am running:
featuresData <- as.data.frame(AAads[, c(numVars, factVars, "noiseInformation")])
responseData <- as.numeric(as.vector(AAads[, depVar]))
predFunction <- function(model, newData) {
    results <- predict(object = model, data = newData)
    results <- as.numeric(as.vector(results$predictions))
    return(results)
}
predictorRf <- Predictor$new(model = rfModel, data = featuresData, y = responseData, predict.fun = predFunction)
There is no error up to this point. When I run the following code to compute Shapley values for one instance of the data, I get an error:
shapley = Shapley$new(predictor = predictorRf, x.interest = trainingData[1,])
Error in (function (model, newData) : unused argument (newdata = list(30, 6063047, 523433, 51, 36, 8, 6, 5, 3, 1, 2, 4, 3, 2, 42, 0.226619379129261))
The values listed in the error are the values of the first row, trainingData[1,].
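The wording of the error ("unused argument (newdata = ...)") points at argument matching. A minimal base-R sketch, using hypothetical function names, shows that R matches named arguments exactly and case-sensitively, so a call that supplies `newdata =` cannot bind to a parameter declared as `newData`:

```r
# Hypothetical prediction functions that differ only in the spelling of the
# second parameter.
predCamel <- function(model, newData) nrow(newData)
predLower <- function(model, newdata) nrow(newdata)

oneRow <- data.frame(x = 30, g = factor("b", levels = c("a", "b")))

# Supplying the data by name as `newdata =` (the form shown in the error
# message) fails for the camelCase version with an "unused argument" error ...
msgCamel <- tryCatch(predCamel(NULL, newdata = oneRow),
                     error = function(e) conditionMessage(e))
print(msgCamel)

# ... and succeeds once the parameter is spelled `newdata`.
nLower <- predLower(NULL, newdata = oneRow)
print(nLower)
```

If iml does call the prediction function with a named `newdata` argument, as the error message suggests, then the parameter name in the custom function would need to match it exactly.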
Here, "AAads" is a data frame containing all the data used to train and test the model, and "trainingData" is a subset of it. "rfModel" is a ranger regression model. "numVars" and "factVars" are vectors of the names of the numeric and factor independent variables, and "noiseInformation" is a column of random data added as an independent variable to sanity-check the variable importance, PDP and Shapley plots.
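A small reproducible sketch of the same setup may help isolate the problem. It assumes the ranger and iml packages are installed, and it replaces AAads, numVars, factVars and rfModel with synthetic, hypothetical stand-ins (one numeric column, one factor column and a noise column). The prediction function spells its second parameter `newdata` and forwards it to `predict()` through ranger's `data` argument:

```r
# Assumes ranger and iml are installed; `toy`, `featureCols` and `rfToy` are
# hypothetical stand-ins for AAads, c(numVars, factVars, ...) and rfModel.
library(ranger)
library(iml)

set.seed(42)
n <- 200
toy <- data.frame(
  num1  = rnorm(n),
  fact1 = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  noiseInformation = runif(n)
)
toy$y <- 2 * toy$num1 + as.integer(toy$fact1) + rnorm(n, sd = 0.1)

featureCols <- c("num1", "fact1", "noiseInformation")
rfToy <- ranger(y ~ ., data = toy[, c(featureCols, "y")], num.trees = 100)

# Parameter named `newdata` (lowercase), matching the name in the error message.
predFunction <- function(model, newdata) {
  # ranger's predict method takes the new observations via `data`
  predict(model, data = newdata)$predictions
}

predictorRf <- Predictor$new(model = rfToy, data = toy[, featureCols],
                             y = toy$y, predict.fun = predFunction)

# Pass only the feature columns for the instance of interest.
shapley <- Shapley$new(predictor = predictorRf, x.interest = toy[1, featureCols])
print(shapley$results)
```

Restricting `x.interest` to the feature columns is a cautious choice here, since a row of trainingData may also contain the response or other columns that the predictor was not built with.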
Note: I could use alternatives, such as building the ranger model through caret's train function or using randomForest, which is supported, but I want to understand what I am missing in my current approach. One guess is that some of my independent variables are categorical and that Shapley may only accept numerical data, even when categorical data is present. I suspect this because the values in the error message are all numbers, even where categorical values should appear.
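On the factor guess: R stores a factor as integer codes plus a `levels` attribute, so any display that drops attributes shows only the codes. Whether that is exactly what happens when the error message is printed is an assumption, but this base-R sketch with toy, hypothetical data illustrates the mechanism:

```r
# A factor is an integer vector of codes with a "levels" attribute.
grade <- factor(c("high", "low", "low"), levels = c("low", "high"))
codes <- as.integer(grade)     # integer codes: 2 1 1
print(levels(grade))           # the labels the codes refer to

# One row of a data frame with a numeric and a factor column.
oneRow <- data.frame(age = 30, grade = grade[1])

# Stripping attributes (class, levels) leaves only the raw values,
# so the factor column shows up as its code rather than its label.
stripped <- lapply(oneRow, function(col) as.vector(unclass(col)))
str(stripped)

# The label is still recoverable from the code and the levels.
print(levels(grade)[stripped$grade])
```

So seeing only numbers in a deparsed row does not by itself show that the factor columns were converted to numeric data.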