I have the following code for fitting a random forest to my dataset:
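```r
library(tidymodels)  # initial_split(), recipe(), prep(), juice()
library(themis)      # step_upsample()
library(iml)         # Predictor, Shapley
set.seed(123)
# Stratified 80/20 train/test split
split <- initial_split(data_num, prop = 0.8, strata = positive)
train_data <- training(split)
test_data <- testing(split)
# Upsample the minority class to the size of the majority class
rf_rec <- recipe(positive ~ ., data = train_data) %>%
  step_upsample(positive, over_ratio = 1)
rf_prep <- prep(rf_rec)
juiced <- juice(rf_prep)
juiced <- janitor::clean_names(juiced)
test_data <- janitor::clean_names(test_data)
# Feature columns only; `model` is the fitted random forest (fitting code not shown)
X <- juiced[which(names(juiced) != "positive")]
predictor <- Predictor$new(model, data = X, y = juiced$positive)
```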
I get the following error when running `Shapley` on my dataset:
```r
shapley <- Shapley$new(predictor, x.interest = X[1, ])
#> Error in colMeans(self$predictor$predict(private$sampler$get.x())) : 'x' must be numeric
```
Does anyone know why I'm getting this error?
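One likely explanation, assuming `model` is a classification fit whose default `predict()` returns class labels (a tidymodels fit, for example, returns a `.pred_class` factor column by default): `Shapley` averages the predictor's output with `colMeans()`, and `colMeans()` only accepts numeric input. This base-R sketch, with made-up prediction values, reproduces the same message from a factor column and shows that numeric probability columns average without error.

```r
# Hypothetical prediction output: class labels stored as a factor column,
# similar in shape to a tidymodels .pred_class column
pred_class <- data.frame(.pred_class = factor(c("1", "0", "1"), levels = c("1", "0")))

# colMeans() converts the data frame to a character matrix and errors
err <- tryCatch(colMeans(pred_class), error = function(e) conditionMessage(e))
print(err)

# Hypothetical class-probability output: numeric columns average fine
pred_prob <- data.frame(.pred_1 = c(0.9, 0.2, 0.7), .pred_0 = c(0.1, 0.8, 0.3))
avg <- colMeans(pred_prob)
print(avg)
```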
The dataset I use (`data_num`) is just a binary matrix of 1s and 0s: all feature columns (f1, f2, ...) are of type num, and the target is a factor with 2 levels, "1" and "0".
```
f1  f2  f3  ...  target
 0   0   1  ...       1
 1   0   0  ...       0
 1   1   1  ...       1
```
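One possible fix, assuming the error comes from the model returning class labels rather than probabilities, is to make the `Predictor` return numeric class probabilities. A minimal sketch, assuming the randomForest and iml packages are installed; the toy data (shaped like `data_num`) and the `pred_prob()` helper are hypothetical:

```r
library(randomForest)
library(iml)

set.seed(123)
# Hypothetical toy data shaped like data_num: 0/1 numeric features,
# factor target with levels "1" and "0"
n <- 200
X <- data.frame(
  f1 = as.numeric(rbinom(n, 1, 0.5)),
  f2 = as.numeric(rbinom(n, 1, 0.5)),
  f3 = as.numeric(rbinom(n, 1, 0.5))
)
positive <- factor(ifelse(X$f1 + X$f3 >= 1, "1", "0"), levels = c("1", "0"))

model <- randomForest(x = X, y = positive, ntree = 100)

# Hypothetical wrapper: one numeric probability column per class,
# instead of a factor of predicted labels
pred_prob <- function(model, newdata) {
  as.data.frame(predict(model, newdata = newdata, type = "prob"))
}

predictor <- Predictor$new(model, data = X, y = positive,
                           predict.function = pred_prob)
shapley <- Shapley$new(predictor, x.interest = X[1, ])
print(shapley$results)
```

For a parsnip or workflows fit, `predict()` takes `new_data` rather than `newdata`, so the wrapper would call `predict(model, new_data = newdata, type = "prob")` instead. Passing `type = "prob"` directly to `Predictor$new()`, which forwards it to the model's prediction function, may also be enough.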