I have the following setup for fitting a random forest to my data set:

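library(tidymodels)  # rsample (initial_split), recipes (recipe, prep, juice), %>%
library(themis)      # step_upsample() lives in themis in current releases
library(iml)         # Predictor, Shapley

set.seed(123)
split <- initial_split(data_num, prop = 0.8, strata = positive)
train_data <- training(split)
test_data <- testing(split)
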
rf_rec <- recipe(positive ~., data = train_data) %>%
  step_upsample(positive, over_ratio =  1)

rf_prep <- prep(rf_rec)
juiced <- juice(rf_prep)

juiced <- janitor::clean_names(juiced)
test_data <- janitor::clean_names(test_data)

X <- juiced[which(names(juiced) != "positive")]
predictor <- Predictor$new(model, data = X, y = juiced$positive)

I'm getting the following error when computing Shapley values on my dataset:

shapley <- Shapley$new(predictor, x.interest = X[1, ])
Error in colMeans(self$predictor$predict(private$sampler$get.x())) : 'x' must be numeric

Does anyone know why I'm getting this error?

The dataset I use (data_num) is just a binary matrix of 1s and 0s, where all feature values (f1, f2, ...) are of type num and the target is a Factor w/ 2 levels "1","0".

f1 f2  f3 ... target
0  0   1      1
1  0   0      0
1  1   1      1
Eisen

1 Answer

Try fitting the random forest model first, before you create the Predictor. That is:

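library(randomForest)
library(iml)
# Fit the model first so that `model` exists when Predictor$new() is called
model <- randomForest(YourY ~ ., importance = TRUE, data = YourData)
mod_rf <- Predictor$new(model, data = X)
shapley_rf <- Shapley$new(predictor = mod_rf, x.interest = X[1, ])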
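For reference, here is a minimal end-to-end sketch of the same idea on made-up data. The `toy` data frame is a hypothetical stand-in for `data_num` (binary numeric features, two-level factor target named `positive`), and passing `type = "prob"` to `Predictor$new()` is an assumption on my part: the question never shows how `model` was built, so this is one way to make sure the predictions handed to `Shapley` are numeric class probabilities, not necessarily the fix for the original error.

```r
library(randomForest)
library(iml)

set.seed(123)

# Hypothetical stand-in for data_num: binary numeric features, factor target
n <- 200
toy <- data.frame(
  f1 = rbinom(n, 1, 0.5),
  f2 = rbinom(n, 1, 0.5),
  f3 = rbinom(n, 1, 0.5)
)
toy$positive <- factor(ifelse(toy$f1 + toy$f3 >= 1, 1, 0), levels = c(1, 0))

# Features only, as in the question
X <- toy[names(toy) != "positive"]

# Fit the model BEFORE wrapping it in a Predictor
model <- randomForest(positive ~ ., data = toy, importance = TRUE)

# type = "prob" is forwarded to predict(), so predictions are numeric probabilities
predictor <- Predictor$new(model, data = X, y = toy$positive, type = "prob")

# Shapley values for the first observation
shapley <- Shapley$new(predictor, x.interest = X[1, ])
head(shapley$results)
```

If this runs on the toy data but your own pipeline still fails, checking `class(model)` and the output of `predictor$predict(X[1:5, ])` should show whether the predictions are actually numeric.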