I'm currently using the tidymodels framework and struggling to understand some differences in the model predictions and performance results I get, specifically when I use both fit and predict on the exact same dataset (i.e. the dataset the model was trained on).

Below is a reproducible example. I'm using the cells dataset and training a random forest on the data (rf_fit). The object rf_fit$fit$predictions is one of the sets of predictions whose accuracy I assess. I then use rf_fit to make predictions on the same data via the predict function (yielding rf_training_pred, the other set of predictions whose accuracy I assess).

My question is: why are these two sets of predictions different from each other? And why are they so different?

I presume something must be going on under the hood that I'm not aware of, but I'd expected these to be identical: I'd assumed that fit() trains a model (and stores some predictions associated with that trained model) and that predict() takes that exact model and simply re-applies it to (in this case) the same data, so the predictions from both should match.

What am I missing? Any suggestions or help in understanding this would be hugely appreciated - thanks in advance!
# Load required libraries
library(tidymodels)
library(modeldata)
#> Registered S3 method overwritten by 'tune':
#>   method                   from
#>   required_pkgs.model_spec parsnip

# Set seed for reproducibility
set.seed(123)

# Load the cells data (no train/test split - the model is fit to the full dataset)
data(cells, package = "modeldata")
# Define Model
rf_mod <- rand_forest(trees = 1000) %>%
  set_engine("ranger") %>%
  set_mode("classification")
# Fit the model to training data and then predict on same training data
rf_fit <- rf_mod %>%
  fit(class ~ ., data = cells)

rf_training_pred <- rf_fit %>%
  predict(cells, type = "prob")
# Evaluate ROC AUC of the predictions stored on the fitted ranger object
data.frame(rf_fit$fit$predictions) %>%
  bind_cols(cells %>% select(class)) %>%
  roc_auc(truth = class, PS)
#> # A tibble: 1 x 3
#>   .metric .estimator .estimate
#>   <chr>   <chr>          <dbl>
#> 1 roc_auc binary         0.903
# Evaluate ROC AUC of the predictions from predict() on the same data
rf_training_pred %>%
  bind_cols(cells %>% select(class)) %>%
  roc_auc(truth = class, .pred_PS)
#> # A tibble: 1 x 3
#>   .metric .estimator .estimate
#>   <chr>   <chr>          <dbl>
#> 1 roc_auc binary          1.00
Created on 2021-09-25 by the reprex package (v2.0.1)
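For reference, the two sets of predictions can also be inspected side by side with head() on the objects created above (column names as used in the roc_auc() calls; nothing new beyond what's already in the reprex):

# Look at the first few rows of each set of predictions
head(data.frame(rf_fit$fit$predictions))  # predictions stored on the underlying ranger fit (column PS used above)
head(rf_training_pred)                    # predictions returned by predict() (column .pred_PS used above)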