I’m trying to make the jump from scikit-learn to tidymodels, and most of the time it has been relatively painless thanks to the tutorials from Julia Silge and Andrew Couch. Now, however, I’m stuck. Normally I would use `initial_split(df, strata = x)` to get a split object to work with. This time, though, a different department has provided the training and test sets, and I’m afraid this might become the norm. Without a split object, functions like `last_fit()` and `collect_predictions()` don’t work.
How can I reverse engineer the provided datasets into an `rsplit` object? Alternatively, is it possible to bind the datasets together first and then tell `initial_split()` exactly which rows should go to the training set and which to the test set?
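On the second option: `initial_split()` itself samples rows at random, but rsample's `make_splits()` can build an `rsplit` from explicit row indices. A minimal sketch, assuming a recent rsample version whose default `make_splits()` method accepts a named list of `analysis` and `assessment` row indices plus the combined `data`. The `train`/`test` tibbles here are trimmed stand-ins for the example data.

```r
library(rsample)
library(tibble)
library(dplyr)

# Trimmed stand-ins for the provided training and test sets
train <- tibble(predictor = c(0, 1, 1, 1, 0, 1, 0, 0),
                feature_1 = c(12, 18, 15, 5, 20, 2, 6, 10))
test  <- tibble(predictor = c(0, 1, 0, 1),
                feature_1 = c(5, 13, 8, 9))

# Stack the two sets so that the training rows come first
combined <- bind_rows(train, test)

# Tell rsample exactly which rows are analysis (training)
# and which are assessment (test)
split <- make_splits(
  list(analysis   = seq_len(nrow(train)),
       assessment = nrow(train) + seq_len(nrow(test))),
  data = combined
)

training(split)  # the 8 original training rows
testing(split)   # the 4 original test rows
```

Because the indices are written out by hand, nothing is sampled, so the split reproduces the department's partition exactly.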
I see that someone asked the same question on RStudio Community: https://community.rstudio.com/t/tidymodels-creating-a-split-object-from-testing-and-training-data-perform-last-fit/69885. Max Kuhn said you could reverse engineer an `rsplit` object, but I didn’t understand how. Thanks!
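What "reverse engineering" an `rsplit` involves is easier to see by looking inside one. It essentially stores the full data set plus the integer row indices of the training (analysis) rows, and `training()`/`testing()` subset the stored data with those indices. A sketch on a hypothetical toy data frame `df`. Note that `data`, `in_id` and `out_id` are rsample internals and may change between versions.

```r
library(rsample)
library(tibble)

set.seed(123)
# Hypothetical toy data, only used to inspect the object
df <- tibble(x = 1:12, y = seq(0.5, 6, by = 0.5))

split <- initial_split(df)

# Integer indices of the rows sampled into the training set
str(split$in_id)
# NA here: the test rows are simply "all rows not in in_id"
split$out_id

# training() amounts to subsetting the stored data with in_id
manual_train <- split$data[split$in_id, , drop = FALSE]
```

An object with the same fields, built from your own indices, is all that `last_fit()` needs. `make_splits()` is the exported way to construct one.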
```r
library(tibble)

# Example data
train <- tibble(predictor = c(0, 1, 1, 1, 0, 1, 0, 0),
                feature_1 = c(12, 18, 15, 5, 20, 2, 6, 10),
                feature_2 = c(120, 98, 111, 67, 335, 123, 22, 69))

test <- tibble(predictor = c(0, 1, 0, 1),
               feature_1 = c(5, 13, 8, 9),
               feature_2 = c(132, 105, 99, 112))
```
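With the example data, the two provided tibbles could be passed straight to `make_splits()`. This assumes a recent rsample version, where its data frame method stacks the sets and records which rows came from each. The resulting object can then go to `last_fit()` and `collect_predictions()`. The workflow is only an assumption for illustration: `linear_reg()` is used because `predictor` is stored as numeric 0/1. For a classification problem you would convert it to a factor and use something like `logistic_reg()`.

```r
library(tidymodels)

# Example data from the question
train <- tibble(predictor = c(0, 1, 1, 1, 0, 1, 0, 0),
                feature_1 = c(12, 18, 15, 5, 20, 2, 6, 10),
                feature_2 = c(120, 98, 111, 67, 335, 123, 22, 69))
test <- tibble(predictor = c(0, 1, 0, 1),
               feature_1 = c(5, 13, 8, 9),
               feature_2 = c(132, 105, 99, 112))

# Build an rsplit directly from the two provided data frames
split <- make_splits(train, assessment = test)

# Illustrative workflow: plain linear regression on the numeric outcome
wf <- workflow() %>%
  add_formula(predictor ~ feature_1 + feature_2) %>%
  add_model(linear_reg() %>% set_engine("lm"))

# Fit on the training rows, evaluate on the test rows
res <- last_fit(wf, split)

collect_metrics(res)      # default regression metrics on the test set
collect_predictions(res)  # one prediction per test row
```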