I'm experimenting with the parsnip package using the Titanic dataset.
library(titanic)
library(dplyr)
library(tidymodels)
library(rattle)
library(rpart.plot)
library(RColorBrewer)
train <- titanic_train %>%
  mutate(Survived = factor(Survived),
         Sex = factor(Sex),
         Embarked = factor(Embarked))
test <- titanic_test %>%
  mutate(Sex = factor(Sex),
         Embarked = factor(Embarked))
spec_obj <-
  decision_tree(mode = "classification") %>%
  set_engine("rpart")
spec_obj
fit_obj <-
  spec_obj %>%
  fit(Survived ~ Pclass + Sex + Age + SibSp + Parch + Fare + Embarked, data = train)
fit_obj
fancyRpartPlot(fit_obj$fit)
pred <-
  fit_obj %>%
  predict(new_data = test)
pred
Now suppose I want to set some hyperparameters on the model specification.
spec_obj <- update(spec_obj, min_n = 50, cost_complexity = 0)
fit_obj <-
  spec_obj %>%
  fit(Survived ~ Pclass + Sex + Age + SibSp + Parch + Fare + Embarked, data = train)
fit_obj
fancyRpartPlot(fit_obj$fit)
Is there any way to avoid specifying the model formula and the dataset a second time in the
fit() call?
============== edit ================
I discovered you can save the formula in a variable:
f <- as.formula("Survived ~ Pclass + Sex + Age + SibSp + Parch + Fare + Embarked")
fit_obj <-
  spec_obj %>%
  fit(f, data = train)
fit_obj
Still, is there a better way?
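One direction that might help, sketched here under assumptions rather than as a definitive answer: the workflows package (loaded with tidymodels) can bundle the formula and the model specification into one object, so that only the specification needs to be swapped when the parameters change. The names `wf`, `wf_tuned`, `fit_1`, `fit_2`, and `rpart_fit` are made up for illustration, and the data still has to be passed to fit().

```r
library(titanic)
library(dplyr)
library(tidymodels)

train <- titanic_train %>%
  mutate(Survived = factor(Survived),
         Sex = factor(Sex),
         Embarked = factor(Embarked))

spec_obj <-
  decision_tree(mode = "classification") %>%
  set_engine("rpart")

# Bundle the formula and the specification once (wf is an illustrative name).
wf <-
  workflow() %>%
  add_formula(Survived ~ Pclass + Sex + Age + SibSp + Parch + Fare + Embarked) %>%
  add_model(spec_obj)

fit_1 <- fit(wf, data = train)

# Swap in the updated specification; the formula travels with the workflow.
wf_tuned <-
  wf %>%
  update_model(update(spec_obj, min_n = 50, cost_complexity = 0))

fit_2 <- fit(wf_tuned, data = train)

# Pull out the underlying rpart object, e.g. to pass to a plotting function.
rpart_fit <- extract_fit_engine(fit_2)
```

With this layout the formula is written exactly once. The trade-off is that the fitted object is a workflow rather than a bare parsnip fit, so the rpart model has to be extracted before it is handed to something like fancyRpartPlot().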