tidymodelRs out there!
The Goal: reuse a recipe object for multiple modeling types (logistic, RF, etc).
The Data: survey data that I have condensed to outcome_yes (numeric count of when something happened), outcome_no (numeric count of when that something didn't happen), total_tested (numeric count of times we wanted to see what would happen, i.e. the sum of the yes and no outcome variables), cat_pred (yes/no categorical predictor), and num_pred (count of potential outcome barriers).
What works:
- Using glm() with cbind():
example_data |>
  glm(
    cbind(outcome_yes, outcome_no) ~ cat_pred + num_pred,
    family = binomial(),
    data = _
  )
- Using events/trials syntax:
example_data |>
  glm(
    outcome_yes / total_tested ~ cat_pred + num_pred,
    family = binomial(),
    weights = total_tested,
    data = _
  )
The two methods above give the same results, and those results have also been replicated in SAS.
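For anyone who wants to reproduce the equivalence, here is a minimal sketch in base R. The data are simulated stand-ins, not the real survey: the sample size, the simulation coefficients, and the object names fit_cbind and fit_prop are all assumptions made for illustration. Only the column names and the two glm() calls come from the post.

```r
# Hypothetical stand-in for example_data; every value here is simulated.
set.seed(123)
n <- 40
example_data <- data.frame(
  cat_pred     = factor(sample(c("No", "Yes"), n, replace = TRUE),
                        levels = c("No", "Yes")),
  num_pred     = rpois(n, lambda = 2),
  total_tested = sample(5:30, n, replace = TRUE)
)

# Simulate the "yes" counts from an arbitrary (assumed) logistic relationship.
p_yes <- plogis(-0.5 + 0.8 * (example_data$cat_pred == "Yes") -
                  0.2 * example_data$num_pred)
example_data$outcome_yes <- rbinom(n, size = example_data$total_tested,
                                   prob = p_yes)
example_data$outcome_no  <- example_data$total_tested - example_data$outcome_yes

# Method 1: two-column (successes, failures) response.
fit_cbind <- glm(
  cbind(outcome_yes, outcome_no) ~ cat_pred + num_pred,
  family = binomial(),
  data = example_data
)

# Method 2: events/trials proportion with the trial counts as weights.
fit_prop <- glm(
  outcome_yes / total_tested ~ cat_pred + num_pred,
  family = binomial(),
  weights = total_tested,
  data = example_data
)

# Compare the coefficient estimates from the two parameterizations.
all.equal(coef(fit_cbind), coef(fit_prop))
```

Both calls describe the same binomial likelihood, so the coefficient estimates should agree up to numerical tolerance.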
The ISSUE:
Using either of the formulas above within recipe() yields this error:
Error in `inline_check()`:
! No in-line functions should be used here; use steps to define baking actions.
Backtrace:
1. recipes::recipe(...)
2. recipes:::recipe.formula(...)
3. recipes:::form2args(formula, data, ...)
4. recipes:::inline_check(formula)
The next attempt followed this tidymodels multivariate analysis example, with the dependent variable changed to outcome_yes + outcome_no. That worked until the fit() step shown below:
the_recipe <-
  recipe(
    outcome_yes + outcome_no ~ cat_pred + num_pred,
    family = binomial(),
    data = example_data
  ) |>
  step_relevel(all_factor_predictors(), ref_level = 'No') |>
  step_dummy(all_factor_predictors())

the_model <-
  logistic_reg() |>
  set_engine('glm') |>
  set_mode('classification')

the_workflow <-
  workflow() |>
  add_recipe(the_recipe) |>
  add_model(the_model)

the_workflow |>
  fit(example_data)
The fit() step also didn't like me:
Error in `check_outcome()`:
! For a classification model, the outcome should be a `factor`, not a `tbl_df`.
Backtrace:
1. generics::fit(the_workflow, example_data)
2. workflows:::fit.workflow(the_workflow, example_data)
3. workflows::.fit_model(workflow, control)
5. workflows:::fit.action_model(...)
6. workflows:::fit_from_xy(spec, mold, case_weights, control_parsnip)
8. parsnip::fit_xy.model_spec(...)
9. parsnip:::xy_form(...)
10. parsnip:::check_outcome(env$y, object)
Any help to get me past this would be tremendous! Again, the goal is to create a recipe that can be used within a workflow and then combined with multiple model types. Thank you for reading this far; I appreciate your time.
KG