
I am creating a recipe in which I first create a calculated column called "response", like so:

rec <- recipe( ~., data = training) %>%
  step_mutate(response = as.integer(all(c('A', 'B') %in% Col4) & Col4 == 'A'))

I would like to now specify this new calculated column as the response variable in the recipe() function as shown below. I will be doing a series of operations on it such as this first one with step_naomit. How do I re-specify my response in recipe() to be the calculated column from my previous step (above) using recipes?

recipe <- recipe(response ~ ., data = training) %>%
          step_naomit(response)
piper180

2 Answers


You can set the role for new columns in the step_mutate() function by explicitly setting the role= parameter.

rec <- recipe(~ ., data = iris) %>%
  step_mutate(SepalSquared = Sepal.Length^2, role = "outcome")

Then check that it worked with summary(prep(rec)):

  variable     type    role      source  
  <chr>        <chr>   <chr>     <chr>   
1 Sepal.Length numeric predictor original
2 Sepal.Width  numeric predictor original
3 Petal.Length numeric predictor original
4 Petal.Width  numeric predictor original
5 Species      nominal predictor original
6 SepalSquared numeric outcome   derived 
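
If it helps, here is a minimal sketch of how the derived outcome could then be used in a later step, which is what the question was after with step_naomit. It assumes that selecting the new column through all_outcomes() is acceptable for your use case; the object names prepped, roles, outcome_role, and baked are only illustrative.

```r
library(recipes)

# Create the derived column with the "outcome" role, then refer to it
# in a later step through the all_outcomes() selector.
rec <- recipe(~ ., data = iris) %>%
  step_mutate(SepalSquared = Sepal.Length^2, role = "outcome") %>%
  step_naomit(all_outcomes())

prepped <- prep(rec)

# Look up the role that was recorded for the derived column.
roles <- summary(prepped)
outcome_role <- roles$role[roles$variable == "SepalSquared"]
print(outcome_role)

# Processed training data, including the derived column.
baked <- bake(prepped, new_data = NULL)
print(names(baked))
```

Using the selector rather than the bare column name means the later step does not depend on what the derived column happens to be called.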
MrFlick
  • Thank you! This seems to be perfect for what I'm trying to do, however, I'm running into an error that says ```Error: Cant subset columns that don't exist.``` Any idea on how to make my mutated column recognized in this case? – piper180 Oct 25 '21 at 22:39
  • @ava It's tough to verify things work when you don't provide a reproducible example. I tried a different method and switched to the built-in `iris` dataset to check that it worked. – MrFlick Oct 25 '21 at 22:51
  • Assigning the role within step_mutate instead seemed to do the trick! I appreciate the help and the sample with the Iris dataset. Thank you! – piper180 Oct 25 '21 at 22:57

This is related to the question "tidymodel error, when calling predict function is asking for target variable".

It is generally not advisable to modify the response inside your recipe. This is because the response variable won't be available to the recipe in certain cases, such as when using {tune}. I would recommend that you perform this transformation before you pass the data to the recipe. Even better if you do it before the validation split.

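library(tidymodels)

set.seed(1234)
data_split <- my_data %>%
  # Outside a recipe, use dplyr's mutate(); step_mutate() only works on a recipe
  mutate(response = as.integer(all(c('A', 'B') %in% Col4) & Col4 == 'A')) %>%
  initial_split()

training <- training(data_split)
testing <- testing(data_split)

# The response already exists in the data, so it can go straight into the formula
rec <- recipe(response ~ ., data = training)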
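
As a rough end-to-end illustration of this approach, the sketch below creates the response before the recipe, fits a workflow, saves it, and predicts on new data that has no response column. The data frame, the predictor x, and the choice of linear_reg() are all made up for the example (a 0/1 response would normally call for a classification model); only Col4 and the response expression come from the question.

```r
library(tidymodels)

# Hypothetical stand-in for my_data; x is an invented predictor.
my_data <- tibble(
  x    = c(1.2, 3.4, 2.2, 5.1, 4.8, 0.7, 3.3, 2.9),
  Col4 = c("A", "B", "A", "A", "B", "B", "A", "B")
) %>%
  # The response is created with dplyr before any recipe is involved.
  mutate(response = as.integer(all(c("A", "B") %in% Col4) & Col4 == "A"))

rec <- recipe(response ~ x, data = my_data)

# linear_reg() is used only to keep the sketch small.
wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(linear_reg())

final_model <- fit(wf, data = my_data)

# Round-trip the fitted workflow through an .rds file.
path <- tempfile(fileext = ".rds")
saveRDS(final_model, path)
reloaded <- readRDS(path)

# New data contains the predictor only; no response column is present.
new_data <- tibble(x = c(1.0, 2.5, 4.0))
preds <- predict(reloaded, new_data = new_data)
print(preds)
```

Because the recipe never has to build the response itself, prediction only needs the predictor columns.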
EmilHvitfeldt
  • Thank you, this is good to know. If I did this outside of the recipe though, would I be able to save my final model with the following? ```final_model <- fit(workflow_object, dataset_total) saveRDS(final_model, "model.rds")``` I'd like to create this new field on newly introduced data as well. – piper180 Oct 26 '21 at 13:38
  • The response will not be needed when you apply your fitted model to new data. Hence it doesn't matter if it is missing from the data you apply it to. – EmilHvitfeldt Oct 26 '21 at 16:23