Questions tagged [r-recipes]

recipes is an R package by Max Kuhn and Hadley Wickham for creating and preprocessing design matrices.
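Below is a minimal sketch of that workflow, using the built-in mtcars data and an arbitrary choice of step. A recipe declares variable roles through a formula, and preprocessing steps are added with step_*() functions. prep() then estimates those steps from training data, and bake() applies them. The dataset and the use of step_normalize() are illustrative assumptions, not taken from any question below.

```r
library(recipes)  # recipes also re-exports the %>% pipe

# Declare the outcome (mpg) and the predictors (all other columns) via a formula
rec <- recipe(mpg ~ ., data = mtcars) %>%
  # Center and scale every predictor; the means and SDs are estimated in prep()
  step_normalize(all_predictors())

# Estimate the step parameters from the training data
prepped <- prep(rec, training = mtcars)

# new_data = NULL returns the processed training data
baked <- bake(prepped, new_data = NULL)
head(baked)
```

The outcome column is left untouched because the step only selects predictors.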

131 questions
1 vote, 1 answer

rlang Expressions in List as Args to Function

I'm trying to parse user input as arguments to a function call (within an expression). It seems like I'm close, but !!! is wrapping my arguments in parentheses, which does not work. I'm trying to recreate the following with user inputs: recipe(mpg ~…
Alex Gray
1 vote, 0 answers

Is it necessary to normalize data to generate natural splines in recipes?

I fitted a model using natural splines, and I am not sure whether there is any advantage to applying Box-Cox transformations, centering, and scaling to the predictors. Does the natural spline step perform these transformations itself? Are there advantages in normalizing the predictors…
1 vote, 0 answers

Using caret and recipes to train a model: Error: Not all variables in the recipe are present

I've been trying to train a caret glmnet model for the past few hours, but it keeps throwing errors. My dataset has 15 variables: 3 are factors, 11 are numeric, and 1 is an integer. I split the dataset 70/30 into training and test sets. The…
1 vote, 1 answer

How can I discretize the numeric variables without losing the original ones?

Here is my toy data with code. How can I discretize the numeric variables without losing the original ones? library(gapminder); library(tidyverse); library(tidymodels) gapminder %>% recipe(lifeExp ~ .) %>% step_discretize(all_numeric(),…
Geet
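One possible approach to the discretization question copies the numeric column with step_mutate() and discretizes only the copy, so the original column survives. This is a sketch under assumptions. The question's gapminder code is truncated, so the toy data frame and the column name x_bin are hypothetical, and with many columns you would need one copy per column.

```r
library(recipes)

# Hypothetical toy data: one numeric column with plenty of unique values
df <- data.frame(x = as.numeric(1:200))

rec <- recipe(~ x, data = df) %>%
  # Copy the column first so the original is kept
  step_mutate(x_bin = x) %>%
  # Discretize only the copy (default settings)
  step_discretize(x_bin)

baked <- bake(prep(rec), new_data = NULL)
str(baked)
```

The baked data then holds both the untouched numeric x and the factor x_bin.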
1 vote, 1 answer

About the recipes package in R

I am using recipes for feature engineering in machine learning models. However, when I use step_dummy, the dummy variables are treated as numeric variables, not factors. I think this might be problematic when using random forests or other tree…
user224050
1 vote, 2 answers

How to handle NAs due to novel factor levels using R recipes?

I preprocessed a training data set (A) and now want to reproduce these steps for a test set (B) using R recipes. The problem is that the test set contains new factor levels that I want to ignore: library(recipes) (A <- data.frame(a = c(1:19,…
ghlavin
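For the novel-level question, recipes provides step_novel(). During prep() it reserves an extra factor level (named "new" by default), and at bake() time it maps test-set values that were unseen in training to that level. Here is a minimal sketch, with hypothetical toy data standing in for the truncated sets A and B:

```r
library(recipes)

# Hypothetical training set (A): factor with levels a, b, c
A <- data.frame(y = c(1, 2, 3, 4), f = factor(c("a", "b", "c", "a")))
# Hypothetical test set (B): contains a level ("z") never seen in training
B <- data.frame(y = c(5, 6), f = factor(c("a", "z")))

rec <- recipe(y ~ f, data = A) %>%
  # Reserve an extra level for values not present in the training data
  step_novel(f) %>%
  prep(training = A)

# The unseen "z" is recoded to "new" instead of becoming NA
baked_B <- bake(rec, new_data = B)
baked_B
```

If the goal is literally to ignore such rows, you could instead filter them out of B before baking. step_novel() keeps them but collapses them into a single level.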
1 vote, 0 answers

Recipe fails with caret::train

When using caret with recipes, I get an error stating: Error in { : task 1 failed - "$ operator is invalid for atomic vectors". I managed to narrow it down to a problem with the recipe, but I am not sure what I'm doing wrong. Has anyone seen this…
Disou
1 vote, 2 answers

recipes::step_dummy + caret::train -> Error:Not all variables in the recipe are present

I am getting the following error when using recipes::step_dummy with caret::train (my first attempt at combining the two packages): Error: Not all variables in the recipe are present in the supplied training set. I'm not sure what is causing the error…
user1420372
1 vote, 1 answer

R: frequency encoding for categorical variables via the recipes package

I am looking for functionality similar to https://rdrr.io/github/bfgray3/cattonum/man/catto_freq.html but implemented as a recipes::step_-function (https://tidymodels.github.io/recipes/reference/index.html) Is anyone aware of an implementation for…
stats-hb
1 vote, 2 answers

How to exclude certain variables from recipe?

When using the step_regex function to build a recipe for a model, it creates additional columns for certain patterns in the original column. Is there a way to exclude the original column from the recipe once I'm done with it? For example, in the…
moho wu
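One way to drop the source column after step_regex() is to add step_rm() for that column later in the recipe. The sketch below is minimal, and the data, the pattern, and the result name has_apple are hypothetical.

```r
library(recipes)

# Hypothetical data with a text column to search
df <- data.frame(
  y = c(1, 2, 3),
  txt = c("apple pie", "banana split", "apple tart"),
  stringsAsFactors = FALSE
)

rec <- recipe(y ~ txt, data = df) %>%
  # Indicator column: 1 when the pattern matches, 0 otherwise
  step_regex(txt, pattern = "apple", result = "has_apple") %>%
  # Remove the original column once the indicator has been created
  step_rm(txt)

baked <- bake(prep(rec), new_data = NULL)
names(baked)
```

If the column should stay in the baked data but not act as a predictor, you could give it a different role with update_role() instead of removing it with step_rm().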
0 votes, 0 answers

Is there a step to use relative frequency instead of step_tokenfilter() in recipes

I'm building a regression model in R using this great approach by Emil Hvitfeldt and Julia Silge (https://smltar.com/mlregression#fnref7), and I was wondering whether it would be possible to use relative frequencies instead of absolute counts in the preprocessing…
Malichot
0 votes, 0 answers

How to deal with equal times in multivariable time series

I'm trying to build predictive models for time series, but I run into issues when the same year appears two or more times in a dataset. To give some context, I'm using this Kaggle dataset that shows the life expectancy for people in…
0 votes, 1 answer

Understanding why tune::last_fit metrics are different from summary()

Context: I am trying to evaluate a model, made using tune::last_fit(), with an independent dataset. Problem: it seems that the metrics obtained with tune::collect_metrics() are different from the ones obtained using summary(). Question: what is the…
Paul
0 votes, 1 answer

Error when using step_dummy() to handle categorical variables with tidymodels, recipe, bayesian

I am trying to use the bayesian package to build a tidy Bayesian model. I am mostly following the bayesian package's "Get started" vignette, although my use case differs slightly in that I am trying to model a numerical variable as a function of a…
leorar
0 votes, 1 answer

How can I apply preprocessing in each cross-validation fold trained on each train part of the fold using tidymodels?

I am trying to use the tidymodels R package for an ML pipeline. I can define a preprocessing pipeline (a recipe) on the training data and apply it to each resample of my cross-validation. But this uses the (global) training data to preprocess the folds.…
Richi W
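A common tidymodels pattern for the per-fold preprocessing question is to bundle the recipe and the model in a workflow and pass it to fit_resamples(). That function preps the recipe on each fold's analysis set rather than on the whole training set. The sketch below uses mtcars, 5 folds, and a plain lm linear regression as stand-ins for the asker's data and model.

```r
library(tidymodels)  # attaches recipes, rsample, parsnip, workflows, tune, yardstick

set.seed(123)
folds <- vfold_cv(mtcars, v = 5)

# The data given here only defines column names, types, and roles;
# the normalization means and SDs are estimated later, separately per fold
rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_predictors())

wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(linear_reg() %>% set_engine("lm"))

# For each fold, the recipe is prepped on the analysis portion
# and then applied to that fold's assessment portion
res <- fit_resamples(wf, resamples = folds)
metrics <- collect_metrics(res)
metrics
```

Because prepping happens inside fit_resamples(), no statistics from a fold's assessment rows leak into its preprocessing.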