Background: I'm creating a recipe to clean and transform time-series data that will be used by multiple models. One of the steps in the recipe is to remove correlated predictors using the step_corr() function.

However, because of the nature of the data set, some variables can be constant across an entire training window when I cross-validate with a rolling window, and this causes step_corr() to throw a warning.
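
For illustration, the underlying issue can be reproduced in base R: a correlation against a constant vector is undefined. This is only a minimal sketch with made-up numbers (`x` and `flat` are hypothetical names), not the exact code path inside step_corr().

```r
# Hypothetical rolling-window slice in which `flat` never changes.
x    <- c(1.2, 0.7, 3.1, 2.4)
flat <- rep(5, 4)

# cor() warns that the standard deviation is zero; the warning is
# suppressed here only so that the result can be inspected.
r <- suppressWarnings(cor(x, flat))
is.na(r)
```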

Problem Statement: In such cases, is it possible to exclude such variables from the correlation step? Or perhaps remove the variable entirely?

P.S. I know I can easily ignore the warning and proceed. But I'm looking for a cleaner approach / best practice advice.

desertnaut
fahmy

1 Answer

There are two steps for you to consider:

  • step_zv() will remove variables that contain only a single value (zero variance)
  • step_nzv() will remove variables in which almost all values are the same (near-zero variance: highly sparse and unbalanced)
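
A minimal sketch of how this could look, assuming the filter is placed before step_corr() so that constant columns are gone by the time correlations are computed. The data frame and its column names (`y`, `x1`, `x2`, `flat`) are made up for illustration, and the 0.9 threshold is just an example value:

```r
library(recipes)

# Toy training window (hypothetical): `flat` is constant and
# `x2` is almost a copy of `x1`.
set.seed(1)
train <- data.frame(
  y    = rnorm(50),
  x1   = rnorm(50),
  flat = rep(3, 50)
)
train$x2 <- train$x1 + rnorm(50, sd = 0.01)

rec <- recipe(y ~ ., data = train) %>%
  step_zv(all_numeric_predictors()) %>%                    # drop constant columns first
  step_corr(all_numeric_predictors(), threshold = 0.9)     # then filter correlated ones

prepped <- prep(rec, training = train)

# Columns that survive both steps on the training data.
kept <- names(bake(prepped, new_data = NULL))
kept
```

Because the recipe is re-estimated on each resample, step_zv() should drop whichever columns happen to be constant in that particular window. Swapping in step_nzv() would be the stricter choice if nearly constant columns are also a concern.
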
Julia Silge