I am currently doing regression modeling with a dataset that has more features (p) than observations (n), typically p = 10000 and n = 30. I'd also like to test many models and find the best one.

What I'm doing now is first reducing the number of features from 10,000 to 20-30 using step_select_mrmr() or step_select_vip(), which I place as the first step of my pipeline. I then proceed to test many models on the reduced feature set.

Is this approach reasonable?

littleworth

1 Answer

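It is reasonable as long as you use resampling or a validation set to make sure that there is no information leakage. In practice that means the filter has to be re-estimated on the training portion of each resample; if the features are selected once on all 30 observations and the models are resampled afterwards, the held-out rows have already influenced the selection and the performance estimates will be optimistic.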
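To make the leakage concrete, here is a minimal, dependency-free sketch in Python (not tidymodels code). It assumes a pure-noise outcome, a simple absolute-correlation filter standing in for mRMR or VIP, and a toy one-variable model; the names `top_k_features`, `fit_predict`, and `cv_mse` are hypothetical helpers written for this illustration only. It compares filtering once on all rows against filtering inside each fold.

```python
import random


def mean(values):
    return sum(values) / len(values)


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5 if saa > 0 and sbb > 0 else 0.0


def top_k_features(X, y, rows, k):
    """Supervised filter: the k columns most correlated with y, using only `rows`."""
    ys = [y[i] for i in rows]
    scores = []
    for j in range(len(X[0])):
        col = [X[i][j] for i in rows]
        scores.append((abs(pearson(col, ys)), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def fit_predict(X, y, train, test, feats):
    """Toy model: signed sum of the selected features, then a one-variable least-squares fit."""
    ys = [y[i] for i in train]
    signs = [1.0 if pearson([X[i][j] for i in train], ys) >= 0 else -1.0 for j in feats]

    def score(i):
        return sum(s * X[i][j] for s, j in zip(signs, feats))

    zs = [score(i) for i in train]
    mz, my = mean(zs), mean(ys)
    szz = sum((z - mz) ** 2 for z in zs)
    slope = sum((z - mz) * (v - my) for z, v in zip(zs, ys)) / szz
    return [my + slope * (score(i) - mz) for i in test]


def cv_mse(X, y, k, folds, select_inside):
    """Cross-validated MSE; the filter runs per fold only if select_inside is True."""
    all_rows = list(range(len(y)))
    # Leaky variant: the filter sees every row, including the ones held out later.
    leaked = None if select_inside else top_k_features(X, y, all_rows, k)
    sq_err = []
    for f in range(folds):
        test = all_rows[f::folds]
        train = [i for i in all_rows if i not in test]
        feats = top_k_features(X, y, train, k) if select_inside else leaked
        preds = fit_predict(X, y, train, test, feats)
        sq_err += [(p - y[i]) ** 2 for p, i in zip(preds, test)]
    return mean(sq_err)


random.seed(0)
n, p, k = 30, 2000, 20  # p kept below 10000 only so the sketch runs quickly
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [random.gauss(0, 1) for _ in range(n)]  # pure noise: no feature is truly predictive

leaky_mse = cv_mse(X, y, k, folds=5, select_inside=False)
honest_mse = cv_mse(X, y, k, folds=5, select_inside=True)
print(f"filter on all rows, then CV: MSE = {leaky_mse:.2f}")
print(f"filter inside each fold:     MSE = {honest_mse:.2f}")
```

Because the outcome here is noise, any apparent predictive skill in the first number comes from the filter having seen the held-out rows. In tidymodels terms, keeping the `step_select_*()` step inside the recipe of a workflow that is passed to the resampling functions should give the second behaviour, since the recipe is then re-estimated on each analysis set.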

We hope to have more recipe functions for supervised filters later this year, but Steven's are great.

topepo