
I have a data frame with some dummy variables that I want to use as training set for glmnet.

Since I'm using glmnet, I want to center and scale the features using the preProcess option of caret's train function. However, I don't want this transformation to be applied to the dummy variables as well.

Is there a way to prevent the transformation of these variables?

milos.ai
amarchin
  • Good question. We're having the same issue in my group and trying to avoid hacky solutions. I'll keep you updated in case something comes out. – Gianmario Spacagna May 17 '16 at 10:28
  • AFAIK this is not addressed in `caret::train` and `caret::trainControl` yet, and the current status is the same as in [this question from 2012](http://stackoverflow.com/questions/14023423/how-to-preprocess-features-when-some-of-them-are-factors). So a "hacky" workaround is the way to go at the moment... – geekoverdose May 17 '16 at 13:47

1 Answer


There isn't (currently) a way to do this besides writing a custom model to do so (see the example with PLS and RF near the end).

I'm working on a method to specify which variables get which pre-processing method. However, with dummy variables, this is tough, since you might need to specify the names of a lot of predictors whose columns are not in the current data set. The idea is to be able to use wildcards (e.g. `Species*` to capture `Speciesversicolor` and `Speciesvirginica`), but the code isn't quite there yet.
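In the meantime, a sketch of the "hacky" workaround mentioned in the comments (this is my illustration, not part of the answer): apply preProcess() to the continuous columns only, leave the dummy columns untouched, and then call train() without a preProcess argument. The column names here come from caret's default dummyVars naming and may differ in your data.

```r
library(caret)

# Expand the factor into dummy columns first.
dv <- dummyVars(Sepal.Length ~ ., data = iris)
X  <- as.data.frame(predict(dv, newdata = iris))
y  <- iris$Sepal.Length

# Center/scale only the continuous predictors; dummy columns stay as 0/1.
num_cols      <- c("Sepal.Width", "Petal.Length", "Petal.Width")
pp            <- preProcess(X[, num_cols], method = c("center", "scale"))
X[, num_cols] <- predict(pp, X[, num_cols])

# No preProcess argument here, so the dummies are left alone.
fit <- train(x = X, y = y, method = "glmnet")
```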

Max

topepo
  • Could you please let me know if there already is a way to do so? If not, I would appreciate it if you could fix the problem with the link to custom model. – ebrahimi Nov 09 '22 at 01:08