
Could someone help?

I am using the `dummy` package in R (function `dummy`) to convert a categorical variable (10 categories) into dummy variables, because some of the algorithms I am using (AdaBoost and rotation forest) don't handle categorical variables well.

After using the package, I get 10 dummy variables, but they are factors. I expected them to be numeric, with 1s and 0s.

Should I convert them to numeric, or use them as factors?
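If you do decide to convert factor dummies to numeric, one detail matters: calling `as.numeric()` directly on a factor returns its internal level codes, not the labels. The base-R sketch below assumes each dummy column is a factor with levels `"0"` and `"1"` (the column names `catA`/`catB` are hypothetical) and converts through `as.character()` first.

```r
# Hypothetical stand-in for one dummy column returned as a factor
# whose levels are the strings "0" and "1"
d <- factor(c("0", "1", "1", "0", "1"))

# Pitfall: as.numeric() on a factor gives the level codes (1, 2),
# not the 0/1 values you see when the factor is printed
as.numeric(d)

# Safer: convert the level labels to character first, then to integer
d_num <- as.integer(as.character(d))
d_num

# The same conversion applied to every column of a data frame of dummies
dummies <- data.frame(catA = factor(c("1", "0")), catB = factor(c("0", "1")))
dummies[] <- lapply(dummies, function(col) as.integer(as.character(col)))
str(dummies)
```

Using `dummies[] <- lapply(...)` keeps the object a data frame with its original column names while replacing each column's contents.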

Thanks a lot! All the best, Pedro

Pepe
  • Which function of the package are you using? If it's function `dummy`, see argument `int` in the help page. – Rui Barradas Aug 20 '17 at 13:57
  • Whether you should convert them depends on: a) the technically required input of the functions you plan to use (adaboost and rotation forest); b) some functions handle factors and numeric values differently, so you have to make sure you're not creating problems by casting factors to numeric values. – Jan Aug 20 '17 at 14:06
  • Thank you. I have used the function dummy, and I have added that information to the question. – Pepe Aug 20 '17 at 14:22
  • Thank you Rui, I saw that. But which option should I choose? Does it depend on what those algorithms ask for? – Pepe Aug 20 '17 at 14:25
  • Yes it does. If you're using function `adaboost` from package `fastAdaboost` then the response should be a `factor`. – Rui Barradas Aug 20 '17 at 14:59
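Rui's comment points to the `int` argument of `dummy()`, which controls whether the dummies come back as integers instead of factors. As a swapped-in alternative that needs no extra package, base R's `model.matrix()` also produces one-hot columns, and it returns them as a numeric matrix. This is a sketch on hypothetical data (`df`, `category` are illustrative names), not the `dummy` package's own behaviour.

```r
# Hypothetical example data: one categorical predictor with 10 levels
set.seed(42)
df <- data.frame(category = factor(sample(LETTERS[1:10], 50, replace = TRUE),
                                   levels = LETTERS[1:10]))

# "~ category - 1" drops the intercept, so every level gets its own 0/1
# column (full one-hot encoding rather than k - 1 treatment contrasts)
onehot <- model.matrix(~ category - 1, data = df)

# The result is a numeric matrix with one column per level
dim(onehot)
head(onehot)

# Convert to a data frame if the modelling function expects one
onehot_df <- as.data.frame(onehot)
```

Because the levels are set explicitly, a level that happens not to appear in the sample still gets its own (all-zero) column, so the number of columns stays at 10.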

1 Answer


After performing one-hot encoding, there is no difference between keeping the dummies as factors or as numeric. For tree-based models it is better not to perform one-hot encoding at all, since it will decrease performance. Here is an article describing the effect of one-hot encoded variables. It is better to pass the categorical variables to the model after converting them into factors.
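To illustrate passing a categorical variable as a single factor, here is a sketch using `rpart`, a tree package distributed with R as a recommended package. It is swapped in for illustration only (it is not AdaBoost or rotation forest), and the data and the names `category`, `y`, `df` are hypothetical. `rpart` splits on subsets of a factor's levels directly, so no one-hot encoding is needed.

```r
library(rpart)

set.seed(1)
n <- 200
# Hypothetical predictor: a 10-level categorical variable kept as one factor
category <- factor(sample(letters[1:10], n, replace = TRUE),
                   levels = letters[1:10])
# Hypothetical response that depends only on the category
y <- factor(ifelse(category %in% c("a", "b", "c"), "yes", "no"))
df <- data.frame(category = category, y = y)

# The tree can put {a, b, c} on one side of a single split
fit <- rpart(y ~ category, data = df, method = "class")
pred <- predict(fit, newdata = df, type = "class")

# Training accuracy on this noise-free toy data
mean(pred == df$y)
```

With a one-hot encoding, the same grouping would need several binary splits on separate dummy columns, which is the kind of fragmentation the answer is warning about.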

RAVI TEJA M