I'm trying to understand the pros and cons of the encoding options available in H2O through the 'categorical_encoding' parameter, and when to use each one.
General rules of thumb on how to choose between them would be helpful.
I typically use the 'enum' value because I like how all levels of a categorical feature are grouped together when looking at feature importance. On the other hand, I believe the default for XGBoost is 'label_encoder', which breaks things up by categorical level/value.
Unfortunately, I don't really know where to begin, or what questions to ask, about the values that are available:
- one_hot_internal
- one_hot_explicit
- sort_by_response
- enum_limited
- enum
- label_encoder
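
To make the question concrete, here is my rough mental model of three of these encodings (label encoding, one-hot encoding, and sorting by response) as a plain-Python sketch. It is not H2O's implementation: the function names are hypothetical, the data is made up, and it ignores missing values and any limit on the number of levels. Corrections are welcome if this model is wrong.

```python
from collections import defaultdict


def label_encode(values):
    """Map each level to its integer index in sorted level order."""
    levels = sorted(set(values))
    index = {level: i for i, level in enumerate(levels)}
    return [index[v] for v in values]


def one_hot_encode(values):
    """Return one 0/1 indicator column per level, keyed by level name."""
    levels = sorted(set(values))
    return {level: [1 if v == level else 0 for v in values] for level in levels}


def sort_by_response_encode(values, response):
    """Map each level to its rank when levels are ordered by mean response."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for v, y in zip(values, response):
        totals[v] += y
        counts[v] += 1
    # Ties in mean response are broken by level name so the result is deterministic.
    ordered = sorted(counts, key=lambda level: (totals[level] / counts[level], level))
    rank = {level: i for i, level in enumerate(ordered)}
    return [rank[v] for v in values]


# Made-up example data: one categorical column and a numeric response.
colors = ["red", "blue", "green", "blue", "red"]
target = [0.1, 0.9, 0.5, 0.7, 0.3]

label_codes = label_encode(colors)                      # one integer column
one_hot_columns = one_hot_encode(colors)                # one column per level
response_codes = sort_by_response_encode(colors, target)  # integers ordered by mean target
```

As I understand it, the one-hot version is what makes feature importance split up per level, while the two integer encodings keep a single column but impose an ordering on the levels (alphabetical in one case, by mean response in the other).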
Again, I primarily stick with enum and sometimes use label_encoder, but honestly I don't know the practical implications of these options. I would love a general explanation, from someone knowledgeable, of when one might be better than another!