I am trying to create a model to predict whether or not someone is at risk of a stroke. My data contains some "object" variables that could easily be coded to 0 and 1 (like sex). However, I have some object variables with 4+ categories (e.g. type of job).
I'm trying to encode these objects into integers so that my models can ingest them. I've come across two methods to do so:
- Create dummy variables for each feature, which adds one new column per category and encodes each as 0 or 1
- Convert the object into an integer using LabelEncoder, which assigns values to each category like 0, 1, 2, 3, and so on within the same column.
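To make the comparison concrete, here is a minimal sketch of the two options on toy data. The `work_type` column name and its values are made up for illustration and are not taken from the real dataset; it assumes pandas and scikit-learn are installed.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy data: `work_type` and its values are hypothetical stand-ins
# for an object column with 4+ categories.
df = pd.DataFrame(
    {"work_type": ["private", "self_employed", "government", "never_worked", "private"]}
)

# Option 1: dummy variables, one 0/1 column per category.
dummies = pd.get_dummies(df["work_type"], prefix="work_type", dtype=int)

# Option 2: LabelEncoder, one integer per category in a single column.
encoder = LabelEncoder()
labels = encoder.fit_transform(df["work_type"])

print(dummies)
print(labels)
```

In this sketch the dummy approach yields four columns with exactly one 1 per row, while LabelEncoder yields a single column whose integer codes follow the sorted order of the category names.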
Is there a difference between these two methods? If so, which one is recommended?