I have a categorical variable with known levels (e.g. hour, which only takes values between 0 and 23), but not all of those levels appear in the data yet. Say we have measurements for hours 0 to 11, while hours 12 to 23 are not covered, though those values will be added later. If we naively use pandas.get_dummies()
to map values to indicator variables, we end up with only 12 columns instead of 24. Is there a way to map the values of a categorical variable to a predefined list of dummy variables?
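To make the problem concrete, here is a minimal sketch with made-up data. The DataFrame `df` and its contents are assumptions for illustration, not my real dataset:

```python
import pandas as pd

# Hypothetical sample: only hours 0..11 have been observed so far.
df = pd.DataFrame({"hour": list(range(12))})

# Naive encoding: one indicator column per value that is *present* in the data.
naive = pd.get_dummies(df["hour"], prefix="hour")

print(naive.shape[1])       # number of indicator columns produced
print(list(naive.columns))  # hour_0 ... hour_11, nothing for hours 12..23
```

The columns for hours 12 to 23 are simply missing, so the encoded frame changes shape as soon as later hours show up.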
Here's an example of expected behaviour:
possible_values = range(24)
hours = get_dummies_on_steroids(df['hour'], prefix='hour', levels=possible_values)
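For reference, one direction that might give this behaviour is to cast the column to a categorical dtype with the full list of levels before encoding. This is only a sketch: `get_dummies_on_steroids` is the hypothetical helper from the example above, not a pandas function, and the sample `df` is made up.

```python
import pandas as pd


def get_dummies_on_steroids(values, prefix, levels):
    """Hypothetical helper: encode `values` against a fixed list of levels."""
    # Declaring every allowed level up front makes pandas emit a column for
    # each of them, including levels that never occur in the data.
    dtype = pd.CategoricalDtype(categories=list(levels))
    # Values outside `levels` become NaN and end up as all-zero rows.
    return pd.get_dummies(values.astype(dtype), prefix=prefix)


# Made-up data covering only hours 0..11.
df = pd.DataFrame({"hour": list(range(12))})
possible_values = range(24)
hours = get_dummies_on_steroids(df["hour"], prefix="hour", levels=possible_values)

print(hours.shape)  # rows x indicator columns
```

With this approach the set of columns depends only on `levels`, so it should stay stable when measurements for the remaining hours are added.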