Use a category when a column has lots of repeated values and you expect to exploit that repetition.
For example, suppose I want the aggregate size per exchange for a large table of trades. Using the default object dtype for the exchange column
is totally reasonable:
In [6]: %timeit trades.groupby('exch')['size'].sum()
1000 loops, best of 3: 1.25 ms per loop
But since the list of possible exchanges is pretty small, and because there is lots of repetition, I could make this faster by using a category:
In [7]: trades['exch'] = trades['exch'].astype('category')
In [8]: %timeit trades.groupby('exch')['size'].sum()
1000 loops, best of 3: 702 µs per loop
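If you want to try this comparison without my table, here is a minimal sketch on synthetic data. The exchange names, row count, and values are made up; only the `exch` and `size` column names come from the example above. It checks that both dtypes give the same totals and shows what the category actually stores: each distinct label once, plus a small integer code per row. Timings depend on the machine and pandas version, so I have not included any.

```python
import numpy as np
import pandas as pd

# Hypothetical data: exchange names and sizes are invented for illustration.
rng = np.random.default_rng(0)
exchanges = ["NYSE", "NASDAQ", "ARCA", "BATS"]
trades = pd.DataFrame({
    "exch": rng.choice(exchanges, size=100_000),
    "size": rng.integers(1, 1_000, size=100_000),
})

# Aggregate on the plain string column.
by_object = trades.groupby("exch")["size"].sum()

# Same aggregate after converting the column to a category.
as_cat = trades.assign(exch=trades["exch"].astype("category"))
# observed=True restricts the result to categories that appear in the data.
by_category = as_cat.groupby("exch", observed=True)["size"].sum()

# Each distinct label is stored once; every row holds a small integer code.
print(as_cat["exch"].cat.categories)
print(as_cat["exch"].cat.codes.dtype)

# Compare the memory taken by the column in each representation.
object_bytes = trades["exch"].memory_usage(deep=True)
category_bytes = as_cat["exch"].memory_usage(deep=True)
print(object_bytes, category_bytes)
```

You can wrap either `groupby` line in `%timeit` to measure the difference on your own data.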
Note that categories are really a form of dynamic enumeration. They are most useful if the range of possible values is fixed and finite.
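To make the enumeration point concrete, here is a small sketch (the exchange names are again made up). Assigning a label that is not among the existing categories is rejected, and the set of categories has to be extended explicitly before the new label can be used. The exception type has differed between pandas versions, so the sketch catches both `TypeError` and `ValueError`.

```python
import pandas as pd

# Hypothetical labels; the categories are inferred from the data.
exch = pd.Series(["NYSE", "NASDAQ", "NYSE"], dtype="category")

# A label outside the current categories cannot be assigned directly.
rejected = False
try:
    exch.iloc[0] = "IEX"
except (TypeError, ValueError):
    rejected = True

# Extend the enumeration first, then the assignment succeeds.
exch = exch.cat.add_categories(["IEX"])
exch.iloc[0] = "IEX"

print(rejected)
print(list(exch.cat.categories))
```

This is why the column works best when the set of possible values is known up front: new labels are possible, but each one is an explicit change to the dtype rather than just another string.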