
Given this Dask DataFrame:

Dask DataFrame Structure:
             date  value           symbol
npartitions=2                                
           object  int64  category[known]
...              ...
...              ...
Dask Name: from-delayed, 6 tasks

How can I call set_index on the 'symbol' column (which is category[known])?

df = df.set_index('symbol')
Traceback (most recent call last):
[...]
TypeError: Categorical is not ordered for operation max
you can use .as_ordered() to change the Categorical to an ordered one
jpp
Ghislain Viguier
  • The error message seems clear: "you can use .as_ordered() to change the Categorical to an ordered one" – Mitch Wheat Nov 25 '18 at 04:06
  • Oh god! Yes, you're right. I tested so many things... The correct call is `df['symbol'] = df['symbol'].cat.as_ordered()`. I missed the `cat`. After that, `df = df.set_index('symbol')` works correctly. Thanks a lot. – Ghislain Viguier Nov 25 '18 at 10:18
  • Would you like to give this as an answer, so that others can more easily find it? – mdurant Nov 25 '18 at 19:54

1 Answer


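A categorical column must be marked as ordered before it can be used as an index: set_index needs to compare values (the traceback shows it failing on max), and min/max are not defined for an unordered categorical. The error message points to as_ordered(), which is reached through the Series .cat accessor: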

df['symbol'] = df['symbol'].cat.as_ordered()
df = df.set_index('symbol')
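
The underlying check can be reproduced in plain pandas, which is where the TypeError message comes from. This is a minimal sketch, not the original poster's data: the symbol values below are made up for illustration, and the categories are assumed to take pandas' default inferred (sorted) order.

```python
import pandas as pd

# Hypothetical sample data; the symbols are illustrative only.
s = pd.Series(["AAPL", "MSFT", "AAPL", "GOOG"], dtype="category")

# An unordered categorical has no defined max, so this raises TypeError.
try:
    s.max()
    unordered_error = None
except TypeError as exc:
    unordered_error = str(exc)

# .cat.as_ordered() returns a copy whose categories are ordered,
# using the existing category order (here: AAPL < GOOG < MSFT).
ordered = s.cat.as_ordered()
largest = ordered.max()

print(unordered_error)
print(ordered.cat.ordered, largest)
```

Note that as_ordered() keeps whatever category order already exists. If a different sort order is wanted for the index, the categories would need to be reordered first (for example with .cat.reorder_categories) before calling set_index.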
Ghislain Viguier