
I'm fairly familiar with the pandas categorical dtype, but I'm having trouble accessing the nicely formatted ordering of the categories that is shown at the bottom of a pandas Series' printed output.

Note: I realize other questions have been asked that just get the unique category names. But those answers do not provide the formatted ordering (Categories (3, object): ['low' < 'medium' < 'high']).

If the series is y, I've tried:

y.cat.categories             # --> Index (but without the '<' ordering)
y.cat.categories.to_numpy()  # --> array
y.cat.ordered                # --> bool

y


Out[288]: 
    0      medium
    1         low
    2      medium
    3        high
    4      medium
            ...  
    437    medium
    438    medium
    439    medium
    440      high
    441       low
    Name: target, Length: 442, dtype: category
    Categories (3, object): ['low' < 'medium' < 'high']    # <<---- Trying to get this info 
                                                           # here programmatically!

What I'm trying to get is the last line in the output above.

leeprevost

1 Answer

Given df,

import numpy as np
import pandas as pd
scoreDtype = pd.CategoricalDtype(['low', 'medium', 'high'], ordered=True)
df = pd.DataFrame({'score': np.random.choice(['low', 'medium', 'high'], 50)}).astype(scoreDtype)  # apply the ordered dtype

You can get the categories using .cat, the categorical accessor:

df['score'].cat.categories

Output:

Index(['low', 'medium', 'high'], dtype='object')

But you can get the formatted line by taking the last line of the Series' string representation, like this:

df['score'].to_string().rsplit('\n', 1)[-1]

Output:

"Categories (3, object): ['low' < 'medium' < 'high']"
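As an alternative to parsing the printed output, the same kind of line can be assembled from the public accessors the question already lists (.cat.categories and .cat.ordered). The following is a minimal sketch, not the way pandas builds the line internally; the helper name categories_summary is hypothetical, and the dtype label it prints depends on the pandas version and on how strings are stored.

```python
import pandas as pd


def categories_summary(s: pd.Series) -> str:
    """Build a 'Categories (...)' style line from public attributes only.

    Hypothetical helper: it imitates the footer pandas prints, it does not
    call the formatter pandas uses internally.
    """
    cats = s.cat.categories
    # Ordered categoricals are shown with ' < ' between categories,
    # unordered ones with a plain comma.
    sep = " < " if s.cat.ordered else ", "
    body = sep.join(repr(c) for c in cats)
    return f"Categories ({len(cats)}, {cats.dtype}): [{body}]"


score_dtype = pd.CategoricalDtype(["low", "medium", "high"], ordered=True)
s = pd.Series(["medium", "low", "high", "medium"], dtype=score_dtype)
summary = categories_summary(s)
print(summary)
```

Unlike splitting the printed output on newlines, this does not depend on where pandas wraps the line when there are many categories, but it also will not abbreviate long category lists the way the built-in output does.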
Scott Boston
  • Thank you. I agree that answers the question. I'm surprised I can't access this anywhere, as it's in the pandas view. (Categories (3, object): ['low' < 'medium' < 'high']). I also tried df.cat.__repr__() – leeprevost May 17 '22 at 17:12
  • Surprised this doesn't do it: y_q_c.cat.categories.__repr__() Out[55]: "Index(['low', 'medium', 'high'], dtype='object')" – leeprevost May 17 '22 at 17:20
  • Yep, that string representation is part of the Series and not part of pd.Categorical nor pd.CategoricalDtype. I couldn't find it anywhere else. – Scott Boston May 17 '22 at 17:29