Pandas Cut Categorical Treating Nan as Additional Max Bin

Question

I have a pandas DataFrame where I'm taking the max across two binned columns. I want max to treat NaN (which I'm replacing with 'NA') as the highest possible bin. After re-categorizing the columns and adding this additional bin, max doesn't treat the new NA as the maximum value. Is there a better way to treat blank and NaN values as a separate top bin when taking the max across two binned columns?

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'col1':[10, 22, 25],
    'col2':[11,15,np.nan]
})

bins = [-float('inf'),10,20,30,float("inf")]   
labels = ['Tier 1', 'Tier 2', 'Tier 3', 'Tier 4']

print(df)

df['col1'] = pd.cut(pd.to_numeric(df['col1'], errors='coerce'), bins=bins, labels=labels)
df['col1'] = pd.Categorical(df['col1'], categories=['Tier 1', 'Tier 2', 'Tier 3', 'Tier 4', 'NA'], ordered=True)
# Assign the result back instead of calling fillna(inplace=True) on a column selection
df['col1'] = df['col1'].fillna('NA')
df['col2'] = pd.cut(pd.to_numeric(df['col2'], errors='coerce'), bins=bins, labels=labels)
df['col2'] = pd.Categorical(df['col2'], categories=['Tier 1', 'Tier 2', 'Tier 3', 'Tier 4', 'NA'], ordered=True)
df['col2'] = df['col2'].fillna('NA')

print(df)

df.max(axis=1)

I like this idea, but Tier 4 is currently being used for all values to infinity. I still want this bin to be outside of the continuous series somehow and just display NA when running max across the row. — Shaun, Nov 03 '21 at 14:49

score 0 · Answer 1 · answered Nov 03 '21 at 16:52

It appears that when taking the max across the columns, pandas was not using the category order for priority but was comparing the labels alphabetically. As a workaround, I renamed NA to ZNA so that it sorts last, and then replaced ZNA with NA again after the max was computed.
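A minimal sketch of that rename workaround, using the data from the question. The names `binned` and `row_max` are made up here for illustration, and the labels are compared as plain strings so that the alphabetical ordering is explicit rather than dependent on how a given pandas version handles categoricals:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [10, 22, 25], 'col2': [11, 15, np.nan]})
bins = [-float('inf'), 10, 20, 30, float('inf')]
labels = ['Tier 1', 'Tier 2', 'Tier 3', 'Tier 4']

# Bin each column, drop the categorical dtype, and mark missing values
# with a placeholder that sorts after every 'Tier N' label.
binned = pd.DataFrame({
    col: pd.cut(df[col], bins=bins, labels=labels).astype(object).fillna('ZNA')
    for col in df.columns
})

# Row-wise max using ordinary string comparison, then restore the 'NA' label.
row_max = pd.Series(
    [max(pair) for pair in zip(binned['col1'], binned['col2'])],
    index=df.index,
).replace('ZNA', 'NA')

print(row_max.tolist())
```

This only works while the labels happen to sort alphabetically in tier order. A label such as 'Tier 10' would sort before 'Tier 2' and break it.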

A possible future enhancement: when taking the max across two categorical columns that share the same categories, pandas could use the category order rather than alphabetical order.
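In the meantime, one way to get a row-wise max that follows the category order is to compare the integer category codes instead of the labels. This is only a sketch under the assumption that both columns share one ordered categorical dtype with 'NA' listed last. `tier_dtype`, `codes` and `row_max` are illustrative names, not anything from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [10, 22, 25], 'col2': [11, 15, np.nan]})
bins = [-float('inf'), 10, 20, 30, float('inf')]
labels = ['Tier 1', 'Tier 2', 'Tier 3', 'Tier 4']

# One shared ordered dtype; 'NA' comes last, so it gets the highest code.
tier_dtype = pd.CategoricalDtype(labels + ['NA'], ordered=True)

binned = pd.DataFrame({
    col: pd.cut(df[col], bins=bins, labels=labels).astype(tier_dtype).fillna('NA')
    for col in df.columns
})

# Category codes are integers that follow the category order (0 = 'Tier 1', 4 = 'NA').
codes = np.column_stack([binned[col].cat.codes.to_numpy() for col in binned.columns])

# Take the max code per row and map it back to its label.
row_max = pd.Series(
    pd.Categorical.from_codes(codes.max(axis=1), dtype=tier_dtype),
    index=df.index,
)

print(row_max.tolist())
```

Because the comparison runs on codes, the result does not depend on how the labels are spelled, and the output stays an ordered categorical.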
