I have a pandas DataFrame where I'm taking the max across two binned columns. I want max to treat NaN (which I'm replacing with 'NA') as the highest possible bin. After re-categorizing the columns to add this additional bin, max doesn't treat the new 'NA' category as the maximum value. Is there a better way to treat blank and NaN values as a separate top bin when taking the max across two binned columns?

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'col1':[10, 22, 25],
    'col2':[11,15,np.nan]
})

bins = [-float('inf'),10,20,30,float("inf")]   
labels = ['Tier 1', 'Tier 2', 'Tier 3', 'Tier 4']

print(df)

# Bin col1, add 'NA' as the last (highest) ordered category, then fill missing values.
df['col1'] = pd.cut(pd.to_numeric(df['col1'], errors='coerce'), bins=bins, labels=labels)
df['col1'] = pd.Categorical(df['col1'], categories=['Tier 1', 'Tier 2', 'Tier 3', 'Tier 4', 'NA'], ordered=True)
df['col1'] = df['col1'].fillna('NA')  # assign back instead of chained inplace=True
# Same steps for col2.
df['col2'] = pd.cut(pd.to_numeric(df['col2'], errors='coerce'), bins=bins, labels=labels)
df['col2'] = pd.Categorical(df['col2'], categories=['Tier 1', 'Tier 2', 'Tier 3', 'Tier 4', 'NA'], ordered=True)
df['col2'] = df['col2'].fillna('NA')

print(df)

df.max(axis=1)

Shaun

1 Answer

It appears that when running max across the columns, pandas was not using the category order for priority but was comparing the labels alphabetically. As a workaround, I renamed NA to ZNA so that it sorts last, and then changed it back to NA after the max had been computed.
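
Here is a minimal sketch of that rename workaround, not necessarily the exact code I ran. It assumes the same bins and labels as in the question; the names as_text and row_max are just for illustration. It converts the binned columns to plain strings so the comparison is explicitly alphabetical, and it only works because the 'Tier N' labels happen to sort alphabetically in tier order.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [10, 22, 25], 'col2': [11, 15, np.nan]})
bins = [-float('inf'), 10, 20, 30, float('inf')]
labels = ['Tier 1', 'Tier 2', 'Tier 3', 'Tier 4']

# Bin each column and fill missing values with 'ZNA', which sorts after
# every 'Tier N' label alphabetically. Then convert to plain strings.
as_text = pd.DataFrame({
    col: pd.cut(df[col], bins=bins, labels=labels)
           .cat.add_categories(['ZNA'])
           .fillna('ZNA')
           .astype(str)
    for col in df.columns
})

# Row-wise max using ordinary string comparison, then restore the 'NA' label.
row_max = as_text.apply(lambda row: max(row), axis=1).replace('ZNA', 'NA')
print(row_max.tolist())
```

Note that this would break if the labels did not sort alphabetically in the intended order (for example, 'Tier 10' sorts before 'Tier 2').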

Maybe a future enhancement: when running max across two categorical columns that share the same ordered categories, pandas could use the category order rather than alphabetical order.
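
In the meantime, one way to get a max that respects the category order is to compare the integer category codes instead of the labels. This is a rough sketch under the assumption that both columns share one ordered dtype with 'NA' as the last category; the names tier_dtype, binned, codes and row_max are made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [10, 22, 25], 'col2': [11, 15, np.nan]})
bins = [-float('inf'), 10, 20, 30, float('inf')]
labels = ['Tier 1', 'Tier 2', 'Tier 3', 'Tier 4']

# One shared ordered dtype; 'NA' comes last, so it has the highest code.
tier_dtype = pd.CategoricalDtype(categories=labels + ['NA'], ordered=True)

# Bin each column, cast to the shared dtype, and fill missing values with 'NA'.
binned = pd.DataFrame({
    col: pd.cut(df[col], bins=bins, labels=labels).astype(tier_dtype).fillna('NA')
    for col in df.columns
})

# Category codes are integers that follow the category order,
# so a numeric row-wise max picks the highest tier.
codes = pd.DataFrame({col: binned[col].cat.codes for col in binned.columns})
max_codes = codes.max(axis=1)

# Map the winning codes back to labels with the same ordered dtype.
row_max = pd.Series(
    pd.Categorical.from_codes(max_codes, dtype=tier_dtype),
    index=df.index,
)
print(row_max.tolist())
```

Because the comparison is done on codes, the label text no longer matters, so no temporary rename is needed.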

Shaun