I have a pandas DataFrame and I'm taking the row-wise max across two binned columns. I want max to treat NaN (which I replace with 'NA') as the highest possible bin. After re-categorizing the columns to add this additional bin, max doesn't treat 'NA' as the new maximum value. Is there a better way to treat blank and NaN values as a separate top bin when taking the max across two binned columns?
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'col1': [10, 22, 25],
    'col2': [11, 15, np.nan],
})
bins = [-float('inf'),10,20,30,float("inf")]
labels = ['Tier 1', 'Tier 2', 'Tier 3', 'Tier 4']
print(df)
df['col1'] = pd.cut(pd.to_numeric(df['col1'], errors='coerce'), bins=bins, labels=labels)
df['col1'] = pd.Categorical(df['col1'], categories=['Tier 1', 'Tier 2', 'Tier 3', 'Tier 4', 'NA'], ordered=True)
df['col1'] = df['col1'].fillna('NA')  # assign back; in-place fillna on a selected column may not update df
df['col2'] = pd.cut(pd.to_numeric(df['col2'], errors='coerce'), bins=bins, labels=labels)
df['col2'] = pd.Categorical(df['col2'], categories=['Tier 1', 'Tier 2', 'Tier 3', 'Tier 4', 'NA'], ordered=True)
df['col2'] = df['col2'].fillna('NA')  # assign back; in-place fillna on a selected column may not update df
print(df)
df.max(axis=1)
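To make the desired result concrete, here is a minimal sketch of the behaviour I'm after. It assumes that comparing the integer codes of the ordered categories is an acceptable stand-in for comparing the categories themselves; the names `categories`, `binned`, `codes`, and `row_max` are only illustrative, and this may not be the best way to do it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [10, 22, 25], "col2": [11, 15, np.nan]})

bins = [-float("inf"), 10, 20, 30, float("inf")]
labels = ["Tier 1", "Tier 2", "Tier 3", "Tier 4"]
categories = labels + ["NA"]  # 'NA' comes last, so it gets the highest code

binned = pd.DataFrame(index=df.index)
for col in ["col1", "col2"]:
    tiers = pd.cut(df[col], bins=bins, labels=labels)  # ordered categorical
    # Append 'NA' as the top category, then fill missing bins with it.
    binned[col] = tiers.cat.add_categories("NA").fillna("NA")

# Compare integer category codes row-wise instead of the labels themselves.
codes = binned.apply(lambda s: s.cat.codes)
max_codes = codes.max(axis=1)

# Map the winning codes back to their labels.
row_max = pd.Series(
    pd.Categorical.from_codes(max_codes.to_numpy(), categories=categories, ordered=True),
    index=df.index,
)
print(row_max)
```

For the sample data above, the result I'd want is Tier 2, Tier 3, NA for the three rows.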