I have a DataFrame containing 2 columns of ordered categorical data (of the same category). I want to construct another column that contains the categorical maximum of the first 2 columns. I set up the following.
import pandas as pd
from pandas.api.types import CategoricalDtype
import numpy as np
cats = CategoricalDtype(categories=['small', 'normal', 'large'], ordered=True)
data = {
'A': ['normal', 'small', 'normal', 'large', np.nan],
'B': ['small', 'normal', 'large', np.nan, 'small'],
'desired max(A,B)': ['normal', 'normal', 'large', 'large', 'small']
}
df = pd.DataFrame(data).astype(cats)
The columns can be compared, although the np.nan items are problematic, as running the following code shows.
df['A'] > df['B']
The manual suggests that max() works on categorical data, so I try to define my new column as follows.
df[['A', 'B']].max(axis=1)
This yields a column of NaN. Why?