Extract max of a multiindex pandas dataframe with strings and NaN

Question

I've got the following multiindex dataframe:

first              bar                 baz                 foo          
second             one       two       one       two       one       two
first second                                                            
bar   one          NaN -0.056213  0.988634  0.103149    1.5858 -0.101334
      two     -0.47464 -0.010561  2.679586 -0.080154       <LQ -0.422063
baz   one          <LQ  0.220080  1.495349  0.302883 -0.205234  0.781887
      two     0.638597  0.276678 -0.408217 -0.083598  -1.15187 -1.724097
foo   one     0.275549 -1.088070  0.259929 -0.782472   -1.1825 -1.346999
      two     0.857858  0.783795 -0.655590 -1.969776 -0.964557 -0.220568

I would like to extract the max along one level. Expected result:

first        bar       baz       foo          
second                                                            
one     0.275549  1.495349    1.5858
two     0.857858  2.679586 -0.964557

Here is what I tried:

df.xs('one', level=1, axis = 1).max(axis=0, level=1, skipna = True, numeric_only = False)

And the obtained result:

first        baz
second          
one     1.495349
two     2.679586

How do I get Pandas to not ignore the whole column if one cell contains a string?

(created like this:)

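import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(6, 6), index=index[:6], columns=index[:6])

# Let the two affected columns hold strings, then assign with a single
# .loc[row, column] call instead of chained indexing.
for col in [('bar', 'one'), ('foo', 'one')]:
    df[col] = df[col].astype(object)
df.loc[('bar', 'one'), ('bar', 'one')] = np.nan
df.loc[('baz', 'one'), ('bar', 'one')] = '<LQ'
df.loc[('bar', 'two'), ('foo', 'one')] = '<LQ'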

Your column and index have the same names, so it's pretty confusing. — Quang Hoang, Oct 02 '19 at 14:58

score 1 · Accepted Answer · answered Oct 02 '19 at 15:14

I guess you would need to replace the non-numeric values with NaN before taking the max:

(df.xs('one', level=1, axis=1)
   .apply(pd.to_numeric, errors='coerce')
   .max(level=1,skipna=True)
)

Output (with np.random.seed(1)):

first        bar       baz       foo
second                              
one     0.900856  1.133769  0.865408
two     1.744812  0.319039  0.901591
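
Note that the `level` argument of `max` was removed in pandas 2.0, so on recent versions the same idea can be written with `groupby(level=...)`. Below is a minimal sketch of that variant; the small hand-made frame and its values are assumptions chosen for illustration, not the data from the question.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (hypothetical values, not the question's data).
rows = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
cols = pd.MultiIndex.from_product(
    [["bar", "foo"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame(
    [
        [np.nan, 0.1, 1.5, 0.2],
        [-0.4, 0.3, "<LQ", 0.4],
        [0.9, 0.5, -0.2, 0.6],
        ["<LQ", 0.7, -1.1, 0.8],
    ],
    index=rows,
    columns=cols,
)

result = (
    df.xs("one", level="second", axis=1)      # keep only the 'one' columns
      .apply(pd.to_numeric, errors="coerce")  # strings such as '<LQ' become NaN
      .groupby(level="second")                # group rows by the 'second' index level
      .max()                                  # NaN values are skipped by default
)
print(result)
```

Using the level name (`"second"`) rather than the position `1` also sidesteps some of the confusion that comes from the rows and columns sharing the same level names.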
