2

I have a df with about 50 columns:

Product ID | Cat1 | Cat2 |Cat3 | ... other columns ...
8937456       0      5     10
8497534       25     3     0
8754392       4      15    7 

Cat signifies how many quantities of that product fell into a category. Now I want to add a column "Category" denoting the majority Category for a product (ignoring the other columns and just considering the Cat columns).

df_goal:

Product ID | Cat1 | Cat2 |Cat3 | Category | ... other columns ...
8937456       0      5     10       3
8497534       25     3     0        1
8754392       4      15    7        2

I think I need to use max and apply or map?

I found those on stackoverflow, but they don't not address the category assignment. In Excel I renamed the columns from Cat 1 to 1 and used index(match(max)).

Python Pandas max value of selected columns

How should I take the max of 2 columns in a dataframe and make it another column?

Assign new value in DataFrame column based on group max

Community
  • 1
  • 1
jeangelj
  • 4,338
  • 16
  • 54
  • 98

1 Answers1

4

Here's a NumPy way with numpy.argmax -

df['Category'] = df.values[:,1:].argmax(1)+1

To restrict the selection to those columns, use those column headers/names specifically and then use idxmax and finally replace the string Cat with `empty strings, like so -

df['Category'] = df[['Cat1','Cat2','Cat3']].idxmax(1).str.replace('Cat','')

numpy.argmax or panda's idxmax basically gets us the ID of max element along an axis.

If we know that the column names for the Cat columns start at 1st column and end at 4th one, we can slice the dataframe : df.iloc[:,1:4] instead of df[['Cat1','Cat2','Cat3']].

Divakar
  • 218,885
  • 19
  • 262
  • 358
  • thank you for the quick reply; can I ask, how argmax works? I unfortunately have other columns in the df as well; is there a way to restrict it to those columns or should I save the required columns as its own df? – jeangelj Apr 10 '17 at 18:45
  • Select those columns then use `.values`. – juanpa.arrivillaga Apr 10 '17 at 18:46
  • uhhh! brilliant! testing – jeangelj Apr 10 '17 at 18:47
  • df['Category'] = df[['Cat1','Cat2','Cat3']].idxmax(1).str.replace('Cat','') worked perfectly; amazing one liner; thank you - wish I could give it more than one vote – jeangelj Apr 10 '17 at 19:33