I have a DataFrame `df`:

category    A   B   C   D
x   0   1   0   0
y   1   0   0   0
z   1   0   0   0
l   0   0   0   1
m   0   1   0   0
n   0   0   1   0

How can I get a DataFrame like the one below, where each category is paired with the name of the column that holds the 1?

Category    Sub-category
x   B
y   A
z   A
l   D
m   B
n   C

I tried:

df['sector'] = df.apply(lambda x: df.columns[x.argmax()], axis = 1)

but I get this error: TypeError: ("reduction operation 'argmax' not allowed for this dtype", 'occurred at index 1')

rafaelc
  • This is not a code writing service. Show an attempt and share where you are having specific problems. You should also tag your post with the pandas tag – Alan Jul 10 '18 at 23:47
  • `df['sub_category']=df.iloc[:,1:].astype(bool).dot(df.columns[1:])` – Pyd Jul 11 '18 at 05:04
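
The one-liner in the comment above appears to rely on the fact that multiplying a string by `True` keeps it and multiplying by `False` gives an empty string, so a boolean dot product with the column names concatenates the names of the columns that are set. A self-contained sketch, assuming the sample data from the question; the `dummy_cols` and `names` variables and the explicit object array are illustrative choices, not part of the original comment:

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "category": list("xyzlmn"),
    "A": [0, 1, 1, 0, 0, 0],
    "B": [1, 0, 0, 0, 1, 0],
    "C": [0, 0, 0, 0, 0, 1],
    "D": [0, 0, 0, 1, 0, 0],
})

dummy_cols = ["A", "B", "C", "D"]          # hypothetical name for the indicator columns
names = np.array(dummy_cols, dtype=object)  # column names as plain Python strings

# True * "B" == "B" and False * "B" == "", so each row's dot product
# is the concatenation of the names of its non-zero columns.
df["sub_category"] = df[dummy_cols].astype(bool).dot(names)

print(df[["category", "sub_category"]])
```

If a row had more than one 1, this approach would return the names joined together (for example `AB`) rather than a single label.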

1 Answer

Just use `idxmax` across the indicator columns:

df['sub_category'] = df[['A', 'B', 'C', 'D']].idxmax(axis=1)


    category    A   B   C   D   sub_category
0   x           0   1   0   0   B
1   y           1   0   0   0   A
2   z           1   0   0   0   A
3   l           0   0   0   1   D
4   m           0   1   0   0   B
5   n           0   0   1   0   C

Of course, you can then select only the columns you want:

df[['category', 'sub_category']]

    category    sub_category
0   x           B
1   y           A
2   z           A
3   l           D
4   m           B
5   n           C
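
If the list of indicator columns is long, it may be easier to select them programmatically than to type them out. A minimal runnable sketch, assuming every column other than `category` is a 0/1 indicator (the variable name `dummy_cols` is illustrative):

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "category": list("xyzlmn"),
    "A": [0, 1, 1, 0, 0, 0],
    "B": [1, 0, 0, 0, 1, 0],
    "C": [0, 0, 0, 0, 0, 1],
    "D": [0, 0, 0, 1, 0, 0],
})

# Assumption: everything except "category" is an indicator column.
dummy_cols = df.columns.drop("category")

# For each row, idxmax(axis=1) returns the label of the column holding the maximum.
df["sub_category"] = df[dummy_cols].idxmax(axis=1)

print(df[["category", "sub_category"]])
```

Note that `idxmax` returns the first column holding the row maximum, so a row of all zeros would be labelled with the first indicator column rather than left empty.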
rafaelc
  • Thanks Rafael! can't we use function here and use apply, suppose our column list is bigger in size – Vikash Kumar Jul 11 '18 at 01:05
  • @VikashKumar You can use `apply` but performance sucks. I'm pretty sure it wouldn't be better than `idxmax` – rafaelc Jul 11 '18 at 01:06
  • I'm sure `apply` wouldn't be faster, but there is an even faster method than `idxmax` for a large dataframe using `np.where`. See [my answer](https://stackoverflow.com/a/51275990/6671176) to this [duplicate question](https://stackoverflow.com/questions/26762100/reconstruct-a-categorical-variable-from-dummies-in-pandas) for benchmarks – sacuL Jul 11 '18 at 01:40
  • @sacul excellent answer :) – rafaelc Jul 11 '18 at 01:41
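
For reference, the two alternatives discussed in these comments can be sketched as follows. The `apply` version differs from the attempt in the question only in that it is restricted to the indicator columns; the original error most likely arose because each row also contained the string from `category`, which gives the row an object dtype. The `np.where` version illustrates the general idea only, is not necessarily the code from the linked answer, and assumes exactly one 1 per row. The names `dummy_cols`, `via_apply`, and `via_where` are hypothetical.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "category": list("xyzlmn"),
    "A": [0, 1, 1, 0, 0, 0],
    "B": [1, 0, 0, 0, 1, 0],
    "C": [0, 0, 0, 0, 0, 1],
    "D": [0, 0, 0, 1, 0, 0],
})
dummy_cols = ["A", "B", "C", "D"]

# apply: look only at the numeric indicator columns, then map the
# position of the row maximum back to a column name.
via_apply = df[dummy_cols].apply(
    lambda row: dummy_cols[row.to_numpy().argmax()], axis=1
)

# np.where: find the (row, column) positions of the 1s in one pass.
# Positions come back in row order, so with exactly one 1 per row
# col_idx lines up with the DataFrame's rows.
_, col_idx = np.where(df[dummy_cols].to_numpy() == 1)
via_where = pd.Series(np.array(dummy_cols)[col_idx], index=df.index)

print(via_apply.tolist())
print(via_where.tolist())
```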