Pandas Group By And Get Dummies

Question

I want to get dummy variables for each unique value. The idea is to turn the data frame into a multi-label target. How can I do it?

Data:

           ID                      L2
           A                 Firewall
           A                 Security
           B           Communications
           C                 Business
           C                 Switches

Desired Output:

ID   Firewall  Security  Communications  Business   Switches
 A      1          1             0              0         0
 B      0          0             1              0         0
 C      0          0             0              1         1

I have tried pd.pivot_table, but it requires a column to aggregate on. I have also tried the answer at this link, but it sums the values rather than just producing binary dummy columns. I would much appreciate your help. Thanks a lot!

score 8 · Answer 1 · answered Aug 28 '20 at 15:42

crosstab, then convert to boolean:

pd.crosstab(df['ID'],df['L2']).astype(bool)

Output:

L2  Business  Communications  Firewall  Security  Switches
ID                                                        
A      False           False      True      True     False
B      False            True     False     False     False
C       True           False     False     False      True
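
If 0/1 integers are preferred over booleans, as in the desired output of the question, one possible variant is to cap the crosstab counts at 1. This is only a sketch: the frame below is rebuilt by hand from the sample data in the question, and the final flattening step is optional.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": ["A", "A", "B", "C", "C"],
    "L2": ["Firewall", "Security", "Communications", "Business", "Switches"],
})

# crosstab counts occurrences; clip(upper=1) caps the counts at 1,
# so the result stays binary even if an (ID, L2) pair appears more than once.
dummies = pd.crosstab(df["ID"], df["L2"]).clip(upper=1)

# Optional: drop the "L2" columns label and turn ID back into a regular column.
result = dummies.rename_axis(columns=None).reset_index()
print(result)
```

Clipping keeps the integer dtype, whereas astype(bool) gives True/False; which one fits better depends on what the downstream model expects.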

BENY · Accepted Answer · 2020-08-28T15:47:02.043

4

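Let us use set_index and then get_dummies. Since each ID can have several rows, we need to aggregate with max at level=0:

# groupby(level=0).max() replaces .max(level=0), which was removed in pandas 2.0
s = df.set_index('ID')['L2'].str.get_dummies().groupby(level=0).max().reset_index()
Out[175]: 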
  ID  Business  Communications  Firewall  Security  Switches
0  A         0               0         1         1         0
1  B         0               1         0         0         0
2  C         1               0         0         0         1

edited Aug 28 '20 at 15:47

answered Aug 28 '20 at 15:34

Thanks for the answer but I have a bigger data frame and using `sum` is actually `summing` the values so I'm not getting just binary columns. – Krishnang K Dalal Aug 28 '20 at 15:42
@KrishnangKDalal use `max` instead of `sum` – Quang Hoang Aug 28 '20 at 15:44
There, it is!! Thanks a lot. Also, thank you everyone for your help – Krishnang K Dalal Aug 28 '20 at 15:47
Thanks for the answer. Why use the max function? – Utopia Nov 02 '20 at 16:57
@Utopia when several rows for the same ID hold 0 and 1 in the same column, we want to return 1 if any of them is 1; that is why we use max. – BENY Nov 02 '20 at 17:00
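
The difference between sum and max discussed in these comments only shows up when an (ID, L2) pair is repeated. The sketch below uses made-up data, not the data from the question, in which the pair (A, Firewall) appears twice.

```python
import pandas as pd

# Hypothetical data: the (A, Firewall) pair appears twice.
df = pd.DataFrame({
    "ID": ["A", "A", "A", "B"],
    "L2": ["Firewall", "Firewall", "Security", "Communications"],
})

# One indicator row per original row, still indexed by ID.
dummies = df.set_index("ID")["L2"].str.get_dummies()

summed = dummies.groupby(level=0).sum()  # counts occurrences per ID
capped = dummies.groupby(level=0).max()  # 1 if the label occurs at all

print(summed.loc["A", "Firewall"])  # a count, not a binary flag
print(capped.loc["A", "Firewall"])  # stays binary
```

With sum the repeated pair is counted, while max returns 1 as soon as any row for that ID carries the label.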

Soumendra Mishra · Answer 3 · 2020-08-28T16:03:59.690

You can try this:

df1 = pd.read_csv("file.csv")
df2 = df1.groupby(['ID'])['L2'].apply(','.join).reset_index()
df3 = df2["L2"].str.get_dummies(",")
df = pd.concat([df2, df3], axis = 1)
print(df)

Output:

  ID                 L2  Business  Communications  Firewall  Security  Switches
0  A  Firewall,Security         0               0         1         1         0
1  B     Communications         0               1         0         0         0
2  C  Business,Switches         1               0         0         0         1

Alternative Option:

df = df.groupby(['ID'])['L2'].apply(','.join).str.get_dummies(",").reset_index()
print(df)
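
One caveat with joining and splitting on ",": it assumes that no label contains a comma itself. The sketch below uses made-up data with such a label and compares the join-based approach with a different technique, encoding each row with pd.get_dummies and taking the per-ID max, which needs no separator.

```python
import pandas as pd

# Hypothetical data: one label contains a comma.
df = pd.DataFrame({
    "ID": ["A", "A", "B"],
    "L2": ["Firewall", "Routers, Switches", "Firewall"],
})

# Joining on "," and splitting again breaks "Routers, Switches" into two columns.
joined = df.groupby("ID")["L2"].apply(",".join).str.get_dummies(",")
print(joined.columns.tolist())

# Encoding each row first and then taking the per-ID max avoids the separator.
safe = pd.get_dummies(df["L2"]).groupby(df["ID"]).max().astype(int)
print(safe)
```

If the labels are known to be free of the separator, the join-based version above works as posted.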

Ben.T · Answer 4 · 2020-08-28T15:48:08.420

You can use pivot_table if you set aggfunc=any.

print(df.pivot_table(index='ID', columns='L2', 
                     aggfunc=any, fill_value=False)\
        .astype(int))
L2  Business  Communications  Firewall  Security  Switches
ID                                                        
A          0               0         1         1         0
B          0               1         0         0         0
C          1               0         0         0         1

You may also want to call reset_index at the end to turn ID back into a column.
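
A variant of the same idea, in case pivot_table complains about having no column to aggregate on, is to add an explicit helper column first. This is only a sketch: the column name "present" is made up for illustration, and the frame is rebuilt from the sample data in the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": ["A", "A", "B", "C", "C"],
    "L2": ["Firewall", "Security", "Communications", "Business", "Switches"],
})

result = (
    df.assign(present=1)  # hypothetical helper column to aggregate on
      .pivot_table(index="ID", columns="L2", values="present",
                   aggfunc="max", fill_value=0)
      .astype(int)                  # make sure the flags are plain integers
      .rename_axis(columns=None)    # drop the "L2" label on the columns
      .reset_index()                # turn ID back into a regular column
)
print(result)
```

Using "max" as the aggregation keeps the values binary even when an (ID, L2) pair is repeated.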
