
I need to reshape my data into a specific representation for multi-label classification.

I have a DataFrame like this:

Key Name             Description               Label
1   Self service    We want self service.      Performance
1   Self service    We want self service.      Storage
2   Multi cloud     Mutli cloud is needed.     Scaling
3   Storage issues  Storage upgrade.           Storage

I want to transform it to:

Key  Name             Description              Performance   Storage   Scaling
1    Self service    We want self service.         1            1         0
2    Multi cloud     Mutli cloud is needed.        0            0         1
3    Storage issues  Storage upgrade.              0            1         0

I have tried groupby, pivot and merge, but couldn't get to a workable solution. I also tried pd.get_dummies with groupby, but was unable to combine them.

Any tricks that could help?

ayukum

1 Answer


You can use pivot_table:

# Count rows per (Key, Name, Description, Label), then spread the labels into columns
out = (df.pivot_table(index=['Key', 'Name', 'Description'],
                      columns='Label', aggfunc='size', fill_value=0)
         .rename_axis(columns=None).reset_index())
print(out)

# Output
   Key            Name             Description  Performance  Scaling  Storage
0    1    Self service   We want self service.            1        0        1
1    2     Multi cloud  Mutli cloud is needed.            0        1        0
2    3  Storage issues        Storage upgrade.            0        0        1
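
Since the question mentions pd.get_dummies and groupby, here is one way those two could be combined. This is only a sketch: it rebuilds the sample data by hand and assumes the columns are named Key, Name, Description and Label.

```python
import pandas as pd

# Sample data rebuilt from the question; the column names
# Key, Name, Description and Label are an assumption.
df = pd.DataFrame({
    "Key": [1, 1, 2, 3],
    "Name": ["Self service", "Self service", "Multi cloud", "Storage issues"],
    "Description": ["We want self service.", "We want self service.",
                    "Mutli cloud is needed.", "Storage upgrade."],
    "Label": ["Performance", "Storage", "Scaling", "Storage"],
})

# One 0/1 indicator column per distinct label.
dummies = pd.get_dummies(df["Label"], dtype=int)

# Attach the indicators to the identifying columns, then collapse the
# repeated rows of each item; max() keeps every value at 0 or 1.
out = (pd.concat([df[["Key", "Name", "Description"]], dummies], axis=1)
         .groupby(["Key", "Name", "Description"], as_index=False)
         .max())
print(out)
```

Using max() rather than sum() should keep the result a pure indicator matrix even if the same label appears twice for one item, which is usually what a multi-label classifier expects.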
Corralien