Remove duplicates while adding a column in a CSV file using Python

Question

I have a CSV file that looks like this:

|innings |     bowler    |
|--------|---------------|                      
|1       |      P Kumar  |
|1       |      P Kumar  |
|1       |      P Kumar  |
|1       |      P Kumar  |
|1       |      Z Khan   |
|1       |      Z Khan   |
|1       |      Z Khan   |
|2       |      AB Dinda |
|2       |      AB Dinda |
|2       |      I Sharma |

Desired Output

|innings |     bowler           |
|--------|----------------------|
|1       |    P Kumar,Z Khan    |
|2       |    AB Dinda,I Sharma |

Code I Applied:

df.groupby(['innings']).bowler.sum().drop_duplicates(subset="bowler",keep='first',inplace=True)

But it raises an error: TypeError: drop_duplicates() got an unexpected keyword argument 'subset'

Then I tried it without subset: drop_duplicates("bowler",keep='first', inplace=True). Now I get a different error: TypeError: drop_duplicates() got multiple values for argument 'keep'

jezrael · Answer 1 · 2021-04-21T08:00:26.797


First use DataFrame.drop_duplicates on both columns, then group by innings and aggregate the bowler names with join:

df = (df.drop_duplicates(subset=['bowler', 'innings'])
        .groupby('innings')
        .bowler.agg(','.join)
        .reset_index())

print(df)
   innings             bowler
0        1     P Kumar,Z Khan
1        2  AB Dinda,I Sharma
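
For anyone who wants to run this end to end, here is a minimal, self-contained sketch. It assumes the CSV has already been loaded; the DataFrame below is rebuilt by hand from the table in the question. It also hints at why the original attempt failed: df.groupby(['innings']).bowler.sum() returns a Series, and Series.drop_duplicates has no subset parameter (only DataFrame.drop_duplicates does).

```python
import pandas as pd

# Sample data copied from the question's table (stand-in for the real CSV).
df = pd.DataFrame({
    "innings": [1, 1, 1, 1, 1, 1, 1, 2, 2, 2],
    "bowler": ["P Kumar"] * 4 + ["Z Khan"] * 3 + ["AB Dinda"] * 2 + ["I Sharma"],
})

# Drop repeated (innings, bowler) rows on the DataFrame, where `subset` is valid,
# then join the remaining bowler names within each innings.
result = (
    df.drop_duplicates(subset=["innings", "bowler"])
      .groupby("innings")["bowler"]
      .agg(",".join)
      .reset_index()
)

print(result)
```

Dropping duplicates before grouping keeps each bowler once per innings and preserves the order of first appearance. Avoiding inplace=True also matters in a chain like this, because an in-place call returns None.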

edited Apr 21 '21 at 08:00

answered Apr 21 '21 at 07:41


Thanks a lot, it works! One small question, I hope you don't mind: what if this data set has one more index, 'venue', and I want to sort the bowlers according to innings as well as venue? – Vanshika Saini Apr 21 '21 at 10:06
@VanshikaSaini - Do you mean changing `.bowler` to `['bowler','venue']`? – jezrael Apr 21 '21 at 10:07
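
The thread does not settle the follow-up, so the following is only one possible reading: if the goal is one row of bowlers per venue and innings, the same pattern might be extended by adding the extra column to both the duplicate check and the group keys. The venue column and its values ("Ground A", "Ground B") are hypothetical and not taken from the question's data.

```python
import pandas as pd

# Hypothetical data: the 'venue' column and its values are made up for illustration.
df = pd.DataFrame({
    "venue": ["Ground A"] * 5 + ["Ground B"] * 5,
    "innings": [1, 1, 1, 2, 2, 1, 1, 2, 2, 2],
    "bowler": [
        "P Kumar", "P Kumar", "Z Khan", "AB Dinda", "I Sharma",
        "Z Khan", "Z Khan", "AB Dinda", "AB Dinda", "I Sharma",
    ],
})

# Deduplicate on all three columns, then group by venue and innings
# and join the bowler names within each group.
by_venue = (
    df.drop_duplicates(subset=["venue", "innings", "bowler"])
      .groupby(["venue", "innings"])["bowler"]
      .agg(",".join)
      .reset_index()
)

print(by_venue)
```

groupby sorts the group keys by default, so the rows come out ordered by venue and then innings.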
