12

I have a pandas DataFrame where some pairs of rows have the same ID but different name. What I want is to reduce the row pair to one row, and display both of their names.

INPUT:

ID   NAME     AGE
149  Bob      32
150  Tom      53
150  Roberts  53
151  Pamela   28
152  Andrew   23

OUTPUT:

ID   NAME        AGE
149  Bob         32
150  Tom Roberts 53
151  Pamela      58
152  Andrew      23

Otherwise, I can also do ['Tom', 'Roberts'], or any other method that still captures the data.

cs95
  • 379,657
  • 97
  • 704
  • 746
Landmaster
  • 1,043
  • 2
  • 13
  • 21

1 Answers1

20

Easily done with groupby.

df.groupby('ID', as_index=False).agg({'NAME' : ' '.join, 'AGE' : 'first'})  

ID          NAME  AGE
149          Bob   32
150  Tom Roberts   53
151       Pamela   28
152       Andrew   23
cs95
  • 379,657
  • 97
  • 704
  • 746
  • Thank you! Now, what if I only want to aggregate the name, as opposed to another column (updated question)? – Landmaster Oct 08 '17 at 21:12
  • 1
    @Landmaster Sorry, you can use `df[['ID', 'NAME']].groupby('ID', as_index=False).agg(' '.join)` it's very similar. – cs95 Oct 08 '17 at 21:14
  • @COLDSPEED how would you assign it to the original frame such that your output is as described above? Right now, it's not giving me the age column. – Landmaster Oct 08 '17 at 21:16
  • 2
    Seriously, I can't come up with an alternative that is even close to this good. – piRSquared Oct 09 '17 at 06:00