Aggregating in panda dataframe

Question

I have the below panda dataframe

Now I want to sum the count for [user 1][user 2] and [user 2][user 1]. For example, the count value for [user 1 = 1][user 2 = 92] is count = 1 and [user 1 = 92][user 2 = 1] is count = 2, so I want the final output as user 1 = 1, user 2 = 92 and count = 3. Also, there might not be [user 2][user 1] for every [user 1][user 2]. For example, for [user 1 = 29][user 2 = 56] count value is 3 but there is no [user 1 = 56][user 2 = 29], so the final output would be user 1 = 29, user 2 = 56 and count = 3.

So final output should look like this

Can someone please guide me how to do this?

Please post actual data rather than pictures. – cs95 Jul 08 '17 at 09:06 — cs95, Jul 08 '17 at 09:06

Adrien Matissart · Accepted Answer · 2017-07-08T09:30:44.763

1

A solution grouping sorted tuples:

df['users'] = list(zip(df.user1,df.user2))
df['users'] = df.users.apply(lambda t:tuple(sorted(t)))
counts = df.groupby('users', as_index=False)['count'].sum()
counts[['user1','user2']] = counts.users.apply(pd.Series)
counts = counts[['user1','user2','count']] # Select and reorder columns

edited Jul 08 '17 at 09:30

answered Jul 08 '17 at 09:24

Adrien Matissart

1,610
15
19

Thanks, it worked great. Can you please explain the code a bit? – aries Jul 08 '17 at 10:04
1

Sure. First we build a column `users` that contains unique (sorted) pairs of users. A tuple is required (instead of a list) because we need hashable values to use `groupby`. Then we group our tuples and sum the `count` column. `as_index=False` is useful here to keep the tuples in a separate column and process it later. Finally we apply `pd.Series` to split tuples into 2 separate columns again. – Adrien Matissart Jul 08 '17 at 10:11

Aggregating in panda dataframe

1 Answers1

Linked