Create a new dataframe by aggregating repeated origin and destination values by a separate count column in a pandas dataframe

Question

I am having trouble analysing origin-destination values in a pandas dataframe which contains origin/destination columns and a count column of the frequency of these. I want to transform this into a dataframe with the count of how many are leaving and entering:

Initial:

    Origin  Destination  Count

    A       B            7
    A       C            1
    B       A            1
    B       C            4
    C       A            3
    C       B            10

For example this simplified dataframe has 7 leaving from A to B and 1 from A to C so overall leaving place A would be 8, and entering place A would be 4 (B - A is 1, C - A is 3) etc. The new dataframe would look something like this.

Goal:

   Place   Entering  Leaving

   A       4         8
   B       17        5
   C       5         13

I have tried several techniques such as .groupby() but have not yet created my intended dataframe. How can I handle the repeated values in the origin/destination columns and assign them to a new dataframe with aggregated values of just the count of leaving and entering?

Thank you!

jezrael · Accepted Answer · 2017-11-04T14:47:03.147

1

Use double groupby + concat:

a = df.groupby('Destination')['Count'].sum()
b = df.groupby('Origin')['Count'].sum()

df = pd.concat([a,b], axis=1, keys=('Entering','Leaving')).rename_axis('Place').reset_index()
print (df)
  Place  Entering  Leaving
0     A         4        8
1     B        17        5
2     C         5       13

edited Nov 04 '17 at 14:47

answered Nov 04 '17 at 14:38

jezrael

822,522
95
1,334
1,252

1

Dual groupby is always used to be my thing : ) – Bharath M Shetty Nov 04 '17 at 15:09
@Bharath - Thanks. – jezrael Nov 04 '17 at 15:10
Ah so groupby then **concat** is the way to go - thank you very much! :) – Geo_Py Nov 04 '17 at 15:13

score 0 · Answer 2 · answered Nov 04 '17 at 15:16

0

pivot_table then do sum

df=pd.pivot_table(df,index='Origin',columns='Destination',values='Count',aggfunc=sum)
pd.concat([df.sum(0),df.sum(1)],1)

Out[428]: 
      0     1
A   4.0   8.0
B  17.0   5.0
C   5.0  13.0

answered Nov 04 '17 at 15:16

BENY

317,841
20
164
234

Create a new dataframe by aggregating repeated origin and destination values by a separate count column in a pandas dataframe

2 Answers2