13

I have a pandas dataframe that looks like this

ID     country   month   revenue  profit   ebit
234    USA       201409   10        5       3
344    USA       201409    9        7       2
532    UK        201410    20       10      5
129    Canada    201411    15       10      5

I want to group by ID, country, month and count the IDs per month and country and sum the revenue, profit, ebit. The output for the above data would be:

 country   month    revenue   profit  ebit   count
   USA     201409     19        12      5      2
   UK      201409     20        10      5      1
   Canada  201411     15        10      5      1

I have tried different variations of groupby, sum and count functions of pandas but I am unable to figure out how to apply groupby sum and count all together to give the result as shown. Please share any ideas that you might have. Thanks!

N91
  • 395
  • 1
  • 3
  • 14

4 Answers4

27

It can be done using pivot_table this way:

>>> df1=pd.pivot_table(df, index=['country','month'],values=['revenue','profit','ebit'],aggfunc=np.sum)
>>> df1 
                ebit  profit  revenue
country month                        
Canada  201411     5      10       15
UK      201410     5      10       20
USA     201409     5      12       19

>>> df2=pd.pivot_table(df, index=['country','month'], values='ID',aggfunc=len).rename('count')
>>> df2

country  month 
Canada   201411    1
UK       201410    1
USA      201409    2

>>> pd.concat([df1,df2],axis=1)

                ebit  profit  revenue  count
country month                               
Canada  201411     5      10       15      1
UK      201410     5      10       20      1
USA     201409     5      12       19      2

UPDATE

It can be done in one-line using pivot_table and providing a dict of functions to apply to each column in the aggfunc argument:

pd.pivot_table(
   df,
   index=['country','month'],
   aggfunc={'revenue': np.sum, 'profit': np.sum, 'ebit': np.sum, 'ID': len}
).rename(columns={'ID': 'count'})

                count  ebit  profit  revenue
country month                               
Canada  201411      1     5      10       15
UK      201410      1     5      10       20
USA     201409      2     5      12       19
Mabel Villalba
  • 2,538
  • 8
  • 19
  • 1
    would you mind looking into this too please? It just got more tricky! https://stackoverflow.com/questions/48785833/python-aggregation-on-time-series – N91 Feb 16 '18 at 12:12
19

You can do the groupby, and then map the counts of each country to a new column.

g = df.groupby(['country', 'month'])['revenue', 'profit', 'ebit'].sum().reset_index()
g['count'] = g['country'].map(df['country'].value_counts())
g

Out[3]:


    country  month   revenue  profit  ebit  count
0   Canada   201411  15       10      5     1
1   UK       201410  20       10      5     1
2   USA      201409  19       12      5     2

Edit

To get the counts per country and month, you can do another groupby, and then join the two DataFrames together.

g = df.groupby(['country', 'month'])['revenue', 'profit', 'ebit'].sum()
j = df.groupby(['country', 'month']).size().to_frame('count')
pd.merge(g, j, left_index=True, right_index=True).reset_index()

Out[6]:

    country  month   revenue  profit  ebit  count
0   Canada   201411  15       10      5     1
1   UK       201410  20       10      5     1
2   UK       201411  10       5       2     1
3   USA      201409  19       12      5     2

I added another record for the UK with a different date - notice how there are now two UK entries in the merged DataFrame, with the appropriate counts.

Ben
  • 856
  • 5
  • 9
  • Thanks for your help Ben. But this solution is not taking into account the month. I need a count value for all the IDs against a each unique combination of country and month. – N91 Feb 13 '18 at 15:20
  • great! Thank you! – N91 Feb 13 '18 at 16:24
6

The following solution seems the simplest.

Group by country and month:

grouped_df = df.groupby(['country', 'month'])

Apply sum to columns of interest (revenue, profit, ebit):

final = grouped_df[['revenue', 'profit', 'ebit']].agg('sum')

Assign the size of the grouped_df to a new column in 'final':

final['count'] = grouped_df.size()
print(final)

Out[256]: 
                revenue  profit  ebit  count
country month                               
Canada  201411       15      10     5      1
UK      201410       20      10     5      1
USA     201409       19      12     5      2

All done!

matpav
  • 61
  • 1
  • 4
3

The following groupby solution perhaps is the simplest and exactly what you want.

temp2 = temp1.groupby(['country', 'month'])
        .agg({'revenue': 'sum', 'profit': 'sum', 'ebit': 'sum', 'ID': 'count'})
        .reset_index()
        .rename(columns={'ID': 'count'})
Brad667
  • 41
  • 2