1

Suppose you have the below data frame called raw:

                    A   B   C
Index           
2007-01-03  NaN        -1   1
2007-01-04  0.021954    1   2
2007-01-05  -0.007147   -1  3
2007-01-08  0.004926    1   4
2007-01-09  0.079800    1   4
2007-01-10  0.046746    1   4
2007-01-11  -0.012448   -1  5
2007-01-12  -0.012394   -1  5
2007-01-16  0.025873    1   6
2007-01-17  -0.022391   -1  7
2007-01-18  -0.063928   -1  7
2007-01-19  -0.006420   -1  7
2007-01-22  -0.019511   -1  7
2007-01-23  -0.012639   -1  7
2007-01-24  0.011601    1   8
2007-01-25  -0.005204   -1  9
2007-01-26  -0.010138   -1  9
2007-01-29  0.006538    1   10
2007-01-30  -0.004549   -1  11
2007-01-31  0.002102    1   12

How can I groupby columns B & the count of column C while also summing column A? Am able to produce the former via raw[['B','C']].value_counts()(grouping by column B and count of C) but unable to combine this result with the latter (sum of column A).

raw[['B','C']].value_counts()

B   C 
-1  7     5
 1  4     3
-1  5     2
    9     2
    1     1
    3     1
    11    1
 1  2     1
    6     1
    8     1
    10    1
    12    1
dtype: int64
jgg
  • 791
  • 4
  • 17
  • Does this answer your question? [Pandas: Group by two columns to get sum of another column](https://stackoverflow.com/questions/40553002/pandas-group-by-two-columns-to-get-sum-of-another-column) – PacketLoss Mar 11 '21 at 01:47

1 Answers1

2

I think you want to group by B,C:

df.groupby(['B','C']).agg({'C':'count', 'A':'sum'})

Output:

       C         A
B  C              
-1 1   1  0.000000
   3   1 -0.007147
   5   2 -0.024842
   7   5 -0.124889
   9   2 -0.015342
   11  1 -0.004549
 1 2   1  0.021954
   4   3  0.131472
   6   1  0.025873
   8   1  0.011601
   10  1  0.006538
   12  1  0.002102

Or better yet with named agg, which allows you to rename the new columns:

df.groupby(['B','C']).agg(C_count=('C','count'),
                          A_sum=('A','sum'))

Output:

       C_count     A_sum
B  C                    
-1 1         1  0.000000
   3         1 -0.007147
   5         2 -0.024842
   7         5 -0.124889
   9         2 -0.015342
   11        1 -0.004549
 1 2         1  0.021954
   4         3  0.131472
   6         1  0.025873
   8         1  0.011601
   10        1  0.006538
   12        1  0.002102
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
  • Thanks. This is Close. I actually want to group by the count of column C so all of the values with various counts are grouped together. Maybe the solution requires one additional step to sort by the C_count column you created? – jgg Mar 11 '21 at 01:53
  • 1
    @jgg chain with `.sort_values(['B','C_count'])` in the second solution. – Quang Hoang Mar 11 '21 at 01:55