How to do a pandas groupby operation on one column but keep the other in the resulting dataframe

Question

My question is about the groupby operation in pandas. I have the following DataFrame:

In [4]: df = pd.DataFrame({"A": range(4), "B": ["PO", "PO", "PA", "PA"], "C": ["Est", "Est", "West", "West"]})

In [5]: df
Out[5]: 
   A   B     C
0  0  PO   Est
1  1  PO   Est
2  2  PA  West
3  3  PA  West

This is what I would like to do: group by column B and sum column A, but still have column C in the resulting DataFrame. If I do:

In [8]: df.groupby(by="B").aggregate(pd.np.sum)
Out[8]: 
    A
B    
PA  5
PO  1

It does the job, but column C is missing. I can also do this:

In [9]: df.groupby(by=["B", "C"]).aggregate(pd.np.sum)
Out[9]: 
         A
B  C      
PA West  5
PO Est   1

or

In [11]: df.groupby(by=["B", "C"], as_index=False).aggregate(pd.np.sum)
Out[11]: 
    B     C  A
0  PA  West  5
1  PO   Est  1

But in both cases it groups by B AND C, rather than grouping by B alone and keeping the C value. Is what I want to do irrelevant, or is there a way to do it?

MaxU - stand with Ukraine · Accepted Answer · 2016-11-03T08:55:36.437

33

Try using the DataFrameGroupBy.agg() method with a dict of {column -> function}:

In [6]: df.groupby('B').agg({'A':'sum', 'C':'first'})
Out[6]:
       C  A
B
PA  West  5
PO   Est  1

From docs:

Function to use for aggregating groups. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply. If passed a dict, the keys must be DataFrame column names.

Or something like this, depending on your goals (the last step uses transform to keep every original row and add the group sum as a new column):

In [8]: df = pd.DataFrame({"A": range(4), "B": ["PO", "PO", "PA", "PA"], "C": ["Est1", "Est2", "West1", "West2"]})

In [9]: df.groupby('B').agg({'A':'sum', 'C':'first'})
Out[9]:
        C  A
B
PA  West1  5
PO   Est1  1

In [10]: df['sum_A'] = df.groupby('B')['A'].transform('sum')

In [11]: df
Out[11]:
   A   B      C  sum_A
0  0  PO   Est1      1
1  1  PO   Est2      1
2  2  PA  West1      5
3  3  PA  West2      5
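
For anyone who wants to run the two approaches above outside an IPython session, here is a minimal self-contained sketch. The names `per_group` and `per_row` are illustrative and not part of the original answer; the column order of the `agg` result may differ from the transcript above depending on your pandas version.

```python
import pandas as pd

df = pd.DataFrame({
    "A": range(4),
    "B": ["PO", "PO", "PA", "PA"],
    "C": ["Est1", "Est2", "West1", "West2"],
})

# One row per group: sum A and keep the first C value seen in each group.
# Note that 'first' silently discards the other C values (Est2, West2).
per_group = df.groupby("B").agg({"A": "sum", "C": "first"})

# One row per original row: transform broadcasts each group's sum back
# onto the rows of that group, so every C value is preserved.
per_row = df.assign(sum_A=df.groupby("B")["A"].transform("sum"))

print(per_group)
print(per_row)
```

Use the `agg` form when you want a collapsed result and C is constant (or arbitrary) within each group; use the `transform` form when C varies within a group and no row should be lost.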

edited Nov 03 '16 at 08:55

answered Nov 03 '16 at 08:47


This works if the `C` value is the same over all values that are being grouped. Otherwise a `merge` would do the job. – Khris Nov 03 '16 at 08:50
@Khris, thank you for your hint! I've added an alternative solution which uses the `.transform()` method – MaxU - stand with Ukraine Nov 03 '16 at 08:54
Wonderful !! Thanks a lot. – Ger Nov 03 '16 at 08:55
I am not familiar with it, but maybe this question could end up in the python/pandas wiki or the documentation section of Stack Overflow? – Ger Nov 03 '16 at 09:05
@Ger, I think it's documented pretty well [here](http://pandas.pydata.org/pandas-docs/stable/groupby.html#transformation) – MaxU - stand with Ukraine Nov 03 '16 at 09:53
@MaxU: Thank you for reminding me of the `transform` function, I have overlooked that so far and solved problems with cumbersome merging instead. – Khris Nov 03 '16 at 10:11
You can also use any arbitrary function instead of a magic string such as `sum`. – Ufos Aug 30 '18 at 17:24
I just merged back the grouped to the original dataframe, selected the rows with identical values in the duplicate groupby columns, then deleted columns and did some clean-up. – Rockbar Oct 10 '18 at 08:51
Does this dict method support something like "apply('/'.join)" as a value? I got an `AttributeError`. – steven Jun 03 '19 at 13:56
@steven, yes, for the example from the answer: `df.groupby('B').agg({'C':'/'.join})` – MaxU - stand with Ukraine Jun 03 '19 at 14:00
@MaxU Thanks a lot. Is there any resource you can share that explains how `agg` works? pandas doesn't seem to have any docs for this. – steven Jun 03 '19 at 15:01
What if I need the mean for more than one column? I tried to pass a list but it doesn't seem to work. – steven Jun 10 '19 at 14:16
@steven, why not open a __new question__ with a small sample input data set and your desired data set? – MaxU - stand with Ukraine Jun 10 '19 at 14:18
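
Two ideas from the comments can be sketched together: passing a callable such as `'/'.join` as a dict value, and the merge-based alternative to `transform`. This is only an illustration under the sample data from the answer; the names `joined`, `sums` and `merged` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "A": range(4),
    "B": ["PO", "PO", "PA", "PA"],
    "C": ["Est1", "Est2", "West1", "West2"],
})

# A callable as the dict value: instead of keeping only the first C,
# join every C value of the group into one string.
joined = df.groupby("B").agg({"A": "sum", "C": "/".join})

# Merge-based alternative to transform: compute the per-group sums,
# rename the column, then merge them back onto the original rows.
sums = (
    df.groupby("B", as_index=False)["A"]
    .sum()
    .rename(columns={"A": "sum_A"})
)
merged = df.merge(sums, on="B", how="left")

print(joined)
print(merged)
```

The merge route gives the same `sum_A` column as `transform('sum')` here; it tends to be more verbose, but it may be convenient when the aggregated table is needed on its own as well.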
