My question is about groupby operation with pandas. I have the following DataFrame :
In [4]: df = pd.DataFrame({"A": range(4), "B": ["PO", "PO", "PA", "PA"], "C": ["Est", "Est", "West", "West"]})
In [5]: df
Out[5]:
A B C
0 0 PO Est
1 1 PO Est
2 2 PA West
3 3 PA West
This is what I would like to do : I want to group by column B and do a sum on column A. But at the end, I would like column C to still be in the DataFrame. If I do :
In [8]: df.groupby(by="B").aggregate(pd.np.sum)
Out[8]:
A
B
PA 5
PO 1
It does the job but column C is missing. I can also do this :
In [9]: df.groupby(by=["B", "C"]).aggregate(pd.np.sum)
Out[9]:
A
B C
PA West 5
PO Est 1
or
In [11]: df.groupby(by=["B", "C"], as_index=False).aggregate(pd.np.sum)
Out[11]:
B C A
0 PA West 5
1 PO Est 1
But in both cases it group by B AND C and not just B and keeps the C value. Is what I want to do irrelevant or is there a way to do it ?