How can I group_by and sum values in a data frame?

Question

I have this data frame (see the table below):

| State       | County   | Homicides |
|-------------|----------|-----------|
| Ags         | Calvillo | 4         |
| Mexico City | Alvaro O | 2         |
| Mexico City | Alvaro O | 3         |
| Mexico City | Miguel H | 2         |
| Gto         | Leon     | 1         |
| Gto         | Leon     | 1         |

What I want to do is group by County and sum the Homicides values, for example:

| State       | County   | Homicides |
|-------------|----------|-----------|
| Ags         | Calvillo | 4         |
| Mexico City | Alvaro O | 5         |
| Mexico City | Miguel H | 2         |
| Gto         | Leon     | 2         |

As you can see, the homicide values for rows with the same county name are added together.

This was my attempt

df1 >> group_by("County") >> summarize(County = X.County)

But it is not doing what I want. Can someone guide me on this, please?

Thanks

Thank you! But what about my State column? This answer groups by county but returns only the "County" and "Homicides" columns. How can I keep the "State" column? - coding, Aug 26 '20 at 22:53
df.groupby(["State", "County"]).agg('sum'): yes, I tried this, but it shows each state only once in the "State" column. When a state has several counties, I want the state kept on every county row, not shown just once. - coding, Aug 26 '20 at 23:05

score 0 · Answer 1 · answered Aug 26 '20 at 23:16

With your help, these are the lines of code that finally solved the issue:

df1 = df1.groupby(['State', 'County']).agg('sum')
df1 = df1.reset_index()
df1

This was my result

| State       | County   | Homicides |
|-------------|----------|-----------|
| Ags         | Calvillo | 4         |
| Mexico City | Alvaro O | 5         |
| Mexico City | Miguel H | 2         |
| Gto         | Leon     | 2         |
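
For anyone who wants to reproduce this, here is a minimal, self-contained sketch. The sample frame is rebuilt by hand from the table in the question, and the name totals is only illustrative. Grouping on both State and County leaves them as a MultiIndex, which pandas prints with repeated outer labels left blank; that is most likely why the state seemed to appear only once. Passing as_index=False (an alternative to calling reset_index() afterwards) returns them as ordinary columns, and sort=False is an optional extra that keeps the groups in order of first appearance.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df1 = pd.DataFrame({
    "State": ["Ags", "Mexico City", "Mexico City", "Mexico City", "Gto", "Gto"],
    "County": ["Calvillo", "Alvaro O", "Alvaro O", "Miguel H", "Leon", "Leon"],
    "Homicides": [4, 2, 3, 2, 1, 1],
})

# Group on both columns so State survives in the result.
# as_index=False keeps State and County as regular columns;
# sort=False preserves the order in which each group first appears.
totals = df1.groupby(["State", "County"], as_index=False, sort=False)["Homicides"].sum()

print(totals)
```

Every row of totals carries its own State value, so a state with several counties is repeated once per county.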

score 0 · Answer 2 · answered Apr 27 '23 at 06:56

I had the same struggle doing a group_by and sum with dfply. I think the description of the group_by function in the dfply README (https://github.com/kieferk/dfply/blob/master/README.md) is not detailed enough.

Try the following:

# Group by County, then sum the Homicides column within each group
df1 >> group_by(X.County) >> summarize(sum_Homicides = X.Homicides.sum())
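
If dfply is not installed, roughly the same group-then-summarize step can be written in plain pandas with named aggregation. This is only a sketch, assuming the column names from the question; the sample data is rebuilt by hand and the output column name total_homicides is made up for illustration.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df1 = pd.DataFrame({
    "State": ["Ags", "Mexico City", "Mexico City", "Mexico City", "Gto", "Gto"],
    "County": ["Calvillo", "Alvaro O", "Alvaro O", "Miguel H", "Leon", "Leon"],
    "Homicides": [4, 2, 3, 2, 1, 1],
})

# Named aggregation: output_column=(input_column, aggregation function).
# total_homicides is an illustrative name, not something from the thread.
summary = (
    df1.groupby("County", as_index=False)
       .agg(total_homicides=("Homicides", "sum"))
)

print(summary)
```

Grouping by ["State", "County"] instead of "County" alone would keep the State column in the output as well.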
