I have this data frame (see the table below):

| State       | County   | Homicides |
|-------------|----------|-----------|
| Ags         | Calvillo | 4         |
| Mexico City | Alvaro O | 2         |
| Mexico City | Alvaro O | 3         |
| Mexico City | Miguel H | 2         |
| Gto         | Leon     | 1         |
| Gto         | Leon     | 1         |

What I want to do is group by County and sum the Homicides values. For example:

| State       | County   | Homicides |
|-------------|----------|-----------|
| Ags         | Calvillo | 4         |
| Mexico City | Alvaro O | 5         |
| Mexico City | Miguel H | 2         |
| Gto         | Leon     | 2         |

As you can see, the homicide values for rows with the same county name are summed.

This was my attempt

df1 >> group_by("County") >> summarize(County = X.County)

But it is not doing what I want. Can someone guide me with this question, please?

Thanks

Machavity
coding
  • `df.groupby("County").agg('sum')` – inspectorG4dget Aug 26 '20 at 22:47
  • Thank you! But what about my "State" column? This groups by county but returns only the "County" and "Homicides" columns. How can I recover "State"? – coding Aug 26 '20 at 22:53
  • You'll have to add it to the `groupby` – inspectorG4dget Aug 26 '20 at 23:00
  • Yes, I tried `df.groupby(["State", "County"]).agg('sum')`, but it shows each state only once in the "State" column. If one state has several counties, I want the state kept next to each county, not just shown once. – coding Aug 26 '20 at 23:05
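
A note on the last comment: grouping by two columns in pandas puts both keys into a two-level index, and the printed output leaves repeated outer labels blank, which can look as if each state is stored only once. A minimal sketch, assuming the sample data from the question (the frame is rebuilt by hand here, and the names `grouped`, `index_states`, and `flat` are only illustrative):

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "State": ["Ags", "Mexico City", "Mexico City", "Mexico City", "Gto", "Gto"],
    "County": ["Calvillo", "Alvaro O", "Alvaro O", "Miguel H", "Leon", "Leon"],
    "Homicides": [4, 2, 3, 2, 1, 1],
})

# Grouping by two keys yields a (State, County) MultiIndex
grouped = df.groupby(["State", "County"]).agg("sum")

# Every row still carries its own state label in the index,
# even though the printed output hides the repeats
index_states = grouped.index.get_level_values("State").tolist()

# reset_index() turns the index levels back into ordinary columns
flat = grouped.reset_index()
print(flat)
```

After `reset_index()`, "State" is a regular column again and is repeated on every row.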

2 Answers

With your help, this is the code that finally solved the issue:

df1 = df1.groupby(['State', 'County']).agg('sum')
df1 = df1.reset_index()
df1

This was my result

| State       | County   | Homicides |
|-------------|----------|-----------|
| Ags         | Calvillo | 4         |
| Mexico City | Alvaro O | 5         |
| Mexico City | Miguel H | 2         |
| Gto         | Leon     | 2         |
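
The same result can probably be reached in one step. This is a sketch rather than the only way to do it: it assumes the sample data from the question, rebuilt by hand, and uses the `as_index=False` and `sort=False` options of `groupby` so that the keys stay as columns and the groups keep their order of first appearance.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "State": ["Ags", "Mexico City", "Mexico City", "Mexico City", "Gto", "Gto"],
    "County": ["Calvillo", "Alvaro O", "Alvaro O", "Miguel H", "Leon", "Leon"],
    "Homicides": [4, 2, 3, 2, 1, 1],
})

# as_index=False keeps State and County as columns (no reset_index needed);
# sort=False keeps groups in the order they first appear in the data
result = df1.groupby(["State", "County"], as_index=False, sort=False)["Homicides"].sum()
print(result)
```

Without `sort=False`, pandas sorts the group keys, so "Gto" would come before "Mexico City" in the output.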


coding

I had the same struggle doing a group-by and sum with dfply. I don't think the description of the `group_by` function in the dfply README (https://github.com/kieferk/dfply/blob/master/README.md) is detailed enough.

Try this:

df1 >> group_by(X.State, X.County) >> summarize(Homicides = X.Homicides.sum())
johanna