Combining rows in pandas

Question

I have a DataFrame with an index called city_id of cities in the format [city],[state] (e.g., new york,ny containing integer counts in the columns. The problem is that I have multiple rows for the same city, and I want to collapse the rows sharing a city_id by adding their column values. I looked at groupby() but it wasn't immediately obvious how to apply it to this problem.

Edit:

An example: I'd like to change this:

city_id    val1 val2 val3
houston,tx    1    2    0
houston,tx    0    0    1
houston,tx    2    1    1

into this:

city_id    val1 val2 val3
houston,tx    3    3    2

if there are ~10-20k rows.

This isn't really a duplicate and @DSM's answer `df.groupby(df.index)` doesn't appear in the duplicated question. — LondonRob, Jun 09 '15 at 17:13

score 54 · Accepted Answer · answered Jul 03 '13 at 04:29

Starting from

>>> df
              val1  val2  val3
city_id                       
houston,tx       1     2     0
houston,tx       0     0     1
houston,tx       2     1     1
somewhere,ew     4     3     7

I might do

>>> df.groupby(df.index).sum()
              val1  val2  val3
city_id                       
houston,tx       3     3     2
somewhere,ew     4     3     7

or

>>> df.reset_index().groupby("city_id").sum()
              val1  val2  val3
city_id                       
houston,tx       3     3     2
somewhere,ew     4     3     7

The first approach passes the index values (in this case, the city_id values) to groupby and tells it to use those as the group keys, and the second resets the index and then selects the city_id column. See this section of the docs for more examples. Note that there are lots of other methods in the DataFrameGroupBy objects, too:

>>> df.groupby(df.index)
<pandas.core.groupby.DataFrameGroupBy object at 0x1045a1790>
>>> df.groupby(df.index).max()
              val1  val2  val3
city_id                       
houston,tx       2     2     1
somewhere,ew     4     3     7
>>> df.groupby(df.index).mean()
              val1  val2      val3
city_id                           
houston,tx       1     1  0.666667
somewhere,ew     4     3  7.000000

Is there anyway to select the row which has the maximum mean in its group? In this example, it should select `houston,tx 2 1 1` from the first group. — Morteza Milani, Sep 26 '17 at 19:56
Is there a method where you might define a column to sum, another to max, another to min so the result would be Huston, TX 3 (sum), 2 (max), 1(min)? — MrKingsley, May 27 '22 at 17:29

score 5 · Answer 2 · answered Jul 03 '13 at 04:32

Something in the same line. Sorry not the exact replica.

mydata = [{'subid' : 'B14-111', 'age': 75, 'fdg':1.78},
          {'subid' : 'B14-112', 'age': 22, 'fdg':1.56},{'subid' : 'B14-112', 'age': 40, 'fdg':2.00},]
df = pandas.DataFrame(mydata)

gg = df.groupby("subid",sort=True).sum()

Combining rows in pandas

2 Answers2

Linked