Pandas duplicates groupby

Question

I have a pandas DataFrame with some numerical data about some people. I need to find the people who appear more than once in the DataFrame and replace all the rows for each such person with a single row, where the numeric values are the sum of that person's rows in some columns and the minimum in others. I know how to compute the sum using groupby() and sum(), but not how to apply a different aggregation to different columns.

Example:

Names  Column1 Column2 Column3  
John     1        2     2016
Bob      2        3     2011
Pier     1        1     2003
John     3        3     2005
Bob      1        0     2018

It should become:

Names  Column1 Column2 Column3  
John     4        5     2005
Bob      3        3     2011
Pier     1        1     2003

How can I do this?

Use `groupby` + [`agg`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.DataFrameGroupBy.agg.html) function. — Space Impact, Nov 03 '18 at 16:18

score 2 · Accepted Answer · answered Nov 03 '18 at 16:27

Use groupby + agg and pass a dict that maps each column to the aggregation function you want for it:

df.groupby('Names').agg({'Column1':'sum', 'Column2':'sum','Column3':'min'})

    Column1 Column2 Column3
Names           
Bob     3     3     2011
John    4     5     2005
Pier    1     1     2003
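
For anyone who wants to try this end to end, here is a minimal, self-contained sketch built on the sample data from the question. The `sort=False` and `reset_index()` calls are my additions rather than part of the answer above: they are only there to keep the names in order of first appearance and to turn `Names` back into a regular column, as in the desired output.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Names": ["John", "Bob", "Pier", "John", "Bob"],
    "Column1": [1, 2, 1, 3, 1],
    "Column2": [2, 3, 1, 3, 0],
    "Column3": [2016, 2011, 2003, 2005, 2018],
})

# One aggregation per column: sum for Column1/Column2, min for Column3.
# sort=False keeps groups in order of first appearance (an optional choice).
result = (
    df.groupby("Names", sort=False)
      .agg({"Column1": "sum", "Column2": "sum", "Column3": "min"})
      .reset_index()  # move Names from the index back to a column
)

print(result)
```

If you don't care about row order or prefer `Names` as the index, dropping `sort=False` and `reset_index()` gives the plain form shown in the answer.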
