
Assume that I have a Pandas DataFrame of the form:

    id      price       dur
1   153     80.0        0.0 
2   153     130.0       0.0 
3   153     95.0        0.0 
4   156     115.0       0.0
5   156     165.0       0.0
6   156     130.0       0.0
7   158     90.0        0.0
8   158     140.0       0.0 
9   158     105.0       0.0
10  158     155.0       0.0

The id column contains duplicates. I would like to handle these duplicates by keeping one row per unique id and replacing the price with the average price for that id, instead of using pd.DataFrame.drop_duplicates().

Here is my expected output:

    id      price       dur
1   153     101.667     0.0 
2   156     136.667     0.0
3   158     122.5       0.0

How could I possibly handle this?

JA-pythonista

1 Answer


For each column you need to specify an aggregate function in GroupBy.agg:

df1 = df.groupby('id', as_index=False).agg({'price':'mean', 'dur':'first'})
print (df1)
    id       price  dur
0  153  101.666667  0.0
1  156  136.666667  0.0
2  158  122.500000  0.0

But if dur has the same value within each id, it is also possible to group by both columns:

df2 = df.groupby(['id', 'dur'], as_index=False)['price'].mean()
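As a self-contained check, both approaches can be run on the question's sample data (a minimal sketch reconstructing the DataFrame; note that in df2 the column order differs because dur becomes part of the group keys):

```python
import pandas as pd

# Reconstruct the sample data from the question.
df = pd.DataFrame({
    'id':    [153, 153, 153, 156, 156, 156, 158, 158, 158, 158],
    'price': [80.0, 130.0, 95.0, 115.0, 165.0, 130.0,
              90.0, 140.0, 105.0, 155.0],
    'dur':   [0.0] * 10,
})

# Approach 1: one aggregate function per column.
df1 = df.groupby('id', as_index=False).agg({'price': 'mean', 'dur': 'first'})
print(df1)

# Approach 2: group by both keys (valid when dur is constant per id).
df2 = df.groupby(['id', 'dur'], as_index=False)['price'].mean()
print(df2)
```

Both produce the averaged prices 101.667, 136.667, and 122.5 for ids 153, 156, and 158 respectively.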
jezrael