Get values on a big DataFrame Python

Question

I have a big dataframe with a structure like this:

ID  Year Consumption
1   2012      24
2   2012      20
3   2012      21
1   2013      22
2   2013      23
3   2013      24
4   2013      25

I want another DataFrame that contains first year of appearence, and max consumption of all time per ID like this:

ID   First_Year  Max_Consumption
1       2012          24
2       2012          23
3       2012          24
4       2013          25

Is there a way to extract this data without using loops? I have tried this:

year = list(set(df.Year))
ids = list(set(df.ID))

antiq = list()
max_con = list()

for i in ids:
    df_id = df[df['ID'] == i]
    antiq.append(min(df_id['Year']))
    max_con.append(max(df_id['Consumption']))

But it's too slow. Thank you!

score 2 · Accepted Answer · answered Oct 25 '18 at 16:25

2

Use GroupBy + agg:

res = df.groupby('ID', as_index=False).agg({'Year': 'min', 'Consumption': 'max'})

print(res)

   ID  Year  Consumption
0   1  2012           24
1   2  2012           23
2   3  2012           24
3   4  2013           25

answered Oct 25 '18 at 16:25

jpp

159,742
34
281
339

score 1 · Answer 2 · answered Oct 25 '18 at 16:29

1

Another alternative to groupby is pivot_table:

pd.pivot_table(df, index="ID", aggfunc={"Year":min, "Consumption":max})

answered Oct 25 '18 at 16:29

Toby Petty

4,431
1
17
29

Get values on a big DataFrame Python

2 Answers2