0

I have a big dataframe with a structure like this:

ID  Year Consumption
1   2012      24
2   2012      20
3   2012      21
1   2013      22
2   2013      23
3   2013      24
4   2013      25

I want another DataFrame that contains first year of appearence, and max consumption of all time per ID like this:

ID   First_Year  Max_Consumption
1       2012          24
2       2012          23
3       2012          24
4       2013          25

Is there a way to extract this data without using loops? I have tried this:

year = list(set(df.Year))
ids = list(set(df.ID))

antiq = list()
max_con = list()

for i in ids:
    df_id = df[df['ID'] == i]
    antiq.append(min(df_id['Year']))
    max_con.append(max(df_id['Consumption']))

But it's too slow. Thank you!

jpp
  • 159,742
  • 34
  • 281
  • 339
Alejandra
  • 13
  • 1

2 Answers2

2

Use GroupBy + agg:

res = df.groupby('ID', as_index=False).agg({'Year': 'min', 'Consumption': 'max'})

print(res)

   ID  Year  Consumption
0   1  2012           24
1   2  2012           23
2   3  2012           24
3   4  2013           25
jpp
  • 159,742
  • 34
  • 281
  • 339
1

Another alternative to groupby is pivot_table:

pd.pivot_table(df, index="ID", aggfunc={"Year":min, "Consumption":max})
Toby Petty
  • 4,431
  • 1
  • 17
  • 29