I know there are a lot of similar questions already, but none seem to apply to my problem (or I don't know how to apply them).

I have a pandas DataFrame with duplicate entries in the first column but different values in the remaining columns.

Like so:

     location      year     GDP     ...
0    AUT           1998     14...
1    AUT           2018     98...
2    AUS           1998     24...
3    AUS           2018     83...
...

I would like to end up with one row per unique 'location', keeping all the other columns with their values collected in a list:

     location      year             GDP               ...
0    AUT           (1998, 2018)     (14..., 98...)
1    AUS           (1998, 2018)     (24..., 83...)
...

I tried:

grouped = df_gdp_pop_years.groupby("Location")['Year'].apply(list).reset_index(name = "Year")

I also tried the same thing with a lambda function, but I always end up with the same problem: only the years are kept, and all the other columns are dropped.
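
Here is a minimal reproduction with placeholder numbers in place of my real data (column names matching the code above); it shows the issue, namely that GDP disappears because only 'Year' is selected before the apply:

import pandas as pd

# placeholder data standing in for my real DataFrame
df_gdp_pop_years = pd.DataFrame({
    "Location": ["AUT", "AUT", "AUS", "AUS"],
    "Year": [1998, 2018, 1998, 2018],
    "GDP": [14.0, 98.0, 24.0, 83.0],
})

grouped = df_gdp_pop_years.groupby("Location")['Year'].apply(list).reset_index(name="Year")
print(grouped)
#   Location          Year
# 0      AUS  [1998, 2018]
# 1      AUT  [1998, 2018]
# GDP is gone, because only the "Year" column was selected before apply()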

How can I get the results I want?


2 Answers

You can try something like:

df_gdp_pop_years.groupby("Location").agg({"Year": list, "GDP": list})
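
For example, with a toy frame shaped like the one in the question (placeholder values, and assuming the column names really are "Location", "Year" and "GDP"), that gives something like:

import pandas as pd

# toy frame with placeholder values in place of the real data
df_gdp_pop_years = pd.DataFrame({
    "Location": ["AUT", "AUT", "AUS", "AUS"],
    "Year": [1998, 2018, 1998, 2018],
    "GDP": [14.0, 98.0, 24.0, 83.0],
})

out = df_gdp_pop_years.groupby("Location").agg({"Year": list, "GDP": list})
print(out)
#                   Year           GDP
# Location
# AUS       [1998, 2018]  [24.0, 83.0]
# AUT       [1998, 2018]  [14.0, 98.0]

If you want Location back as a regular column rather than the index, add .reset_index() at the end.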

If the other columns may change, or more may be added later, you can accomplish this with a generalized .agg() across all of them:

df_gdp_pop_years.groupby('location').agg(lambda x: x.tolist())
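
As a rough sketch (placeholder values, plus a made-up 'population' column just to show that every non-grouped column gets collapsed into a list), that would look something like:

import pandas as pd

# toy frame; the extra "population" column only illustrates that this works for any columns
df_gdp_pop_years = pd.DataFrame({
    "location": ["AUT", "AUT", "AUS", "AUS"],
    "year": [1998, 2018, 1998, 2018],
    "GDP": [14.0, 98.0, 24.0, 83.0],
    "population": [8.1, 8.8, 18.7, 25.0],
})

out = df_gdp_pop_years.groupby('location').agg(lambda x: x.tolist()).reset_index()
print(out)
#   location          year           GDP    population
# 0      AUS  [1998, 2018]  [24.0, 83.0]  [18.7, 25.0]
# 1      AUT  [1998, 2018]  [14.0, 98.0]    [8.1, 8.8]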

I found this by searching for 'opposite of pandas explode', which led me to a different SO question: How to implode(reverse of pandas explode) based on a column.
