I know there are a lot of similar questions already, but none seem to apply to my problem (or I don't know how to apply them).

I have a pandas DataFrame with duplicate entries in the first column but different values in the remaining columns.

Like so:

     location      year     GDP     ...
0    AUT           1998     14...
1    AUT           2018     98...
2    AUS           1998     24...
3    AUS           2018     83...
...

I would like to end up with one row per unique 'location', keeping all the other columns with their values collected in a list:

     location      year             GDP               ...
0    AUT           (1998, 2018)     (14..., 98...)
1    AUS           (1998, 2018)     (24..., 83...)
...

I tried:

grouped = df_gdp_pop_years.groupby("Location")['Year'].apply(list).reset_index(name = "Year")

I also tried the same thing with a lambda function, but I always end up with the same problem: only the years are kept, and all the other columns are dropped.
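
Here is a minimal reproduction with placeholder numbers in place of my real data (column names matching the code above); it shows the issue, namely that GDP disappears because only 'Year' is selected before the apply:

import pandas as pd

# placeholder data standing in for my real DataFrame
df_gdp_pop_years = pd.DataFrame({
    "Location": ["AUT", "AUT", "AUS", "AUS"],
    "Year": [1998, 2018, 1998, 2018],
    "GDP": [14.0, 98.0, 24.0, 83.0],
})

grouped = df_gdp_pop_years.groupby("Location")['Year'].apply(list).reset_index(name="Year")
print(grouped)
#   Location          Year
# 0      AUS  [1998, 2018]
# 1      AUT  [1998, 2018]
# GDP is gone, because only the "Year" column was selected before apply()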

How can I get the results I want?


2 Answers

You can try something like:

df_gdp_pop_years.groupby("Location").agg({"Year": list, "GDP": list})
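
For example, with a toy frame shaped like the one in the question (placeholder values, and assuming the column names really are "Location", "Year" and "GDP"), that gives something like:

import pandas as pd

# toy frame with placeholder values in place of the real data
df_gdp_pop_years = pd.DataFrame({
    "Location": ["AUT", "AUT", "AUS", "AUS"],
    "Year": [1998, 2018, 1998, 2018],
    "GDP": [14.0, 98.0, 24.0, 83.0],
})

out = df_gdp_pop_years.groupby("Location").agg({"Year": list, "GDP": list})
print(out)
#                   Year           GDP
# Location
# AUS       [1998, 2018]  [24.0, 83.0]
# AUT       [1998, 2018]  [14.0, 98.0]

If you want Location back as a regular column rather than the index, add .reset_index() at the end.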

If the other columns may change, or more may be added later, you can accomplish this with a generalized .agg() across all of them:

df_gdp_pop_years.groupby('location').agg(lambda x: x.tolist())
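
As a rough sketch (placeholder values, plus a made-up 'population' column just to show that every non-grouped column gets collapsed into a list), that would look something like:

import pandas as pd

# toy frame; the extra "population" column only illustrates that this works for any columns
df_gdp_pop_years = pd.DataFrame({
    "location": ["AUT", "AUT", "AUS", "AUS"],
    "year": [1998, 2018, 1998, 2018],
    "GDP": [14.0, 98.0, 24.0, 83.0],
    "population": [8.1, 8.8, 18.7, 25.0],
})

out = df_gdp_pop_years.groupby('location').agg(lambda x: x.tolist()).reset_index()
print(out)
#   location          year           GDP    population
# 0      AUS  [1998, 2018]  [24.0, 83.0]  [18.7, 25.0]
# 1      AUT  [1998, 2018]  [14.0, 98.0]    [8.1, 8.8]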

I found this by searching for 'opposite of pandas explode', which led me to a different SO question: How to implode(reverse of pandas explode) based on a column.
