Pandas - Group by column and then create new columns from result

Question

I have a DataFrame containing 'test results' for different people over multiple tests per person. It has the columns name, age, score

scores = pd.DataFrame({'name': ['Alex', 'Alex', 'Alex', 'Alex', 'Alex', 'James', 'James', 'James', 'James', 'James', 'James', 'Emily', 'Emily', 'Emily', 'Emily', 'Emily', 'Emily', 'Emily'], 'age': [25, 26, 26, 27, 27, 25, 26, 26, 26, 27, 27, 25, 25, 26, 26, 26, 27, 27], 'score': [10, 0, 2, 1, 2, 2, 4, 6, 6, 10, 8, 4, 7, 6, 10, 9, 7, 10]})

     name  age  score
0    Alex   25     10
1    Alex   26      0
2    Alex   26      2
3    Alex   27      1
4    Alex   27      2
5   James   25      2
6   James   26      4
7   James   26      6
8   James   26      6
9   James   27     10
10  James   27      8
11  Emily   25      4
12  Emily   25      7
13  Emily   26      6
14  Emily   26     10
15  Emily   26      9
16  Emily   27      7
17  Emily   27     10

I have performed a group by on name and age and aggregating to give the max_score for each group (which is the max value of the score column for an individual in each year)

age_scores = scores.groupby(['name','age']).agg({"score":'max'})

The result looks like

           score
name  age       
Alex  25      10
      26       2
      27       2
Emily 25       7
      26      10
      27      10
James 25       2
      26       6
      27      10

I would like to have a dataframe with one row per person and then a column for the max score at each age

i.e

    name  max_25  max_26  max_27
0   Alex      10       2       2
1  James       7      10      10
2  Emily       2       6      10

Does this answer your question? [Pivot and rename Pandas dataframe](https://stackoverflow.com/questions/54419708/pivot-and-rename-pandas-dataframe) — sushanth, Sep 01 '20 at 10:14
Use `unstack`: `df.groupby(["name", "age"])["score"].max().unstack()` — Chris, Sep 01 '20 at 10:14

perl · Accepted Answer · 2020-09-01T13:56:19.750

2

Using pivot, if you want to transform age_scores:

(age_scores
     .reset_index()
     .pivot('name', 'age', 'score')
     .add_prefix('max_')
     .reset_index()
     .rename_axis(None, axis=1))

Output:

    name  max_25  max_26  max_27
0   Alex      10       2       2
1  Emily       7      10      10
2  James       2       6      10

Otherwise, if you don't need age_scores as an intermediate dataframe, solution with unstack proposed by Chris in comments is probably an easier one:

(scores
 .groupby(['name', 'age'])['score'].max()
 .unstack('age')
 .add_prefix('max_')
 .reset_index())

Output:

age   name  max_25  max_26  max_27
0     Alex      10       2       2
1    Emily       7      10      10
2    James       2       6      10

edited Sep 01 '20 at 13:56

answered Sep 01 '20 at 10:14

perl

9,826
1
10
22

1

Thank you! This was very helpful and did exactly what I needed :) – Charlie Morton Sep 01 '20 at 12:53
strangely though the reset_index makes `age` the index and I can't seem to get rid of that – Charlie Morton Sep 01 '20 at 13:29
1

Yes, so that `age` is the name of columns. You can reset it with `df.columns.name = None`, or with `rename_axis` if you want to chain it. I've updated the solution (the `pivot` part of it) to reset that name to None. And if you want more details, here's this [thread on StackOverflow](https://stackoverflow.com/questions/18022845/pandas-index-column-title-or-name?noredirect=1&lq=1) – perl Sep 01 '20 at 13:57

Pandas - Group by column and then create new columns from result

1 Answers1