I have a DataFrame containing 'test results' for different people over multiple tests per person.
It has the columns name
, age
, score
scores = pd.DataFrame({'name': ['Alex', 'Alex', 'Alex', 'Alex', 'Alex', 'James', 'James', 'James', 'James', 'James', 'James', 'Emily', 'Emily', 'Emily', 'Emily', 'Emily', 'Emily', 'Emily'], 'age': [25, 26, 26, 27, 27, 25, 26, 26, 26, 27, 27, 25, 25, 26, 26, 26, 27, 27], 'score': [10, 0, 2, 1, 2, 2, 4, 6, 6, 10, 8, 4, 7, 6, 10, 9, 7, 10]})
name age score
0 Alex 25 10
1 Alex 26 0
2 Alex 26 2
3 Alex 27 1
4 Alex 27 2
5 James 25 2
6 James 26 4
7 James 26 6
8 James 26 6
9 James 27 10
10 James 27 8
11 Emily 25 4
12 Emily 25 7
13 Emily 26 6
14 Emily 26 10
15 Emily 26 9
16 Emily 27 7
17 Emily 27 10
I have performed a group by on name
and age
and aggregating to give the max_score
for each group (which is the max
value of the score
column for an individual in each year)
age_scores = scores.groupby(['name','age']).agg({"score":'max'})
The result looks like
score
name age
Alex 25 10
26 2
27 2
Emily 25 7
26 10
27 10
James 25 2
26 6
27 10
I would like to have a dataframe with one row per person and then a column for the max score at each age
i.e
name max_25 max_26 max_27
0 Alex 10 2 2
1 James 7 10 10
2 Emily 2 6 10