Lets's assume I count how many oranges (Orange
) and apples (Apple
) people (id
) eat in a certain time period. I also know if they are young or old (group
). The pandas dataframe would maybe look like this:
df = pd.DataFrame({'id': ['1','2','3','7'],
'group': ['Young', 'Young', 'Old', 'Old'],
'Apple': [7,2,5,4],
'Orange': [3,6,4,4] })
id group Apple Orange
0 1 Young 7 3
1 2 Young 2 6
2 3 Old 5 4
3 7 Old 4 4
We can easily compute means using groupby()
. E. g:
df.Apple.groupby(df.group).mean()
outputs
Old 4.5
Young 4.5
But let's say, I want to find how much the amount of apples and oranges consumed differs to the group mean per individual?
That is, the output should be
df = pd.DataFrame({'id': ['1','2','3','7'],
'group': ['Young', 'Young', 'Old', 'Old'],
'Apple': [7,2,5,4],
'Orange': [3,6,4,4],
'Apple Difference': [2.5, -2.5, 0.5, -0.5] })
id group Apple Orange Apple Difference
0 1 Young 7 3 2.5
1 2 Young 2 6 -2.5
2 3 Old 5 4 0.5
3 7 Old 4 4 -0.5
Is there a way to do this with pandas/numpy? Sorry for the rockie-question Best /R