I have the following data frame:
import numpy as np
import pandas as pd

df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'B' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C' : np.random.randn(8),
                   'D' : np.random.randn(8)})
     A      B         C         D
0  foo    one  0.478183 -1.267588
1  bar    one  0.555985 -2.143590
2  foo    two -1.592865  1.251546
3  bar  three  0.174138 -0.708198
4  foo    two  0.302215 -0.219041
5  bar    two -0.034550 -0.965414
6  foo    one  1.310828 -0.388601
7  foo  three  0.357659 -1.610443
I'm trying to add another column holding a min-max normalized version of column C, computed within each group of A:
normed = df.groupby('A').apply(lambda x: (x['C']-min(x['C']))/(max(x['C'])-min(x['C'])))
A
bar  1    0.000000
     3    0.033396
     5    1.000000
foo  0    1.000000
     2    0.413716
     4    0.000000
     6    0.441061
     7    0.357787
Finally, I want to join this result back to df (using advice from a similar question):
df.join(normed, on='A', rsuffix='_normed')
However, I get an error:
ValueError: len(left_on) must equal the number of levels in the index of "right"
How can I add the normed result back to the dataframe df?
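For reference, here is a minimal sketch of one possible way to get there, assuming the goal is a per-group min-max normalization of C stored as a new column. It swaps apply for groupby(...).transform, which returns a Series indexed like df, so no join is needed. The column name C_normed and the seeded random generator are assumptions made only for illustration.

```python
import numpy as np
import pandas as pd

# Seeded generator is an assumption, used only to make the sketch reproducible.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': rng.standard_normal(8),
    'D': rng.standard_normal(8),
})

# transform applies the function per group of A and returns a Series
# aligned to df's original index, so it can be assigned directly.
# 'C_normed' is a hypothetical column name chosen for this example.
df['C_normed'] = df.groupby('A')['C'].transform(
    lambda s: (s - s.min()) / (s.max() - s.min())
)

print(df)
```

The join in the question fails because normed, as printed above, is indexed by two levels (A and the original row label), while on='A' supplies only one key. If the apply result has to be kept, dropping the A level first, for example with normed.reset_index(level=0, drop=True), should leave a Series indexed like df that can be assigned as a column in the same way.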