I want to calculate, for each row, the standard deviation of the other teams in the same group (that is, excluding the row's own team), using groupby. Here is an example of the data:
import pandas as pd
df = pd.DataFrame({
    'group': ['A','A','A','A','A','A','B','B','B','B','B','B'],
    'team': ['1','1','2','2','3','3','1','1','2','2','3','3'],
    'value': [1,2,5,7,2,3,7,8,8,9,6,4],
})
For example, for group A team 1, I want the std dev of the values from teams 2 and 3 of group A; for group A team 2, I want the std dev of the values from teams 1 and 3 of group A; and so on.
I managed to do it using groupby and apply, but on real data with millions of rows it takes too long. So I am looking for a vectorized solution.
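def std(row, data):
    # Keep only the rows that belong to the same group as the current row.
    data = data.loc[data['group'] == row['group']]
    # Drop the current row's team, then take the std dev of what is left.
    return data.groupby('team').filter(lambda x: (x['team'] != row['team']).all())['value'].std()
# Row-wise apply: this is the part that is too slow on large data.
df['std_exclude'] = df.apply(lambda x: std(x, data=df), axis=1)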
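One vectorized direction that might work, as a rough sketch only: the sample std dev can be rebuilt from a count, a sum, and a sum of squares, so the "other teams" statistics can be obtained by subtracting each team's totals from its group's totals. The helper names (n_g, s_g, q_g, and so on) are made up for illustration, and this assumes the default ddof=1 of pandas `std`. The sum-of-squares formula can lose precision when values are large relative to their spread, so it is worth checking against the apply version on a sample.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'group': ['A','A','A','A','A','A','B','B','B','B','B','B'],
    'team': ['1','1','2','2','3','3','1','1','2','2','3','3'],
    'value': [1,2,5,7,2,3,7,8,8,9,6,4],
})

value = df['value'].astype(float)
squared = value ** 2

# Per-group totals, broadcast back to every row.
n_g = value.groupby(df['group']).transform('count')
s_g = value.groupby(df['group']).transform('sum')
q_g = squared.groupby(df['group']).transform('sum')

# Per-(group, team) totals, broadcast back to every row.
keys = [df['group'], df['team']]
n_t = value.groupby(keys).transform('count')
s_t = value.groupby(keys).transform('sum')
q_t = squared.groupby(keys).transform('sum')

# Totals over the *other* teams of the same group.
n = n_g - n_t
s = s_g - s_t
q = q_g - q_t

# Sample variance (ddof=1) from count, sum and sum of squares.
# Rows with fewer than two remaining values get NaN, like Series.std().
var = ((q - s ** 2 / n) / (n - 1)).where(n > 1)

# Clip tiny negative values caused by floating-point round-off.
df['std_exclude'] = np.sqrt(var.clip(lower=0))
```

This uses six grouped transforms instead of one Python call per row, so the cost should grow with the number of rows rather than with rows times group size.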