I'm having trouble getting the mean value of a timedelta column.
My data looks like this:
user date Flag Value
0 ron 12/23/2016 'flag' 0 days 10:08:00
1 ron 12/21/2016 'n/a' 0 days 08:00:00
2 ron 12/23/2016 'flag' 0 days 10:08:00
3 ron 12/21/2016 'n/a' 0 days 02:00:00
4 andy 12/22/2016 'flag' 0 days 10:00:00
5 andy 12/22/2016 'flag' 0 days 10:00:00
I would like to generate an Avg column by taking the mean of Value per user based on Flag==flag. So the data would like this:
user date Flag Value Avg
0 ron 12/23/2016 'flag' 0 days 10:08:00 0 days 10:08:00
1 ron 12/21/2016 'n/a' 0 days 08:00:00 0 days 10:08:00
2 ron 12/23/2016 'flag' 0 days 10:08:00 0 days 10:08:00
3 ron 12/21/2016 'n/a' 0 days 02:00:00 0 days 10:08:00
4 andy 12/22/2016 'flag' 0 days 10:00:00 0 days 10:00:00
5 andy 12/22/2016 'flag' 0 days 10:00:00 0 days 10:00:00
I have this code which produces a data error:
sample.loc[:,'Value'] = pd.to_timedelta(sample['Value'])
sample.loc[:,'Avg'] = sample['user'].map(sample[sample['Flag']=='flag'].groupby('user')['Value'].mean())
But this is the error i get:
DataError: No numeric types to aggregate
I'm not sure why it's saying this when I've converted Value to timedelta. Guidance is appreciated.