Finding the mean and standard deviation of a timedelta object in pandas df

Question

I would like to calculate the mean and standard deviation of a timedelta by bank from a dataframe with two columns shown below. When I run the code (also shown below) I get the below error:

pandas.core.base.DataError: No numeric types to aggregate

My dataframe:

   bank                          diff
   Bank of Japan                 0 days 00:00:57.416000
   Reserve Bank of Australia     0 days 00:00:21.452000
   Reserve Bank of New Zealand  55 days 12:39:32.269000
   U.S. Federal Reserve          8 days 13:27:11.387000

My code:

means = dropped.groupby('bank').mean()
std = dropped.groupby('bank').std()

How do you want to aggregate the `timedelta` object? Access the `.days` or `.seconds` attributes if you're looking to aggregate. — Abdou, Jun 18 '17 at 15:30

score 31 · Answer 1 · answered Mar 10 '19 at 15:33

31

Pandas mean() and other aggregation methods support numeric_only=False parameter.

dropped.groupby('bank').mean(numeric_only=False)

Found here: Aggregations for Timedelta values in the Python DataFrame

answered Mar 10 '19 at 15:33

Alexander Usikov

459
4
5

1

This is way more elegant, and IMO should be the accepted answer! – gokul_uf Dec 11 '20 at 02:47

score 25 · Accepted Answer · edited Jul 12 '17 at 19:49

25

You need to convert timedelta to some numeric value, e.g. int64 by values what is most accurate, because convert to ns is what is the numeric representation of timedelta:

dropped['new'] = dropped['diff'].values.astype(np.int64)

means = dropped.groupby('bank').mean()
means['new'] = pd.to_timedelta(means['new'])

std = dropped.groupby('bank').std()
std['new'] = pd.to_timedelta(std['new'])

Another solution is to convert values to seconds by total_seconds, but that is less accurate:

dropped['new'] = dropped['diff'].dt.total_seconds()

means = dropped.groupby('bank').mean()

edited Jul 12 '17 at 19:49

petezurich

9,280
9
43
57

answered Jun 18 '17 at 15:29

jezrael

822,522
95
1,334
1,252

Thank you, this worked like a charm -- (I used the first solution)! – Graham Streich Jun 18 '17 at 22:25
Can you tell why the second solution is less accurate? – vine_J Jan 23 '22 at 00:55

score 11 · Answer 3 · answered Aug 10 '18 at 21:12

No need to convert timedelta back and forth. Numpy and pandas can seamlessly do it for you with a faster run time. Using your dropped DataFrame:

import numpy as np

grouped = dropped.groupby('bank')['diff']

mean = grouped.apply(lambda x: np.mean(x))
std = grouped.apply(lambda x: np.std(x))

score 1 · Answer 4 · answered Jul 11 '19 at 14:33

I would suggest passing the numeric_only=False argument to mean as mentioned by Alexander Usikov - this works for pandas version 0.20+.

If you have an older version, the following works:

import pandas pd

df = pd.DataFrame({
    'td': pd.Series([pd.Timedelta(days=i) for i in range(5)]),
    'group': ['a', 'a', 'a', 'b', 'b']
})

(
    df
    .astype({'td': int})         # convert timedelta to integer (nanoseconds)
    .groupby('group')
    .mean()
    .astype({'td': 'timedelta64[ns]'})
)

Finding the mean and standard deviation of a timedelta object in pandas df

4 Answers4

Linked