29

I would like to calculate the mean and standard deviation of a timedelta by bank from a dataframe with two columns shown below. When I run the code (also shown below) I get the below error:

pandas.core.base.DataError: No numeric types to aggregate

My dataframe:

   bank                          diff
   Bank of Japan                 0 days 00:00:57.416000
   Reserve Bank of Australia     0 days 00:00:21.452000
   Reserve Bank of New Zealand  55 days 12:39:32.269000
   U.S. Federal Reserve          8 days 13:27:11.387000

My code:

means = dropped.groupby('bank').mean()
std = dropped.groupby('bank').std()
akshat
  • 1,219
  • 1
  • 8
  • 24
Graham Streich
  • 874
  • 3
  • 15
  • 31
  • How do you want to aggregate the `timedelta` object? Access the `.days` or `.seconds` attributes if you're looking to aggregate. – Abdou Jun 18 '17 at 15:30

4 Answers4

31

Pandas mean() and other aggregation methods support numeric_only=False parameter.

dropped.groupby('bank').mean(numeric_only=False)

Found here: Aggregations for Timedelta values in the Python DataFrame

25

You need to convert timedelta to some numeric value, e.g. int64 by values what is most accurate, because convert to ns is what is the numeric representation of timedelta:

dropped['new'] = dropped['diff'].values.astype(np.int64)

means = dropped.groupby('bank').mean()
means['new'] = pd.to_timedelta(means['new'])

std = dropped.groupby('bank').std()
std['new'] = pd.to_timedelta(std['new'])

Another solution is to convert values to seconds by total_seconds, but that is less accurate:

dropped['new'] = dropped['diff'].dt.total_seconds()

means = dropped.groupby('bank').mean()
petezurich
  • 9,280
  • 9
  • 43
  • 57
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
11

No need to convert timedelta back and forth. Numpy and pandas can seamlessly do it for you with a faster run time. Using your dropped DataFrame:

import numpy as np

grouped = dropped.groupby('bank')['diff']

mean = grouped.apply(lambda x: np.mean(x))
std = grouped.apply(lambda x: np.std(x))
Sam
  • 11,799
  • 9
  • 49
  • 68
1

I would suggest passing the numeric_only=False argument to mean as mentioned by Alexander Usikov - this works for pandas version 0.20+.

If you have an older version, the following works:

import pandas pd

df = pd.DataFrame({
    'td': pd.Series([pd.Timedelta(days=i) for i in range(5)]),
    'group': ['a', 'a', 'a', 'b', 'b']
})

(
    df
    .astype({'td': int})         # convert timedelta to integer (nanoseconds)
    .groupby('group')
    .mean()
    .astype({'td': 'timedelta64[ns]'})
)
Cor
  • 31
  • 2