percentile for datetime column python

Question

Is there a way to compute a percentile for a dataframe column in datetime format while keeping the result in datetime format (Y-m-d H:M:S), rather than converting it to seconds? Here is an example of the data in datetime format:

df: 
0   2016-07-31 08:00:00
1   2016-07-30 14:30:00
2   2006-06-24 14:15:00
3   2016-07-15 08:15:45
4   2016-08-01 23:50:00

score 3 · Accepted Answer · answered Jul 11 '18 at 11:01

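pandas has a built-in quantile method that can be used for that, provided the values have a datetime dtype rather than being plain strings. Let

import pandas as pd

df = pd.to_datetime(pd.Series(['2016-07-31 08:00:00', '2016-07-30 14:30:00', '2006-06-24 14:15:00', '2016-07-15 08:15:45', '2016-08-01 23:50:00']))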
df
0   2016-07-31 08:00:00
1   2016-07-30 14:30:00
2   2006-06-24 14:15:00
3   2016-07-15 08:15:45
4   2016-08-01 23:50:00

then

>>> df.quantile(0.5)
Timestamp('2016-07-30 14:30:00')
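
As a minimal sketch of the same idea for several percentiles at once: quantile also accepts a list, and the result stays in datetime form until you format it for display. The variable names (dates, quartiles, formatted) are only illustrative, and the data is the five values from the question.

```python
import pandas as pd

# The five values from the question, parsed into a datetime dtype.
dates = pd.to_datetime(pd.Series([
    '2016-07-31 08:00:00',
    '2016-07-30 14:30:00',
    '2006-06-24 14:15:00',
    '2016-07-15 08:15:45',
    '2016-08-01 23:50:00',
]))

# Passing a list returns a Series indexed by the requested quantiles;
# the values are still datetimes, not seconds.
quartiles = dates.quantile([0.25, 0.5, 0.75])

# Convert to Y-m-d H:M:S strings only when displaying the result.
formatted = quartiles.dt.strftime('%Y-%m-%d %H:%M:%S')
print(formatted)
```

Note that quantile interpolates linearly by default, so with other data the result can be a timestamp that lies between two observed values. The interpolation parameter of quantile controls this.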

See also the official documentation

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.quantile.html

Vikash Singh · Answer 2 · 2017-08-17T16:05:20.367

The describe() method doesn't work the same way on a datetime column as it does on integer or float columns.

So we can write our own method to do the same:

import pandas as pd
from datetime import timedelta
from datetime import datetime

base = datetime.now()
date_list = [base - timedelta(days=x) for x in range(0, 20)]    
df = pd.DataFrame.from_dict({'Date': date_list})

df

                          Date
0   2017-08-17 21:32:54.044948
1   2017-08-16 21:32:54.044948
2   2017-08-15 21:32:54.044948
3   2017-08-14 21:32:54.044948

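def describe_datetime(dataframe, column, percentiles=tuple(i / 10 for i in range(1, 11))):
    # Format as Y-m-d strings (this drops the time of day), then sort ascending.
    new_date = dataframe[column].dt.strftime('%Y-%m-%d').sort_values().values
    length = len(new_date)
    for percentile in percentiles:
        # Clamp at 0 so a very small percentile cannot wrap around to index -1.
        print(percentile, ':', new_date[max(int(percentile * length) - 1, 0)])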

describe_datetime(df, 'Date')

output:

0.1 : 2017-07-30
0.2 : 2017-08-01
0.3 : 2017-08-03
0.4 : 2017-08-05
0.5 : 2017-08-07
0.6 : 2017-08-09
0.7 : 2017-08-11
0.8 : 2017-08-13
0.9 : 2017-08-15
1.0 : 2017-08-17
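
For comparison, here is a sketch of a variant that returns the values instead of printing them and keeps them as Timestamps. It uses the nearest-rank rule (rank = ceil(p * n)), which is an assumption on my part and not exactly the indexing used above. The function name nearest_rank_percentiles and the fixed date range are hypothetical, chosen so the example is reproducible.

```python
import math

import pandas as pd


def nearest_rank_percentiles(series, percentiles=(0.25, 0.5, 0.75, 1.0)):
    """Return {percentile: Timestamp} using the nearest-rank rule."""
    ordered = series.sort_values().reset_index(drop=True)
    n = len(ordered)
    result = {}
    for p in percentiles:
        # Nearest-rank: 1-based rank ceil(p * n), never below 1.
        rank = max(math.ceil(p * n), 1)
        result[p] = ordered.iloc[rank - 1]
    return result


# Hypothetical data: 20 consecutive days ending on 2017-08-17.
dates = pd.Series(pd.date_range('2017-07-29', periods=20, freq='D'))
result = nearest_rank_percentiles(dates)
for p, value in result.items():
    print(p, ':', value.strftime('%Y-%m-%d'))
```

Because the values are never turned into strings inside the function, the time of day is preserved and the caller decides how to format the output.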

This approach did not work when I tried it. I also tried varying the percentile, and the values returned were the same. — T-Jay, Aug 17 '17 at 15:13
@T-Jay I have created a custom method to do what you were looking for. — Vikash Singh, Aug 17 '17 at 16:05

score 0 · Answer 3 · answered Aug 17 '17 at 15:20

After some experimenting, I was able to compute the percentile with the code below: I sorted the column and used its positional index to compute the percentile. The dataframe is 'df' and the column with datetime format is 'dates'.

import numpy as np

date_column = list(df.sort_values('dates')['dates'])
index = range(len(date_column))  # valid positions: 0 .. len - 1
date_column[int(np.percentile(index, 50))]  # np.int was removed in NumPy 1.24; use int
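
If you do end up going through a numeric representation, the result can still be converted back to a datetime afterwards. The following is only a sketch under that assumption: it works at whole-second resolution (so sub-second precision is dropped), reuses the five values from the question, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.to_datetime(pd.Series([
    '2016-07-31 08:00:00',
    '2016-07-30 14:30:00',
    '2006-06-24 14:15:00',
    '2016-07-15 08:15:45',
    '2016-08-01 23:50:00',
]))

# Whole seconds since the Unix epoch, as plain integers.
seconds = dates.to_numpy().astype('datetime64[s]').astype('int64')

# np.percentile interpolates linearly by default and returns a float.
p50_seconds = np.percentile(seconds, 50)

# Round to a whole second and convert back to a Timestamp.
p50 = pd.to_datetime(int(round(p50_seconds)), unit='s')
print(p50.strftime('%Y-%m-%d %H:%M:%S'))
```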
