
Looking for the fastest solution to a time-averaging problem.

I've got a list of datetime objects and need to find the average time of day (ignoring year, month and day). Here is what I have so far:

import datetime as dtm

def avg_time(times):
    avg = 0
    for elem in times:
        avg += elem.second + 60*elem.minute + 3600*elem.hour
    # Use // so the values stay integers; plain / gives floats in Python 3,
    # which strptime cannot parse with "%H %M %S".
    avg //= len(times)
    rez = str(avg//3600) + ' ' + str((avg%3600)//60) + ' ' + str(avg%60)
    return dtm.datetime.strptime(rez, "%H %M %S")
Martijn Pieters
user2915556
  • What is your question? Is it not fast enough for your purpose? How much faster would it have to be then? What's the context (i.e., there may be a different approach that is faster and bypasses this routine)? –  Oct 30 '13 at 12:08
  • My question is how to improve the overall speed: as fast as it can be in Python. Maybe there is some function or alternative way to do the same thing. Important note: the data being averaged originally comes from a pandas DataFrame column (datetime64[ns] type). – user2915556 Oct 30 '13 at 13:37

6 Answers


Here's a short and sweet solution (perhaps not the fastest though). It takes the difference between each date in the date list and some arbitrary reference date (returning a datetime.timedelta), and then sums these differences and averages them. Then it adds back in the original reference date.

import datetime

def avg(dates):
    any_reference_date = datetime.datetime(1900, 1, 1)
    # Sum the offsets from the reference date, average them, then add the reference back.
    return any_reference_date + sum((date - any_reference_date for date in dates), datetime.timedelta()) / len(dates)
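As written, this averages whole datetimes, so the dates still influence the result. A minimal sketch, assuming naive datetimes and that you only want the time of day as the question asks: move every value onto the same arbitrary reference date with `replace()` before applying the same trick (`avg_time_of_day` is a hypothetical name):

```python
import datetime

def avg_time_of_day(dates):
    # Arbitrary reference date; any fixed date works.
    ref = datetime.datetime(1900, 1, 1)
    # Move each value onto the reference date so only its time of day remains.
    offsets = (d.replace(year=ref.year, month=ref.month, day=ref.day) - ref
               for d in dates)
    # Average the timedelta offsets (timedelta / int) and add the reference back.
    return ref + sum(offsets, datetime.timedelta()) / len(dates)

print(avg_time_of_day([datetime.datetime(2013, 10, 29, 9, 0),
                       datetime.datetime(2013, 10, 30, 15, 0)]))
# 1900-01-01 12:00:00
```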
Ben
wiesiu_p

Here's a better way to approach this problem, using pandas' vectorized datetime attributes instead of a Python loop.

Generate a sample of 20 million datetimes at one-second frequency:

In [28]: i = date_range('20130101',periods=20000000,freq='s')

In [29]: i
Out[29]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01 00:00:00, ..., 2013-08-20 11:33:19]
Length: 20000000, Freq: S, Timezone: None

Average the time of day over all 20 million values:

In [30]: %timeit pd.to_timedelta(int((i.hour*3600+i.minute*60+i.second).mean()),unit='s')
1 loops, best of 3: 2.87 s per loop

The result as a timedelta (note that the to_timedelta part requires numpy 1.7 and pandas 0.13, which was about to be released at the time of writing):

In [31]: pd.to_timedelta(int((i.hour*3600+i.minute*60+i.second).mean()),unit='s')
Out[31]: 
0   11:59:12
dtype: timedelta64[ns]

In seconds (this will work for pandas 0.12, numpy >= 1.6).

In [32]: int((i.hour*3600+i.minute*60+i.second).mean())
Out[32]: 43152
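The session above calls `.hour`, `.minute` and `.second` directly on a `DatetimeIndex`. If the times live in a DataFrame column (a `Series` of dtype `datetime64[ns]`), as in the question, a sketch assuming a newer pandas (0.15 or later, where the `.dt` accessor exists) goes through `.dt` instead; the column name `Date` and the function name `mean_time_of_day` are placeholders:

```python
import pandas as pd

def mean_time_of_day(s):
    # Seconds since midnight for every row, computed column-wise (no Python loop).
    secs = s.dt.hour * 3600 + s.dt.minute * 60 + s.dt.second
    # Average, truncate to whole seconds and convert back to a Timedelta.
    return pd.to_timedelta(int(secs.mean()), unit="s")

df = pd.DataFrame({"Date": pd.to_datetime(["2013-10-29 09:00:00",
                                           "2013-10-30 15:30:00"])})
print(mean_time_of_day(df["Date"]))  # 0 days 12:15:00
```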
Jeff

I was looking for the same thing, and then I found a very simple way to get the average of a list of datetime objects.

import datetime
# Alternatively, "from datetime import datetime" removes the need for the datetime.datetime prefix.

def easyAverage(datetimeList):
    # timestamp() converts each datetime object to a Unix timestamp (seconds since the epoch);
    # map() applies it to every element and sum() adds them all up.
    sumOfTime = sum(map(datetime.datetime.timestamp, datetimeList))
    length = len(datetimeList)

    # fromtimestamp() turns the mean Unix timestamp back into a datetime object.
    averageTimeInTimeStampFormat = datetime.datetime.fromtimestamp(sumOfTime / length)

    # strftime() formats the datetime object as an "HH:MM:SS" string.
    timeInHumanReadableForm = datetime.datetime.strftime(averageTimeInTimeStampFormat, "%H:%M:%S")
    return timeInHumanReadableForm

Or you can do all this in one simple line:

avgTime = datetime.datetime.strftime(datetime.datetime.fromtimestamp(sum(map(datetime.datetime.timestamp, datetimeList)) / len(datetimeList)), "%H:%M:%S")
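Note that `timestamp()` counts seconds since the epoch, not since midnight, so this averages the full date and time: values on different days shift the result. A minimal sketch contrasting the two approaches, assuming timezone-aware UTC datetimes so that no local-time conversion is involved (both function names are hypothetical):

```python
from datetime import datetime, timezone

def mean_instant_hms(dts):
    # Average the full instants (date + time) via Unix timestamps, as above.
    ts = sum(d.timestamp() for d in dts) / len(dts)
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%H:%M:%S")

def mean_time_of_day_hms(dts):
    # Average only the time of day: whole seconds since midnight, date ignored.
    secs = sum(d.hour * 3600 + d.minute * 60 + d.second for d in dts) // len(dts)
    hours, rest = divmod(secs, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"

# 23:00 on one day and 01:00 on the next day (UTC).
a = datetime(2024, 1, 1, 23, 0, tzinfo=timezone.utc)
b = datetime(2024, 1, 2, 1, 0, tzinfo=timezone.utc)
print(mean_instant_hms([a, b]))      # 00:00:00 (midpoint of the two instants)
print(mean_time_of_day_hms([a, b]))  # 12:00:00 (mean of 23:00 and 01:00 as clock times)
```

Which result is "right" depends on whether the dates should count; the question asks for the time of day only.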

Cheers,

Shubham Namdeo
Rishikesh Jha
  • In Python 2.x, replace "map(datetime.datetime.timestamp,datetimeList)" with "[time.mktime(t.timetuple()) + t.microsecond / 1e6 for t in datetimeList]" after importing the time module – hm8 Mar 29 '18 at 19:46

You would at least use sum() with a generator expression to create the total number of seconds:

from datetime import datetime, date, time

def avg_time(datetimes):
    total = sum(dt.hour * 3600 + dt.minute * 60 + dt.second for dt in datetimes)
    avg = total / len(datetimes)
    minutes, seconds = divmod(int(avg), 60)
    hours, minutes = divmod(minutes, 60)
    return datetime.combine(date(1900, 1, 1), time(hours, minutes, seconds))

Demo:

>>> from datetime import datetime, date, time, timedelta
>>> def avg_time(datetimes):
...     total = sum(dt.hour * 3600 + dt.minute * 60 + dt.second for dt in datetimes)
...     avg = total / len(datetimes)
...     minutes, seconds = divmod(int(avg), 60)
...     hours, minutes = divmod(minutes, 60)
...     return datetime.combine(date(1900, 1, 1), time(hours, minutes, seconds))
... 
>>> avg_time([datetime.now(), datetime.now() - timedelta(hours=12)])
datetime.datetime(1900, 1, 1, 7, 13)
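The demo above uses `datetime.now()`, so its output changes from run to run. A reproducible check with fixed inputs (the values are chosen arbitrarily), repeating the function so the block runs on its own:

```python
from datetime import datetime, date, time

def avg_time(datetimes):
    # Seconds since midnight, summed with a generator expression.
    total = sum(dt.hour * 3600 + dt.minute * 60 + dt.second for dt in datetimes)
    avg = total / len(datetimes)
    # Split the (truncated) average back into hours, minutes and seconds.
    minutes, seconds = divmod(int(avg), 60)
    hours, minutes = divmod(minutes, 60)
    return datetime.combine(date(1900, 1, 1), time(hours, minutes, seconds))

print(avg_time([datetime(2013, 10, 29, 10, 0), datetime(2013, 10, 30, 14, 30)]))
# 1900-01-01 12:15:00
```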
Martijn Pieters
  • I'm not sure I can get along without timedata. It's one of the columns in my pandas DataFrame, which I need to deal with. Could you be a little more specific about using sum() generator loop? – user2915556 Oct 30 '13 at 13:16
  • @user2915556: there may *well* be a better way doing this in pandas; I have no idea if there is as I don't have experience with pandas. Perhaps that could have been stated in your question (including a description of what your dataframes look like). I've taken the liberty of adding a `pandas` tag to your question. I've updated my answer to avoid using `timedelta` objects. – Martijn Pieters Oct 30 '13 at 13:19
  • Thanks a lot! When running with pandas data (avg_time(df['Date'])) it runs in 24.3 sec (vs. 24.1 sec for the initial version). But when I first converted the dates to a list (df['Data'].tolist(), which took 27.3 sec), it ran in 4.12 sec vs. 4.26 sec. – user2915556 Oct 30 '13 at 13:31
  • `dt.hour * 3600` - seriously? If you want to convert it to a number, just convert it to unix time! – Navin Jun 25 '17 at 06:59
  • @Navin Using `.timestamp()` gives you seconds since the epoch, not since midnight, so you wouldn't be calculating an average time of day. You'd have to create an offset for midnight of the current date then too, which makes it all a lot slower again (I tried in an earlier revision). As I said in my answer: using `datetime` objects is the wrong approach to begin with. So yes, *seriously*. – Martijn Pieters Jun 25 '17 at 07:41
  • The code worked for me (in a Jupyter Notebook cell) but only when I added timedelta to the import list: `from datetime import datetime, date, time, timedelta` – burkesquires Aug 22 '17 at 20:46
  • @burkesquires I only used `timedelta` in the demo to generate an input value. The solution itself doesn't require it. – Martijn Pieters Aug 22 '17 at 21:13
  • Understood, but as is the demo does not work. I think for many beginners seeing and copying the demo is more helpful, as it shows that there is an additional step, namely that the function must be called. I will change it so that the timedelta is defined below only where the demo needs it. – burkesquires Aug 22 '17 at 23:49
  • @burkesquires: it works just fine with the updated import line. – Martijn Pieters Aug 23 '17 at 06:57
  • @burkesquires: and **of course** the function must be called; it depends on your exact setup how you call it. You'd not normally pass in manually generated dates (where the timedelta is only used to generate a second date). – Martijn Pieters Aug 23 '17 at 06:59

This isn't the best solution, but might be of help:

import datetime as dt

t1 = dt.datetime(2020,12,31,10,00,5)
t2 = dt.datetime(2021,1,1,17,20,15)

delta = t2-t1 #delta is a datetime.timedelta object and can be used in the + operation

avg = t1 + delta/2 #average of t1 and t2
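The same idea extends to any number of datetimes: pick one value as the reference, average the `timedelta` offsets from it, and add that average back. A sketch under that assumption (`mean_datetime` is a hypothetical name), checked against the two values above:

```python
import datetime as dt

def mean_datetime(values):
    # Use the first value as the reference point.
    ref = values[0]
    # Sum the offsets from the reference; start=timedelta() so sum() adds timedeltas.
    total = sum((v - ref for v in values), dt.timedelta())
    return ref + total / len(values)

t1 = dt.datetime(2020, 12, 31, 10, 0, 5)
t2 = dt.datetime(2021, 1, 1, 17, 20, 15)
print(mean_datetime([t1, t2]))  # 2021-01-01 01:40:10
```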
Danny_DD
Vishnu

If you have a list of datetimes:

import pandas as pd
avg=pd.to_datetime(pd.Series(yourdatetimelist)).mean()

If you have a list of timedeltas:

import pandas as pd
avg=pd.to_timedelta(pd.Series(yourtimedeltalist)).mean()
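A quick check with fixed values (chosen arbitrarily) shows what these calls return. Note that `mean()` on datetimes averages whole timestamps, dates included, rather than only the time of day. The last case, averaging plain `date` objects by converting them to datetimes first, is an assumption-based extension:

```python
from datetime import date, datetime, timedelta
import pandas as pd

datetimes = [datetime(2020, 1, 1, 10, 0), datetime(2020, 1, 1, 14, 0)]
print(pd.to_datetime(pd.Series(datetimes)).mean())     # 2020-01-01 12:00:00

timedeltas = [timedelta(hours=1), timedelta(hours=3)]
print(pd.to_timedelta(pd.Series(timedeltas)).mean())   # 0 days 02:00:00

# Plain date objects: convert to datetimes, average, then keep only the date.
dates = [date(2020, 1, 1), date(2020, 1, 3)]
print(pd.to_datetime(pd.Series(dates)).mean().date())  # 2020-01-02
```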
Franciska
  • This is by far the best solution out of all answers given, both in terms of simplicity and performance; it should be much higher ranked. I had to do it for dates, not datetimes, and I could simply do it by adding a date(): avg=pd.to_datetime(pd.Series(yourDateList)).mean().date() – Werner Trelawney Jun 07 '23 at 11:23