take standard deviation of datetime in python

Question

I am importing the datetime library in my python program and am taking the duration of multiple events. Below is my code for that:

d1 = datetime.datetime.strptime(starttime, '%Y-%m-%d:%H:%M:%S')
d2 = datetime.datetime.strptime(endtime, '%Y-%m-%d:%H:%M:%S')
duration = d2 - d1
print(str(duration))

Now I have a value in the variable "duration". The output of this will be:

0:00:15
0:00:15
0:00:15
0:00:15
0:00:15
0:00:05
0:00:05
0:00:05
0:00:05
0:00:05
0:00:10
0:00:10
0:00:10
0:00:10
0:45:22

I want to take the standard deviation of all the durations and determine whether there is an anomaly. For example, 0:45:22 is an anomaly and I want to detect it. I could do this if I knew what format the duration was in, but it doesn't appear to be a plain number. I was thinking about splitting the values on ":" and using the pieces in between, but there might be a better way.

Ideas?

You are looking at the *string representation* of a `timedelta` object. — Martijn Pieters, Mar 30 '13 at 15:02
Did you try looking through the [datetime documentation](http://docs.python.org/2/library/datetime.html)? — Xymostech, Mar 30 '13 at 15:03
Avoid writing `datetime.datetime` by using `from datetime import datetime`. — Adam, Mar 30 '13 at 15:16

Martijn Pieters · Answer 1 · 2013-03-30T15:11:25.080

You have datetime.timedelta() objects. These have .microseconds, .seconds and .days attributes, all 3 integers. The str() string representation represents those as [D day[s], ][H]H:MM:SS[.UUUUUU] as needed to fit all values present.
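
As a quick illustration of those attributes, here is a minimal sketch. The sample value is the 45-minute duration from the question, rebuilt by hand for this example:

```python
from datetime import timedelta

# Sample value: the 45-minute outlier from the question, constructed manually.
td = timedelta(minutes=45, seconds=22)

# The three stored fields are plain integers.
print(td.days, td.seconds, td.microseconds)  # 0 2722 0

# str() only formats those fields for display.
print(str(td))  # 0:45:22
```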

You can use simple arithmetic on these objects. Summing and division work as expected, for example:

>>> from datetime import timedelta
>>> (timedelta(seconds=100) + timedelta(seconds=200)) / 2
datetime.timedelta(0, 150)

Unfortunately, you cannot multiply two timedeltas, so calculating a standard deviation becomes tricky (the offsets cannot be squared).

Instead, I'd use the .total_seconds() method, to give you a floating point value that is calculated from the days, seconds and microseconds values, then use those values to calculate a standard deviation.
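
A minimal sketch of that approach, assuming Python 3.4+ so that the standard library's `statistics.pstdev` (population standard deviation) is available. The choice of `pstdev` and the variable names are assumptions of this example; the sample durations are the ones printed in the question:

```python
import statistics
from datetime import timedelta

# Sample durations taken from the question's output.
durations = (
    [timedelta(seconds=15)] * 5
    + [timedelta(seconds=5)] * 5
    + [timedelta(seconds=10)] * 4
    + [timedelta(minutes=45, seconds=22)]
)

# Convert each timedelta to a float number of seconds.
seconds = [d.total_seconds() for d in durations]

# Population standard deviation of the plain floats.
std = statistics.pstdev(seconds)
print(std)
```

Once the values are floats, any numeric routine works on them, so swapping in a different standard deviation function is a one-line change.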

For instance, by using `numpy.std(total_seconds)`. — PascalVKooten, Mar 30 '13 at 15:12

score 2 · Answer 2 · answered Mar 30 '13 at 15:14

The duration objects you are getting are timedelta objects, that is, durations from one timestamp to another. To convert them to a total number of microseconds, use:

def timedelta_to_microtime(td):
    return abs(td.microseconds + (td.seconds + td.days * 86400) * 1000000)

Then calculate the standard deviation:

import math

def calc_std(L):
    # population standard deviation of a list of numbers
    n = len(L)
    mean = sum(L) / float(n)
    dev = [x - mean for x in L]
    dev2 = [x*x for x in dev]
    return math.sqrt(sum(dev2) / n)

So:

timedeltas = [your timedeltas here..]
microtimes = [timedelta_to_microtime(td) for td in timedeltas]
std = calc_std(microtimes)
# X is a threshold (in microseconds) that you choose
print([(td, mstime)
       for (td, mstime) in zip(timedeltas, microtimes)
       if mstime - std > X])
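
One way to let the standard deviation set the threshold, rather than a hand-picked X, is to flag durations that lie more than k standard deviations from the mean. The sketch below is only an illustration: the helper name `find_outliers` and the default `k = 2` are assumptions of this example, not part of the answer, and it works in seconds via `.total_seconds()` so the values are plain floats.

```python
import math
from datetime import timedelta

def find_outliers(timedeltas, k=2.0):
    """Return the timedeltas more than k standard deviations from the mean."""
    # Plain float seconds for each duration.
    seconds = [td.total_seconds() for td in timedeltas]
    n = len(seconds)
    mean = sum(seconds) / n
    # Population standard deviation.
    std = math.sqrt(sum((s - mean) ** 2 for s in seconds) / n)
    return [td for td, s in zip(timedeltas, seconds) if abs(s - mean) > k * std]

# Sample durations taken from the question's output.
durations = (
    [timedelta(seconds=15)] * 5
    + [timedelta(seconds=5)] * 5
    + [timedelta(seconds=10)] * 4
    + [timedelta(minutes=45, seconds=22)]
)

outliers = find_outliers(durations)
print([str(td) for td in outliers])  # ['0:45:22']
```

With a single extreme value in a small sample, the outlier itself inflates the standard deviation, so a large k may flag nothing; k is worth tuning against real data.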

What is X here? Is it the threshold that I set? I want the threshold to be determined by the std. Also, this returns "[(datetime.timedelta(0, 1), 1000000)]". How can I just get a normal number? — Chango Mango, Mar 30 '13 at 17:12
