
I am using the datetime module in my Python program to measure the duration of multiple events. Here is my code:

import datetime

# starttime and endtime are strings such as '2013-03-30:17:12:00'
d1 = datetime.datetime.strptime(starttime, '%Y-%m-%d:%H:%M:%S')
d2 = datetime.datetime.strptime(endtime, '%Y-%m-%d:%H:%M:%S')
duration = d2 - d1
print(str(duration))

This gives me a value in the variable "duration". Printing it for each event produces:

0:00:15
0:00:15
0:00:15
0:00:15
0:00:15
0:00:05
0:00:05
0:00:05
0:00:05
0:00:05
0:00:10
0:00:10
0:00:10
0:00:10
0:45:22

I want to take the standard deviation of all the durations and use it to detect anomalies. For example, 0:45:22 is an anomaly and I want to detect it. I could do this if I knew what format datetime uses for these values, but they don't appear to be plain numbers. I was thinking about splitting the values on ":" and using the parts in between, but there might be a better way.

Ideas?

Chango Mango

2 Answers


You have datetime.timedelta objects. These have .days, .seconds and .microseconds attributes, all three integers. Their str() representation has the form [D day[s], ][H]H:MM:SS[.UUUUUU], showing the day and microsecond parts only when they are non-zero.
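To see those attributes and the str() format side by side, here is a small sketch (written for Python 3; the values are the 0:45:22 duration from the question plus one made-up longer duration):

```python
from datetime import timedelta

# A duration given in minutes and seconds is normalised into the
# three stored integer attributes: days, seconds, microseconds.
td = timedelta(minutes=45, seconds=22)
print(td.days, td.seconds, td.microseconds)  # 0 2722 0
print(str(td))                               # 0:45:22

# Days and microseconds only appear in str() when they are non-zero.
long_td = timedelta(days=1, seconds=5, microseconds=250)
print(str(long_td))                          # 1 day, 0:00:05.000250
```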

You can use simple arithmetic on these objects. Addition, and division by an integer, work as expected, for example:

>>> from datetime import timedelta
>>> (timedelta(seconds=100) + timedelta(seconds=200)) / 2
datetime.timedelta(0, 150)
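One catch when averaging a whole list: the built-in sum() starts from the integer 0, which cannot be added to a timedelta, so you have to pass timedelta(0) as the start value. A minimal Python 3 sketch, using three of the durations from the question as sample data:

```python
from datetime import timedelta

durations = [timedelta(seconds=15), timedelta(seconds=5), timedelta(seconds=10)]

# sum() would start from the int 0 and raise TypeError on int + timedelta,
# so give it a zero-length timedelta as the starting value instead.
total = sum(durations, timedelta(0))
mean = total / len(durations)   # timedelta divided by an int is a timedelta
print(total, mean)              # 0:00:30 0:00:10
```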

Unfortunately, you cannot multiply two timedeltas, so calculating a standard deviation directly is tricky (you cannot square the offsets).

Instead, I'd use the .total_seconds() method, which gives you a floating-point number of seconds calculated from the days, seconds and microseconds values, and then calculate the standard deviation from those numbers.
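A sketch of that approach, assuming Python 3.4+ so the standard-library statistics module is available (any other standard-deviation routine applied to the list of floats would do just as well). The sample data is the list of durations printed in the question:

```python
import statistics
from datetime import timedelta

# Durations from the question's output: 0:00:15 x5, 0:00:05 x5,
# 0:00:10 x4 and a single 0:45:22.
durations = ([timedelta(seconds=15)] * 5 + [timedelta(seconds=5)] * 5
             + [timedelta(seconds=10)] * 4 + [timedelta(minutes=45, seconds=22)])

# total_seconds() folds days, seconds and microseconds into one float.
seconds = [td.total_seconds() for td in durations]

mean = statistics.mean(seconds)
std = statistics.pstdev(seconds)  # population standard deviation
print(round(mean, 1), round(std, 1))  # 190.8 676.5
```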

Martijn Pieters

The duration objects you are getting are timedelta objects, that is, durations from one timestamp to another. To convert them to a total number of microseconds, use:

def timedelta_to_microtime(td):
    return abs(td.microseconds + (td.seconds + td.days * 86400) * 1000000)
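As a quick check on the conversion, here it is applied to the 0:45:22 duration from the question. On Python 3 (an assumption; this answer's code predates it), floor-dividing by a one-microsecond timedelta gives the same count without touching the attributes, except that it keeps the sign where timedelta_to_microtime applies abs():

```python
from datetime import timedelta

def timedelta_to_microtime(td):
    # Same conversion as above: days -> seconds -> microseconds, sign dropped.
    return abs(td.microseconds + (td.seconds + td.days * 86400) * 1000000)

td = timedelta(minutes=45, seconds=22)
print(timedelta_to_microtime(td))        # 2722000000

# timedelta // timedelta returns an int; dividing by one microsecond
# counts microseconds and keeps negative durations negative.
print(td // timedelta(microseconds=1))   # 2722000000
```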

Then calculate the standard deviation:

import math

def calc_std(L):
    # Population standard deviation (divides by n, not n - 1)
    n = len(L)
    mean = sum(L) / float(n)
    dev = [x - mean for x in L]
    dev2 = [x * x for x in dev]
    return math.sqrt(sum(dev2) / n)
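Because calc_std divides by n, it computes the population standard deviation. On Python 3.4+ it should agree with statistics.pstdev, whereas statistics.stdev divides by n - 1 (the sample standard deviation). A small check with made-up values of 5, 10 and 15 seconds expressed in microseconds:

```python
import math
import statistics

def calc_std(L):
    # Population standard deviation, as in the answer above.
    n = len(L)
    mean = sum(L) / float(n)
    dev2 = [(x - mean) ** 2 for x in L]
    return math.sqrt(sum(dev2) / n)

# Hypothetical sample values (microseconds).
values = [5000000, 10000000, 15000000]
print(calc_std(values))
print(statistics.pstdev(values))  # same value: population std
print(statistics.stdev(values))   # larger: divides by n - 1
```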

So:

timedeltas = [your timedeltas here..]  # placeholder: your list of timedelta objects
microtimes = [timedelta_to_microtime(td) for td in timedeltas]
std = calc_std(microtimes)
# X is a threshold in microseconds that you choose
print([(td, mstime)
       for (td, mstime) in zip(timedeltas, microtimes)
       if mstime - std > X])
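To let the threshold come from the standard deviation itself, a common rule (an assumption here, not part of this answer's code) is to flag durations more than k standard deviations from the mean. The sketch below is written for Python 3; find_anomalies and the default k = 2.0 are hypothetical choices to tune for your data. It works in seconds via total_seconds(), so the second element of each result is a plain float:

```python
import math
from datetime import timedelta

def find_anomalies(timedeltas, k=2.0):
    """Return (timedelta, seconds) pairs lying more than k population
    standard deviations from the mean. Hypothetical helper; k is assumed."""
    secs = [td.total_seconds() for td in timedeltas]
    n = len(secs)
    mean = sum(secs) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in secs) / n)
    if std == 0:
        return []  # all durations identical: nothing stands out
    return [(td, s) for td, s in zip(timedeltas, secs)
            if abs(s - mean) > k * std]

# Durations from the question's output.
durations = ([timedelta(seconds=15)] * 5 + [timedelta(seconds=5)] * 5
             + [timedelta(seconds=10)] * 4 + [timedelta(minutes=45, seconds=22)])
for td, s in find_anomalies(durations):
    print(td, s)  # 0:45:22 2722.0
```

Keep in mind that a single large outlier also inflates the standard deviation it is measured against: with n values, no value can lie more than sqrt(n - 1) population standard deviations from the mean, so, for example, k = 3 can never flag anything in a list of 10 or fewer durations.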
Björn Lindqvist
What is X here? Is it the threshold that I set? I want the threshold to be determined by the std. Also, this returns "[(datetime.timedelta(0, 1), 1000000)]". How can I just get a normal number? – Chango Mango Mar 30 '13 at 17:12