2

I have a list of strings (unicode). Like so:

>>> tstamp
[u'2017-08-08T08:51:20.465Z', u'2017-08-08T08:51:27.871Z', u'2017-08-08T08:51:33.399Z', u'2017-08-08T08:51:37.530Z', u'2017-08-08T08:51:47.248Z', u'2017-08-08T08:51:50.414Z', u'2017-08-08T08:51:54.707Z', u'2017-08-08T08:51:54.781Z']

I want to convert this list of strings to a list of datetime objects. Like so:

>>> dtstamp
[datetime.datetime(2017, 8, 8, 8, 51, 20, 465000), datetime.datetime(2017, 8, 8, 8, 51, 27, 871000), datetime.datetime(2017, 8, 8, 8, 51, 33, 399000), datetime.datetime(2017, 8, 8, 8, 51, 37, 530000), datetime.datetime(2017, 8, 8, 8, 51, 47, 248000), datetime.datetime(2017, 8, 8, 8, 51, 50, 414000), datetime.datetime(2017, 8, 8, 8, 51, 54, 707000), datetime.datetime(2017, 8, 8, 8, 51, 54, 781000)]

The solution I have is very crude. I am looking to do this conversion without having to use any kind of loop, because the speed of conversion is crucial. Here is my code so far:

import datetime

dtstamp = [0]*len(tstamp)
for i in range(len(tstamp)):
    dtstamp[i] = datetime.datetime.strptime(tstamp[i], '%Y-%m-%dT%H:%M:%S.%fZ')

It does what I want but will be slow. I thought about trying this, but it doesn't work, because strptime expects a single string rather than a list:

dtstamp = datetime.datetime.strptime(tstamp, '%Y-%m-%dT%H:%M:%S.%fZ')

Can anyone point me in the right direction?

Thanks in advance!

Tanmay
  • How does one run through items in a list without using a loop? Even the crispy `map` will have to *loop* behind the scenes. – Moses Koledoye Aug 08 '17 at 11:50
  • How long is the list? There might be a point where converting it to a Pandas Series is worthwhile. – roganjosh Aug 08 '17 at 11:50
  • How do you feel about list comprehensions? – Stael Aug 08 '17 at 11:51
  • True and it would require conversion to a datatype that wouldn't require looping. I was thinking of pandas or something else? The list is about 600k elements long and needs to get updated every second. – Tanmay Aug 08 '17 at 11:52
  • @Stael it's still a loop, the only way not to do a loop is to convert like in `thing[0], thing[1]` etc.. which is just silly – user3012759 Aug 08 '17 at 11:53
  • Well, there would be overhead in making the structure in the first place and datetime is quite slow, but I'm not sure that pandas is much faster and could ever overcome the overhead. It's something to test I guess. But your options are limited. – roganjosh Aug 08 '17 at 11:53
  • Thank you for your answers guys. I timed all three options in the answers below, and found list comprehension to be the fastest. Mapping is very close as well. Parser method is good to know but is prohibitively slower. – Tanmay Aug 08 '17 at 12:06

4 Answers

7

You can get a significant speedup simply by using pd.to_datetime on the list as it is. However, I don't think you're going to get to 600,000 conversions every second from this even if you can tweak the approach.

import pandas as pd
import datetime as dt

my_list = [u'2017-08-08T08:51:20.465Z', u'2017-08-08T08:51:27.871Z', u'2017-08-08T08:51:33.399Z', u'2017-08-08T08:51:37.530Z', u'2017-08-08T08:51:47.248Z', u'2017-08-08T08:51:50.414Z', u'2017-08-08T08:51:54.707Z', u'2017-08-08T08:51:54.781Z']
new_list = my_list * 100000  # 800,000 timestamps in total

def basic_list_approach(the_list):
    return [dt.datetime.strptime(item, '%Y-%m-%dT%H:%M:%S.%fZ') for item in the_list]

def pandas_approach(the_list):
    converted = pd.to_datetime(the_list)
    return converted

%timeit basic_list_approach(new_list)
1 loop, best of 3: 12.6 s per loop

%timeit pandas_approach(new_list)
1 loop, best of 3: 1.45 s per loop
roganjosh
  • This is by far the quickest option I have. Thank you. – Tanmay Aug 08 '17 at 12:16
  • No problem. You might consider putting this into an array further upstream in your process as a way to make things faster. However, I don't know whether you can afford that overhead elsewhere, or whether you will still reliably get to sub-second processing. This is 800K in 1.45s (though my timing is a single loop) so... maybe. – roganjosh Aug 08 '17 at 12:18
3

You can't iterate through the items without a loop. For a one line solution you can use this:

import dateutil.parser
# Note: because of the trailing 'Z', dateutil returns timezone-aware (UTC) datetimes
print([dateutil.parser.parse(i) for i in tstamp])
Akshay Apte
3

Have you tried a list comprehension?

[datetime.datetime.strptime(x, '%Y-%m-%dT%H:%M:%S.%fZ') for x in tstamp]
# [datetime.datetime(2017, 8, 8, 8, 51, 20, 465000), datetime.datetime(2017, 8, 8, 8, 51, 27, 871000), datetime.datetime(2017, 8, 8, 8, 51, 33, 399000), datetime.datetime(2017, 8, 8, 8, 51, 37, 530000), datetime.datetime(2017, 8, 8, 8, 51, 47, 248000), datetime.datetime(2017, 8, 8, 8, 51, 50, 414000), datetime.datetime(2017, 8, 8, 8, 51, 54, 707000), datetime.datetime(2017, 8, 8, 8, 51, 54, 781000)]

It still uses a loop in the background, but it's rather optimized.
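
To check this on your own machine, here is a minimal timing sketch that compares the explicit loop from the question, a list comprehension, and map(). The function names and the small repeated sample list are my own placeholders, not anything from the question, and the numbers will vary with your Python version and hardware.

```python
import timeit
from datetime import datetime

FMT = '%Y-%m-%dT%H:%M:%S.%fZ'

# Small sample repeated to get a measurable workload (illustrative data only).
tstamp = ['2017-08-08T08:51:20.465Z', '2017-08-08T08:51:27.871Z'] * 500


def with_loop(values):
    # Index-based loop, as in the question.
    out = [None] * len(values)
    for i in range(len(values)):
        out[i] = datetime.strptime(values[i], FMT)
    return out


def with_comprehension(values):
    # Same work expressed as a list comprehension.
    return [datetime.strptime(v, FMT) for v in values]


def with_map(values):
    # list() is needed on Python 3, where map() returns a lazy iterator.
    return list(map(lambda v: datetime.strptime(v, FMT), values))


for fn in (with_loop, with_comprehension, with_map):
    seconds = timeit.timeit(lambda: fn(tstamp), number=5)
    print('%-20s %.3f s' % (fn.__name__, seconds))
```

All three variants call strptime once per element, so any differences between them come only from loop overhead, which is small next to the parsing cost itself.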

Regards, Koen

Koen
2

If you really want to omit the loop (in your code), you can use map(). On Python 3, wrap the call in list(), because map() returns a lazy iterator there:

map(lambda item: datetime.datetime.strptime(item, '%Y-%m-%dT%H:%M:%S.%fZ'),
    tstamp)

Be aware, though, that even map() will eventually use a loop to do it. There is no way to do it without iterating over every item in the list. However smart the code is, there will always be a loop somewhere behind the scenes.

If you really need it to be ultra-fast, then the only way to do it with Python is by using C extensions.
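
One way to lean on C code without writing any yourself, as a sketch only: on Python 3.7 or later (an assumption, since the question uses Python 2), the standard library's datetime.fromisoformat is implemented in C in CPython and is typically much faster than strptime. The helper name parse_iso_z is hypothetical. The trailing 'Z' is stripped first, because versions before 3.11 reject it, and stripping it also keeps the result naive, like the output in the question.

```python
from datetime import datetime

# Two of the timestamps from the question, as plain str (Python 3).
tstamp = ['2017-08-08T08:51:20.465Z', '2017-08-08T08:51:27.871Z']


def parse_iso_z(values):
    # Assumes every string ends with a literal 'Z'; v[:-1] drops it so that
    # fromisoformat accepts the input on Python 3.7 through 3.10 as well.
    return [datetime.fromisoformat(v[:-1]) for v in values]


dtstamp = parse_iso_z(tstamp)
print(dtstamp)
```

This still loops over every element, but the per-item parsing happens in compiled code, so it is worth timing against the strptime versions on your own data.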

Błażej Michalik