
I am using the following piece of code to extract a date from a string:

try:
    my_date = datetime.strptime(input_date, "%Y-%m-%d").date()
except ValueError:
    my_date = None

If I run this 750,000 times, it takes 19.144 seconds (determined with cProfile). Now I replace this with the following (ugly) code:

a= 1000 * int(input_date[0])
b=  100 * int(input_date[1])
c=   10 * int(input_date[2])
d=    1 * int(input_date[3])
year = a+b+c+d

c=   10 * int(input_date[5])
d=    1 * int(input_date[6])
month = c+d

c=   10 * int(input_date[8])
d=    1 * int(input_date[9])
day = c+d

try:
    my_date = date(year, month, day)
except ValueError:
    my_date = None

If I run this 750,000 times, it only takes 5.946 seconds. However, I find the code really ugly. Is there another fast way to extract a date from a string, without using strptime?

Martijn Pieters
physicalattraction
  • Use `timeit` to do time trials, not `cProfile`. I am not saying that the cards will fall differently, but it'll certainly be more accurate. – Martijn Pieters Jul 08 '14 at 07:50
  • Why not e.g. `year = int(input_date[:4])`? What is the `try` guarding against - invalid formats may fail on the indexing. – jonrsharpe Jul 08 '14 at 07:51
  • @martijn: The reason I'm using cProfile: I need approximate results of all my methods, not just this one. – physicalattraction Jul 08 '14 at 07:58
  • @physicalattraction: but **in this post** you are talking about this one. If you want to run time trials to compare approaches to a single task, use `timeit`. – Martijn Pieters Jul 08 '14 at 08:01
  • @physicalattraction: and for the record: strptime is indeed slower (by about 2x) than your ugly approach, because it does much more validation on the input. It can handle months and days that are not zero-padded, for example. – Martijn Pieters Jul 08 '14 at 08:01
  • @jon: The year = int(input_date[:4]) is indeed a good tip, I am using it now. It even speeds it up more (to 3.3 seconds). I guess I have to place those commands within the try as well. – physicalattraction Jul 08 '14 at 08:02

1 Answer


Yes, there are faster methods to parse a date than datetime.strptime(), if you forgo a lot of flexibility and validation. strptime() accepts numbers both with and without zero-padding, and it only matches strings that use the right separators, whereas your 'ugly' version reads fixed character positions and never looks at the separators.
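To make that difference concrete, here is a small sketch comparing the two styles on inputs that are not strictly `YYYY-MM-DD`. The function names `parse_strptime` and `parse_by_position` are made up for this illustration, and the positional version is a condensed stand-in for the fixed-index approach rather than the exact code from the question:

```python
from datetime import date, datetime

def parse_strptime(text):
    """Parse with strptime; return None when the format does not match."""
    try:
        return datetime.strptime(text, "%Y-%m-%d").date()
    except ValueError:
        return None

def parse_by_position(text):
    """Parse by fixed character positions; the separators are never inspected."""
    try:
        return date(int(text[:4]), int(text[5:7]), int(text[8:]))
    except ValueError:
        return None

print(parse_strptime("2014-7-8"))       # 2014-07-08 (no zero-padding needed)
print(parse_by_position("2014-7-8"))    # None (the digits are not where expected)
print(parse_strptime("2014/07/08"))     # None (wrong separator)
print(parse_by_position("2014/07/08"))  # 2014-07-08 (separator silently ignored)
```

Whether that leniency about separators matters depends on how much you trust your input.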

You should always use the timeit module for time trials; it is far more accurate than cProfile here, because cProfile adds instrumentation overhead to every function call.
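If you would rather run the trial from a script than from the interactive prompt, something along these lines should work. This is only a sketch: `parse` is a placeholder for whichever function you want to measure, and the `number` and `repeat` values are arbitrary choices, not recommendations:

```python
import timeit
from datetime import datetime

def parse(text):
    """Placeholder for the function under test."""
    try:
        return datetime.strptime(text, "%Y-%m-%d").date()
    except ValueError:
        return None

NUMBER = 10_000  # calls per measurement (arbitrary)

# repeat() performs the whole measurement several times; the minimum is
# usually the run least disturbed by other activity on the machine.
timings = timeit.repeat(lambda: parse("2014-07-08"), number=NUMBER, repeat=3)
best = min(timings)
print(f"{best / NUMBER * 1e6:.2f} microseconds per call")
```

Passing a callable adds a little call overhead of its own, so compare all candidates the same way.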

Indeed, your 'ugly' approach is more than twice as fast as strptime():

>>> from datetime import date, datetime
>>> import timeit
>>> def ugly(input_date):
...     a= 1000 * int(input_date[0])
...     b=  100 * int(input_date[1])
...     c=   10 * int(input_date[2])
...     d=    1 * int(input_date[3])
...     year = a+b+c+d
...     c=   10 * int(input_date[5])
...     d=    1 * int(input_date[6])
...     month = c+d
...     c=   10 * int(input_date[8])
...     d=    1 * int(input_date[9])
...     day = c+d
...     try:
...         my_date = date(year, month, day)
...     except ValueError:
...         my_date = None
... 
>>> def strptime(input_date):
...     try:
...         my_date = datetime.strptime(input_date, "%Y-%m-%d").date()
...     except ValueError:
...         my_date = None
... 
>>> timeit.timeit('f("2014-07-08")', 'from __main__ import ugly as f')
4.21576189994812
>>> timeit.timeit('f("2014-07-08")', 'from __main__ import strptime as f')
9.873773097991943

Your approach can be improved upon though; you could use slicing:

>>> def slicing(input_date):
...     try:
...         year = int(input_date[:4])
...         month = int(input_date[5:7])
...         day = int(input_date[8:])
...         my_date = date(year, month, day)
...     except ValueError:
...         my_date = None
... 
>>> timeit.timeit('f("2014-07-08")', 'from __main__ import slicing as f')
1.7224829196929932

Now it is almost 6 times faster than strptime(). I also moved the int() calls into the try/except to handle invalid input when converting strings to integers.
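If you want some of the separator validation back without paying for strptime(), a cheap explicit check can be put in front of the slicing. The following is a sketch, not a measured result: the name `parse_iso_date` and the length and separator checks are my additions, and I have not timed this variant:

```python
from datetime import date

def parse_iso_date(text):
    """Return a date for 'YYYY-MM-DD' text, or None if it does not fit."""
    # Added checks (assumption: input must be exactly 10 characters with
    # '-' at positions 4 and 7); the slicing itself is unchanged.
    if len(text) != 10 or text[4] != "-" or text[7] != "-":
        return None
    try:
        return date(int(text[:4]), int(text[5:7]), int(text[8:]))
    except ValueError:
        return None

print(parse_iso_date("2014-07-08"))  # 2014-07-08
print(parse_iso_date("2014/07/08"))  # None
```

Note that int() is still lenient about surrounding whitespace and a leading sign, so this remains looser than a strict digit check.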

You could also use str.split() to get the parts, but that makes it slightly slower again:

>>> def split(input_date):
...     try:
...         my_date = date(*map(int, input_date.split('-')))
...     except ValueError:
...         my_date = None
... 
>>> timeit.timeit('f("2014-07-08")', 'from __main__ import split as f')
2.294667959213257
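
One caveat with the split() variant: when the string does not split into exactly three parts, date() is called with the wrong number of arguments and raises TypeError rather than ValueError, which the handler above does not catch. A sketch that catches both (the function name `parse_split` is just for this example):

```python
from datetime import date

def parse_split(text):
    """Split on '-' and build a date; return None for any malformed input."""
    try:
        return date(*map(int, text.split("-")))
    except (ValueError, TypeError):
        # ValueError: non-numeric part or out-of-range month/day.
        # TypeError: too few or too many parts passed to date().
        return None

print(parse_split("2014-07-08"))  # 2014-07-08
print(parse_split("2014-07"))     # None
```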
Martijn Pieters