I'm having some trouble getting the following string converted to a datetime object using Python. I have a large csv file (over 10k lines) and I need to transform a column of dates from the following format:

Jun 1, 2020 12:11:49 AM PDT

to:

06/01/20

My first thought was to use datetime.strptime, which takes the string and the format it is in, because then I can easily reformat one date type to another. The problem is that I don't know how to express this string as a date format, mostly because of the timezone.

My best guess for the date format I need is '%mmm %dd, %yyyy %H:%M:%S %aa' but I can't figure out how to represent the timezone here (and I'm also not sure about AM/PM being %aa).

I've tried looking at other threads but they all seem to have easily match-able strings.

Thanks!

FObersteiner
Chowdahhh
  • As you're only mentioning parsing out the date portion, do you care about the time or the timezone? Couldn't you just ignore everything after the year, or will you need the time portion later? – BowlOfRed Aug 05 '20 at 20:33
  • Yeah I'd be fine cutting out the time and timezone completely, I just assumed I needed to convert the whole thing before doing that. Is there a way to just ignore everything after the year in the conversion to the datetime object? – Chowdahhh Aug 05 '20 at 20:47

3 Answers

The format codes are documented in the following table. In particular, AM/PM is %p and the timezone name is %Z:

https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes

However, in your case I would suggest not bothering with a hand-written format at all and relying on dateutil to do the parsing. It is more flexible, because it can usually work out the format on its own.

adrtam

I'd be fine cutting out the time and timezone completely

Then you have lots of choices. As already mentioned, dateutil is cool and would work great. But if you wanted to stay in datetime for some reason you could:

  • Parse the whole thing, but know that the timezone is ignored

datetime.strptime can parse the whole thing, but it doesn't really understand or convert timezones. If you do this, %Z only matches the zone name; the result is a naive datetime with no timezone attached.

>>> from datetime import datetime
>>> str(datetime.strptime("Jun 1, 2020 12:11:49 AM PDT", "%b %d, %Y %I:%M:%S %p %Z"))
'2020-06-01 00:11:49'

You could also throw away the time portion before handing it to strptime(), but that's probably more trouble than it's worth given the other options.
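For completeness, throwing away the time portion first could look like the sketch below. It assumes every value starts with the same three whitespace-separated tokens (month abbreviation, day with a trailing comma, four-digit year) and that the month names are English. The helper name date_only is made up for illustration.

```python
from datetime import datetime


def date_only(ts: str) -> str:
    """Reformat a 'Jun 1, 2020 12:11:49 AM PDT' style string as MM/DD/YY."""
    # Keep only the first three tokens: 'Jun', '1,' and '2020'.
    date_part = " ".join(ts.split()[:3])
    return datetime.strptime(date_part, "%b %d, %Y").strftime("%m/%d/%y")


print(date_only("Jun 1, 2020 12:11:49 AM PDT"))
```

Because the time and zone never reach strptime(), the zone abbreviation cannot cause a parsing failure here.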


Oops. I didn't realize that %Z only parses certain timezone names: 'UTC', 'GMT', and the names in time.tzname, which depend on your machine. So if you can't control that, it's not going to work. On my machine 'PDT' will parse and 'EDT' will fail.
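A quick way to see this on your own machine is the rough check below. The helper parses and the zone name 'NOTAZONE' are made up for illustration. What time.tzname prints will differ from machine to machine.

```python
import time
from datetime import datetime

FMT = "%b %d, %Y %I:%M:%S %p %Z"

# Names this machine uses for its local zone; %Z accepts these plus 'UTC' and 'GMT'.
print(time.tzname)

# 'UTC' is accepted everywhere, but the result is still a naive datetime.
dt = datetime.strptime("Jun 1, 2020 12:11:49 AM UTC", FMT)
print(dt, dt.tzinfo)


def parses(ts: str) -> bool:
    """Return True if strptime() accepts ts with the full format, %Z included."""
    try:
        datetime.strptime(ts, FMT)
        return True
    except ValueError:
        return False


print(parses("Jun 1, 2020 12:11:49 AM GMT"))
print(parses("Jun 1, 2020 12:11:49 AM NOTAZONE"))
```

If parses() returns False for the abbreviations in your file, the ValueError reported in the comments is the expected outcome.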

Given that, I'd throw away the timezone. If it's always in this format, then maybe something like:

>>> from datetime import datetime
>>> ts = "Jun 1, 2020 12:11:49 AM PDT"
>>> str(datetime.strptime(ts.rpartition(" ")[0], "%b %d, %Y %I:%M:%S %p"))
'2020-06-01 00:11:49'
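To apply this to a whole CSV column, something along these lines might work. It is only a sketch: the column name "date", the sample rows, and the helper to_short_date are assumptions, and io.StringIO stands in for the real input and output files.

```python
import csv
import io
from datetime import datetime


def to_short_date(ts: str) -> str:
    # Drop the trailing zone abbreviation, parse the rest, then reformat.
    naive = datetime.strptime(ts.rpartition(" ")[0], "%b %d, %Y %I:%M:%S %p")
    return naive.strftime("%m/%d/%y")


# Stand-in for the real file; the column name "date" is an assumption.
src = io.StringIO(
    'id,date\n'
    '1,"Jun 1, 2020 12:11:49 AM PDT"\n'
    '2,"Jun 2, 2020 3:05:00 PM EDT"\n'
)
dst = io.StringIO()

reader = csv.DictReader(src)
writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
writer.writeheader()
for row in reader:
    row["date"] = to_short_date(row["date"])
    writer.writerow(row)

print(dst.getvalue())
```

With real files you would replace the two StringIO objects with open(..., newline="") handles. Since the zone is discarded before parsing, any abbreviation (PDT, EDT, ...) goes through the same way.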
BowlOfRed
  • I just tried using dateutil but I got the following warning: UnknownTimezoneWarning: tzname PDT identified but not understood. Pass `tzinfos` argument in order to correctly return a timezone-aware datetime. In a future version, this will raise an exception. I did 'from dateutil.parser import *' and then to parse it I just did 'day = parse(string)' – Chowdahhh Aug 05 '20 at 21:16
  • Also, I tried using strptime as you said but got an error: ValueError: time data 'Jun 1, 2020 12:00:26 AM PDT' does not match format '%b %d, %Y %I:%M:%S %p %Z' – Chowdahhh Aug 05 '20 at 21:19
  • Yes, I didn't realize that it only works with certain strings (even though it's not actually converting on the timezone). – BowlOfRed Aug 05 '20 at 23:39

As @adrtam already suggested, you can use dateutil's parser to conveniently parse such a string. To parse the time zone correctly, you can supply it with a mapping dict:

from dateutil import parser, tz

s = 'Jun 1, 2020 12:11:49 AM PDT'

tzmapping = {'PDT': tz.gettz('US/Pacific')} # assuming PDT means Pacific daylight saving time

dt = parser.parse(s, tzinfos=tzmapping)

dt
Out[2]: datetime.datetime(2020, 6, 1, 0, 11, 49, tzinfo=tzfile('US/Pacific'))

Now you can easily format to string:

s_reformatted = dt.strftime('%m/%d/%y')

s_reformatted
Out[4]: '06/01/20'
FObersteiner