0

I am trying to analyze the Open Food Facts Dataset. The Dataset is very messy! I am currently trying to convert the 'created_datetime' object to a Python datetime object. The entries in this column all look like "2017-03-09T10:34:11Z". I have tried the normal to_datetime method with and without "infer_datetime_format". Neither of them worked. I then changed to format of the string to look like this "2017 03 09 10 34 12" (I deleted the "Z" and "T" and added a space between every number). I then tried to run to_datetime again setting the format to '%Y %m %d %I M% %S'. But no matter what I do, I get an error. In this case, I get: "'%' is a bad directive in format '%Y %m %d %I M% %S'". Could someone please help me out? I would really appreciate it!

Marcus
  • 105
  • 1
  • 2
  • 8
  • 1
    Post some of the data or a link to the dataset so that others can help! – Alex Aug 06 '18 at 19:59
  • That string works with that format. If it's working on some of your strings but not others, the best guess is that it's because you used `%I` (12-hour clock hour) instead of `%H` (24-hour clock hour), but we'd need to see an actual example of it failing to do more than guess. – abarnert Aug 06 '18 at 20:02
  • 1
    Welcome to Stackoverflow. Please include some sample data and the code of what you tried before. For guidance please check the [how to ask](https://stackoverflow.com/help/how-to-ask) page and [how to create a minimal example](https://stackoverflow.com/help/mcve). – 5th Aug 06 '18 at 20:04
  • But meanwhile, why are you changing the string? Why not just use a format that matches your string, like `%Y-%m-%dT%H:%M:%SZ`? – abarnert Aug 06 '18 at 20:04
  • I dont understand what you want to do ? If you want normal datetime, then why not use pandas.to_datetime method () ? – user96564 Aug 06 '18 at 20:06

4 Answers4

3
import pandas as pd
pd_date_time = pd.to_datetime("2017-03-09T10:34:11Z")
py_date_time = pd_date_time.to_pydatetime()
Karan Bhagat
  • 329
  • 1
  • 6
1

If you remove the "Z" and "T" such as in my example below you can use datetime.strptime to convert to datetime.

from datetime import datetime
messy_date = "2017-03-09 10:34:11"
messy_date = messy_date.replace('T',' ')
messy_date = messy_date.replace('Z',' ')
datetime_object = datetime.strptime(messy_date, '%Y-%m-%d %I:%M:%S')
print(datetime_object)

Output is: 2017-03-09 10:34:11

Jen
  • 635
  • 5
  • 9
  • 3
    You don't have to remove the `Z` or `T` to use `strptime`; you can just put them in the format string. – abarnert Aug 06 '18 at 20:05
  • I just tried that... data['created_datetime'] = datetime.strptime(data['created_datetime'], '%Y-%m-%d %I:%M:%S') i get the following error :strptime() argument 1 must be str, not Series – Marcus Aug 06 '18 at 21:14
  • It is not working because you have to iterate through all values. Try this: data['created_datetime'] = data['created_datetime'].map(lambda a: datetime.strptime(a, '%Y-%m-%dT%I:%M:%SZ')). Let me know if you have any other questions. – Jen Aug 06 '18 at 21:49
0

I don't know what types of objects you have, so I don't know what the to_datetime function does.

But functions like datetime.datetime.strptime or anything from Pandas will work with exactly the example you gave. But they won't work with a slightly different example, like this:

2017 03 09 13 34 12

Why? Because you're using %I, which means 12-hour-clock hour, instead of %H, which means 24-hour-clock hour, and 13 isn't a valid value for %I.

But really, why change the string in the first place? Why not just use a format that matches your string as it is?

>>> s = '2017-03-09T10:34:12Z'
>>> datetime.datetime.strptime(s, '%Y-%m-%dT%H:%M:%SZ')
datetime.datetime(2017, 3, 9, 10, 34, 12)

Or, if you're using one of the Pandas functions, or something out of dateutil, they should just work by default:

>>> pd.to_datetime("2017-03-09T10:34:12Z")
Timestamp('2017-03-09 10:34:12')
>>> dateutil.parser.parse("2017-03-09T10:34:12Z")
datetime.datetime(2017, 3, 9, 10, 34, 12, tzinfo=tzutc())

(Notice that dateutil even recognized the Z as meaning we explicitly want a UTC timestamp, rather than a naive timestamp.)

abarnert
  • 354,177
  • 51
  • 601
  • 671
-1

Have you tried regex? Or even simple yr, mo, day, hr, min, sec = datetime.split(" ") should work.

Mr.cysl
  • 1,494
  • 6
  • 23
  • 37