datetime dtypes in pandas read_csv

Question

I'm reading in a csv file with multiple datetime columns. I'd need to set the data types upon reading in the file, but datetimes appear to be a problem. For instance:

headers = ['col1', 'col2', 'col3', 'col4']
dtypes = ['datetime', 'datetime', 'str', 'float']
pd.read_csv(file, sep='\t', header=None, names=headers, dtype=dtypes)

When run gives a error:

TypeError: data type "datetime" not understood

Converting columns after the fact, via pandas.to_datetime() isn't an option I can't know which columns will be datetime objects. That information can change and comes from whatever informs my dtypes list.

Alternatively, I've tried to load the csv file with numpy.genfromtxt, set the dtypes in that function, and then convert to a pandas.dataframe but it garbles the data. Any help is greatly appreciated!

firelynx · Answer 1 · 2018-03-23T09:49:49.067

Why it does not work

There is no datetime dtype to be set for read_csv as csv files can only contain strings, integers and floats.

Setting a dtype to datetime will make pandas interpret the datetime as an object, meaning you will end up with a string.

Pandas way of solving this

The pandas.read_csv() function has a keyword argument called parse_dates

Using this you can on the fly convert strings, floats or integers into datetimes using the default date_parser (dateutil.parser.parser)

headers = ['col1', 'col2', 'col3', 'col4']
dtypes = {'col1': 'str', 'col2': 'str', 'col3': 'str', 'col4': 'float'}
parse_dates = ['col1', 'col2']
pd.read_csv(file, sep='\t', header=None, names=headers, dtype=dtypes, parse_dates=parse_dates)

This will cause pandas to read col1 and col2 as strings, which they most likely are ("2016-05-05" etc.) and after having read the string, the date_parser for each column will act upon that string and give back whatever that function returns.

Defining your own date parsing function:

The pandas.read_csv() function also has a keyword argument called date_parser

Setting this to a lambda function will make that particular function be used for the parsing of the dates.

GOTCHA WARNING

You have to give it the function, not the execution of the function, thus this is Correct

date_parser = pd.datetools.to_datetime

This is incorrect:

date_parser = pd.datetools.to_datetime()

Pandas 0.22 Update

pd.datetools.to_datetime has been relocated to date_parser = pd.to_datetime

Thanks @stackoverYC

This is a slow solution. See this instead: https://stackoverflow.com/questions/29882573/pandas-slow-date-conversion — Reddspark, Sep 12 '17 at 14:52
@user1761806 Hey good find! I made a better one though. https://stackoverflow.com/a/46183514/3730397 — firelynx, Sep 12 '17 at 18:50
On pandas 0.22.0 says `pandas.core.datetools.to_datetime` is deprecated, use `pd.datetools.to_datetime` instead. like this: `date_parser = pd.to_datetime` — stackoverYC, Mar 23 '18 at 08:41
There's also a `converters` parameter where you can specify which columns have which converters. parse_dates is helpful and handles bad data but is slower due to it testing and inferring each value https://gist.github.com/gjreda/7433f5f70299610d9b6b — Davos, Apr 05 '18 at 07:42
Don't forget to add `skiprows=1` if you already have a header in your file. — d_-, Sep 27 '20 at 09:08

score 71 · Answer 2 · edited Feb 20 '20 at 19:20

71

There is a parse_dates parameter for read_csv which allows you to define the names of the columns you want treated as dates or datetimes:

date_cols = ['col1', 'col2']
pd.read_csv(file, sep='\t', header=None, names=headers, parse_dates=date_cols)

edited Feb 20 '20 at 19:20

Asclepius

57,944
17
167
143

answered Feb 20 '18 at 22:38

mrjrdnthms

1,549
4
24
35

3

I was having error as I was passing single string name of column, now I understand that I needed to pass list for a single value also. – TapanHP Jul 12 '19 at 10:40
how do you use dtype to define non-date columns whilst using parse_dates for date columns? – James Apr 06 '22 at 20:59

score 23 · Answer 3 · answered Jan 21 '14 at 23:19

You might try passing actual types instead of strings.

import pandas as pd
from datetime import datetime
headers = ['col1', 'col2', 'col3', 'col4'] 
dtypes = [datetime, datetime, str, float] 
pd.read_csv(file, sep='\t', header=None, names=headers, dtype=dtypes)

But it's going to be really hard to diagnose this without any of your data to tinker with.

And really, you probably want pandas to parse the the dates into TimeStamps, so that might be:

pd.read_csv(file, sep='\t', header=None, names=headers, parse_dates=True)

score 8 · Answer 4 · answered Jun 16 '21 at 05:47

8

I used the following code and it worked:

headers = ['col1', 'col2', 'col3', 'col4']
df=pd.read_csv(file, sep='\t', header=None, names=headers, parse_dates=['col1', 'col2'])

answered Jun 16 '21 at 05:47

Amin Alipour

81
1
1

2

Add context as to why this worked for you would help other users understand your answer in a better way. – Jimit Vaghela Jun 16 '21 at 08:42
yes, but did this enforce col3-str and col4=float? – James Apr 06 '22 at 20:58

score 7 · Answer 5 · answered Apr 07 '21 at 03:05

7

My workaround was to load as its default type, then use pandas.to_datetime() function one line down.

df[target_col] = pd.to_datetime(df[target_col])

answered Apr 07 '21 at 03:05

Greg

71
1
1

score 1 · Answer 6 · answered Jan 02 '15 at 19:48

1

I tried using the dtypes=[datetime, ...] option, but

import pandas as pd
from datetime import datetime
headers = ['col1', 'col2', 'col3', 'col4'] 
dtypes = [datetime, datetime, str, float] 
pd.read_csv(file, sep='\t', header=None, names=headers, dtype=dtypes)

I encountered the following error:

TypeError: data type not understood

The only change I had to make is to replace datetime with datetime.datetime

import pandas as pd
from datetime import datetime
headers = ['col1', 'col2', 'col3', 'col4'] 
dtypes = [datetime.datetime, datetime.datetime, str, float] 
pd.read_csv(file, sep='\t', header=None, names=headers, dtype=dtypes)

answered Jan 02 '15 at 19:48

Jose Buraschi

499
3
3

4

This will still make the dtype of the resulting dataframe an object, not a pandas.datetime – firelynx May 26 '16 at 07:22
19

Aside from the fact that this doesn't have the desired effect, it also doesn't work: `AttributeError: type object 'datetime.datetime' has no attribute 'datetime'` – Gabriel Dec 01 '16 at 09:29

datetime dtypes in pandas read_csv

6 Answers6

Why it does not work

Pandas way of solving this

Defining your own date parsing function:

GOTCHA WARNING

Pandas 0.22 Update

Linked

Related