
I am trying to clean and convert some columns in a dataframe from dtype 'object' to dtype 'datetime':

column_names = ['col a','col b', ...'col n']
df[column_names] = df[column_names].apply(pd.to_datetime, format = '%m/%Y')

But this seems to take a very long time, and right now I am only cleaning a subset of a much larger file.

Is there a quicker way to achieve this?

I note that this file is opened via pd.read_csv, and even with 'parse_dates' set to True, the relevant columns are read as 'object'.

GPB

1 Answer


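For me it works if the date columns are passed to parse_dates explicitly. parse_dates=True only tries to parse the index, which is why your columns stayed 'object':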

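import pandas as pd
from io import StringIO  # pandas.compat.StringIO no longer exists in recent pandas

temp = """a;b;c
2/2015;4/2016;4"""
# after testing, replace StringIO(temp) with 'filename.csv'
# parse_dates=[0, 1] converts the first and second columns
df = pd.read_csv(StringIO(temp), sep=";", parse_dates=[0, 1])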

print (df)
           a          b  c
0 2015-02-01 2016-04-01  4

print (df.dtypes)
a    datetime64[ns]
b    datetime64[ns]
c             int64
dtype: object
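A sketch contrasting the two settings on the same sample data (the variable names are illustrative): with parse_dates=True the columns are left as they were, while a list of column positions converts them.

```python
from io import StringIO

import pandas as pd

csv_text = "a;b;c\n2/2015;4/2016;4\n"

# parse_dates=True applies to the index only; there is no index column here,
# so columns a and b are left unconverted
df_true = pd.read_csv(StringIO(csv_text), sep=";", parse_dates=True)
print(df_true.dtypes)

# A list of column positions (names work too) asks read_csv to convert those columns
df_list = pd.read_csv(StringIO(csv_text), sep=";", parse_dates=[0, 1])
print(df_list.dtypes)
```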

You can also define a custom parser:

parser = lambda x: pd.to_datetime(x, format='%m/%Y', errors='coerce')
# note: date_parser is deprecated since pandas 2.0; there, pass date_format='%m/%Y' instead
df = pd.read_csv(StringIO(temp), sep=";", parse_dates=[0, 1], date_parser=parser)

print (df)
           a          b  c
0 2015-02-01 2016-04-01  4

print (df.dtypes)
a    datetime64[ns]
b    datetime64[ns]
c             int64
dtype: object
jezrael
  • I'm not sure what the argument 'parse_dates=[0,1]' does. Your second suggestion will almost certainly take as long as my code does... unless I am missing something. – GPB Jul 26 '17 at 14:06
  • It selects the first and second columns and tries to convert them to datetime. – jezrael Jul 26 '17 at 14:07
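On the speed concern: one option that may help when many columns share the same format is to parse all of them in a single pd.to_datetime call rather than one call per column, as apply does. This is a minimal sketch, assuming every selected column holds 'month/year' strings; the frame and column names are hypothetical, and whether it beats apply depends on the data, so time both on a sample first.

```python
import pandas as pd

# Hypothetical frame: two "month/year" string columns plus an unrelated column
df = pd.DataFrame({
    "col a": ["2/2015", "4/2016", "2/2015"],
    "col b": ["4/2016", "12/2017", "4/2016"],
    "other": [1, 2, 3],
})
column_names = ["col a", "col b"]

# Flatten the selected columns row by row into one 1-D array of strings
values = df[column_names].to_numpy().ravel()

# Parse everything in a single call with an explicit format
parsed = pd.to_datetime(values, format="%m/%Y")

# Reshape back to (rows, columns) in the same row-major order and write back
parsed_2d = parsed.to_numpy().reshape(len(df), len(column_names))
for i, col in enumerate(column_names):
    df[col] = parsed_2d[:, i]

print(df.dtypes)
```

The explicit format matters more than the reshaping: without it, pandas has to infer the format before it can parse.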