1

I've got a dataframe called base_varlist2 that has some date columns, all as a string in varying formats. Here's a quick snapshot of the first column:

In [27]: print(base_varlist2.iloc[0])
completion_date_latest               07/10/2004
completion_date_original             17/05/1996
customer_birth_date_1                04/02/1963
customer_birth_date_2                       NaN
d_start                               01Feb2018
latest_maturity_date                 01/03/2027
latest_valuation_date                08/05/2004
sdate                                       NaN
startdt_def                                 NaN

As you can see, some of these columns are empty, some aren't.

I need a way of dynamically changing these to datetime values, regardless of their existing string format. Is there a way to do this from the dataframe, without specifying a specific format for each column? For example, I won't always know if d_start will be in %d%b%Y format.

The dataframe was made from the original CSV which is much larger, and contains lots of other columns (that aren't dates), like text fields and numeric fields. I'm using a read_csv method to bring that in, as below:

source_data = pd.read_csv(loc + 'TEST_FILE.csv')

Aaraeus
  • 1,105
  • 3
  • 14
  • 26

2 Answers2

2

Pandas'read_csv function supports parse_dates parameter, which can be a list of column names for which you want a conversion from str to datetime.

Alternatively, you can always convert a series using pandas.to_datetime, e.g.:

df['date'] = pandas.to_datetime(df['date'])

Notice that pandas makes no assumption on the format you used for dates.

Guybrush
  • 2,680
  • 1
  • 10
  • 17
  • So I tried this by looping through my columns and creating a new dataframe, with all the columns converted to datetime, but I'm getting an error when I try to view the resultant dataset. When I submit `print(mm_dates_base.iloc[0])`, I get `ValueError: cannot convert float NaN to integer`. – Aaraeus Apr 10 '18 at 12:52
  • That's weird, and probably not directly related to the question though. – Guybrush Apr 10 '18 at 12:58
  • You're right. I'll mark this down as complete but I've raised another question about this problem here: https://stackoverflow.com/questions/49756649/pandas-to-datetime-not-working-for-null-values – Aaraeus Apr 10 '18 at 14:55
1

Unfortunately this does not working on fly, need specify each column for parse to dates, docs:

source_data = pd.read_csv(loc + 'TEST_FILE.csv', parse_dates=['date1','date2','date6'])
#or specify columns by positions
source_data = pd.read_csv(loc + 'TEST_FILE.csv', parse_dates=[0,1,6])

In my opinion reason is performance and also avoid converting numeric values like 2000 to datetimes.

jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • How can I do this in a case insensitive way? I have my list of date columns, but their names are formatted slightly differently in the CSV file... – Aaraeus Apr 10 '18 at 12:31
  • @Aaraeus - I think not, then try second solution for seelct by positions. – jezrael Apr 10 '18 at 12:32