3

I have a data frame with multiple columns and multiple rows. In one of these columns there are dates that take the form of mm/dd/yyyy.

I am trying to convert this using df['col'] = pd.to_datetime(df['col']) but am getting the following error because there are multiple records that have 00 in the place of a missing month or day:

ValueError: day is out of range for month

I don't want to do df['col'] = pd.to_datetime(df['col'], errors = 'coerce') because I want to keep whatever data is there.

I would like all the dates that are missing days or months or both (e.g 11/00/2018, 00/13/2018, or 00/00/2018) to have the value 01 where the value is missing (e.g 11/01/2018, 01/13/2018, 01/01/2018).

Joel
  • 1,564
  • 7
  • 12
  • 20
Priya
  • 217
  • 2
  • 9

1 Answers1

2

You could use the following regex to replace 00:

import pandas as pd
data = ['11/00/2018', '00/13/2018', '00/00/2018']

df = pd.DataFrame(data=data, columns=['col'])
replace = df['col'].replace('00/', '01/', regex=True)
result = pd.to_datetime(replace)
print(result)

Output

0   2018-11-01
1   2018-01-13
2   2018-01-01
Name: col, dtype: datetime64[ns]
Dani Mesejo
  • 61,499
  • 6
  • 49
  • 76
  • 1
    Less susceptible to the issues of the previous answer, but I'm not a fan at all of replacing _unknown_ values with a default value here if this ever has to go forwards for analysis. Not the downvoter, but I really think some consideration needs to be made about how this could skew analysis – roganjosh Nov 13 '18 at 22:08
  • 1
    Well I believe this does answer what the OP asked or I am mistaken? – Dani Mesejo Nov 13 '18 at 22:09
  • Hence why I'm not the downvoter :) But this result is likely more harmful than just dropping the data in this case if they actually want to analyse things by date – roganjosh Nov 13 '18 at 22:10