0

I am new here, I apologize if the layout is not typical.

I have a large csv with around 150,000 rows. The Date column alternates between two formats, but it always increases in increments of 10 minutes.

First 2001-02-08 2:30:00 AM -- %Y-%m-%d %I:%M:%S %p

Second 01/13/08 00:20:00 -- %y/%d/%m %H:%M:%

I have a bunch of ways, one is to parse as I read_csv and second is to use pd.to_dateime. But I keep getting an erroneous output.

The attached picture is the output from

df1['Test'] = pd.to_datetime(df1['Date'],yearfirst=True)

the year in the output should not be 2008

Output

Format1 Fortmat2

TAbbs
  • 1
  • 1

1 Answers1

0

Parse the dates both ways and combine the results:

df1['Test'] = pd.to_datetime(df['A'], format="%y/%d/%m %H:%M:%S", 
                                    errors='coerce')\
                .combine_first(pd.to_datetime(df['A']))

The call to the first to_datetime with the option errors='coerce' will produce a NaT ("not-a-time") if the date is in the wrong format. Those NaTs will be filled by the values produced by the second call to to_datetime.

DYZ
  • 55,249
  • 10
  • 64
  • 93
  • Thank you very much!! I had like 30 tabs open on how to do this. I ran it without the combine and it worked! df1['Test'] = pd.to_datetime(df1['Date'], format="%y/%d/%m %H:%M:%S", errors='coerce') – TAbbs Jul 08 '18 at 22:12