
My data has a date column with two different date formats:

Date
01 Jan 2019
02 Feb 2019
01-12-2019
23-01-2019
11-04-2019
22-05-2019

I want to convert these strings into dates in YYYY-mm-dd format:

Date
2019-01-01
2019-02-02
2019-12-01
2019-01-23
2019-04-11
2019-05-22

I have tried the following, but I am looking for a better approach:

df['Date'] = np.where(df['Date'].str.contains('-'), pd.to_datetime(df['Date'], format='%d-%m-%Y'), pd.to_datetime(df['Date'], format='%d %b %Y'))

Working solution for me:

df['Date_1']= np.where(df['Date'].str.contains('-'),df['Date'],np.nan)
df['Date_2']= np.where(df['Date'].str.contains('-'),np.nan,df['Date'])
df['Date_new'] = np.where(df['Date'].str.contains('-'),pd.to_datetime(df['Date_1'], format = '%d-%m-%Y'),pd.to_datetime(df['Date_2'], format = '%d %b %Y'))
Sangram
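The one-line `np.where` attempt fails because both `pd.to_datetime` calls run on the entire column before `np.where` chooses between them, so each call meets strings in the other format and raises a `ValueError`. A more compact version of the working idea (a sketch, not from the original post) parses each subset only with its own format. It assumes, as the question does, that a `-` marks the dd-mm-YYYY rows:

```python
import pandas as pd

df = pd.DataFrame({'Date': ['01 Jan 2019', '02 Feb 2019', '01-12-2019',
                            '23-01-2019', '11-04-2019', '22-05-2019']})

# Assumption: every value containing '-' is dd-mm-YYYY; the rest are 'dd Mon YYYY'.
dashed = df['Date'].str.contains('-', regex=False)

# Parse each subset with only its own format, so no call sees a string it
# cannot handle. pd.concat keeps the original index labels, and assigning the
# result to a column aligns rows by index, which restores the original order.
df['Date_new'] = pd.concat([
    pd.to_datetime(df.loc[dashed, 'Date'], format='%d-%m-%Y'),
    pd.to_datetime(df.loc[~dashed, 'Date'], format='%d %b %Y'),
])

print(df)
```

This avoids the helper columns `Date_1` and `Date_2` and the second `np.where`, while keeping the explicit formats, so no date is ever guessed.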

Just use the option `dayfirst=True`:

pd.to_datetime(df.Date, dayfirst=True)

Out[353]:
0   2019-01-01
1   2019-02-02
2   2019-12-01
3   2019-01-23
4   2019-04-11
5   2019-05-22
Name: Date, dtype: datetime64[ns]
Andy L.
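The `dayfirst` answer above relies on pandas parsing each string independently, which was the behaviour before pandas 2.0. From 2.0 on, `pd.to_datetime` infers one format from the first element and applies it to the whole column, so the same call raises on the dashed rows. A sketch, assuming pandas 2.0 or later where `format='mixed'` is available, with a fallback for older versions:

```python
import pandas as pd

dates = pd.Series(['01 Jan 2019', '02 Feb 2019', '01-12-2019',
                   '23-01-2019', '11-04-2019', '22-05-2019'])

# pandas >= 2.0 infers a single format (here from '01 Jan 2019') and applies it
# to every element, so mixed input needs format='mixed' to parse per element.
# Earlier versions parsed each string on its own, as in the answer above.
if int(pd.__version__.split('.')[0]) >= 2:
    parsed = pd.to_datetime(dates, format='mixed', dayfirst=True)
else:
    parsed = pd.to_datetime(dates, dayfirst=True)

print(parsed)
```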

My suggestion: Define a conversion function as follows:

import datetime as dt
import pandas as pd

def conv_date(x):
    try:
        res = pd.to_datetime(dt.datetime.strptime(x, "%d %b %Y"))
    except ValueError:
        res = pd.to_datetime(dt.datetime.strptime(x, "%d-%m-%Y"))
    return res

Now get the new date column as follows:

df['Date_new'] = df['Date'].apply(conv_date)
bexi
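The try/except fallback above can also run on the whole column at once instead of row by row. A sketch (not from the answer; `parse_any` is a hypothetical helper name) that tries a list of formats in order: `errors='coerce'` turns each failed match into `NaT` instead of an exception, and `fillna` keeps the first successful parse for each row:

```python
import pandas as pd


def parse_any(series, formats):
    """Parse a Series of strings, trying each strptime format in order.

    Rows that match none of the formats end up as NaT.
    """
    first, *rest = formats  # expects at least one format
    # errors='coerce' turns strings that do not match the format into NaT
    result = pd.to_datetime(series, format=first, errors='coerce')
    for fmt in rest:
        # Only rows that are still NaT get filled from the next format's parse
        result = result.fillna(pd.to_datetime(series, format=fmt, errors='coerce'))
    return result


df = pd.DataFrame({'Date': ['01 Jan 2019', '02 Feb 2019', '01-12-2019',
                            '23-01-2019', '11-04-2019', '22-05-2019']})
df['Date_new'] = parse_any(df['Date'], ['%d %b %Y', '%d-%m-%Y'])
print(df)
```

Unlike `conv_date`, a value that matches neither format becomes `NaT` rather than raising, so check `df['Date_new'].isna()` afterwards if unparsed rows should be treated as errors.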

This works as expected:

import pandas as pd

a = pd.DataFrame({
        'Date' : ['01 Jan 2019',
                '02 Feb 2019',
                '01-12-2019',
                '23-01-2019',
                '11-04-2019',
                '22-05-2019']
    })
a['Date'] = a['Date'].apply(lambda date: pd.to_datetime(date, dayfirst=True))

print(a)
Mobasshir Bhuiya

You can get the desired result with `apply` and the pandas `to_datetime` function, as shown below:

import pandas as pd

def change(value):
    return pd.to_datetime(value)

df = pd.DataFrame(data = {'date':['01 jan 2019']})

df['date'] = df['date'].apply(change)
df

I hope this helps.

Rahul charan
  • It will fail at the fourth observation: it gives 4th Nov 2019 instead of 11th Apr 2019. – Sangram Jun 28 '19 at 10:01
  • @Sangram `pd.to_datetime` takes input in the format `DD:MM:YY` and converts the value accordingly. If you mean `4th Nov 2019`, then you have to enter the date as `04-11-2019`. – Rahul charan Jun 28 '19 at 10:06
  • I understand, but I cannot change this manually, since I receive the data in dd-mm-YYYY format or the other format mentioned in the question. – Sangram Jun 28 '19 at 10:11
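The comment exchange above comes down to how `pd.to_datetime` reads an ambiguous string such as `11-04-2019`: by default it puts the month first, and either `dayfirst=True` or an explicit `format` changes that. A short check (not part of the original thread):

```python
import pandas as pd

# Default: an ambiguous nn-nn-YYYY string is read month first.
default = pd.to_datetime('11-04-2019')

# dayfirst=True asks the parser to prefer reading the day first.
day_first = pd.to_datetime('11-04-2019', dayfirst=True)

# An explicit format removes the ambiguity entirely.
explicit = pd.to_datetime('11-04-2019', format='%d-%m-%Y')

print(default.date(), day_first.date(), explicit.date())
```

The pandas documentation describes `dayfirst=True` as a preference rather than a strict rule, so when the layout is known in advance, passing `format='%d-%m-%Y'` is the safer way to avoid silently swapped days and months.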