1

I have a dataset that has a date column and a time column, I'm trying to combine them in a DateTime column but I'm facing an issue with the month & date parts of it being reversed

For example:

Date       Time
1/2/2019   3:29:59 PM
4/2/2019   9:15:59 AM

These dates are Feb 1st & Feb 4th of 2019.

When I put them in my DataFrame & format them:

data = pd.read_csv('{}/{}.csv'.format(data_path,symbol), parse_dates=[['Date','Time']])
data.columns = map(str.lower, data.columns)
data['timedelta'] = pd.Series([pd.Timedelta(seconds=59) for i in range(len(data['date_time']))])
data['date_time'] = data['date_time'] - data['timedelta']
data = data.set_index('date_time').tz_localize('Asia/Kolkata')

I get this output:

Datetime
2019-01-02 15:29:00+0530
2019-04-02 09:15:00+0530

As you can see, the DateTime object is for Jan 2nd and April 2nd of 2019.

I'd appreciate your help to figure out how to get the DateTime column formatted correctly.

rpanai
  • 12,515
  • 2
  • 42
  • 64
Abhay
  • 827
  • 9
  • 34

3 Answers3

4

First, run this df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y') to change your date format. Then, you can merge your date and time. Complete code is below:

data = pd.read_csv('{}/{}.csv'.format(data_path,symbol), parse_dates=[['Date','Time']])
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
data.columns = map(str.lower, data.columns)
data['timedelta'] = pd.Series([pd.Timedelta(seconds=59) for i in range(len(data['date_time']))])
data['date_time'] = data['date_time'] - data['timedelta']
data = data.set_index('date_time').tz_localize('Asia/Kolkata')
Nick
  • 207
  • 1
  • 2
  • 11
Darkknight
  • 1,716
  • 10
  • 23
3

Upon calling read_csv, set dayfirst=True so that the date is parsed correctly. Floor to minutes using dt.floor:

data = pd.read_csv(f'{data_path}/{symbol}.csv', parse_dates=[['Date','Time']], dayfirst=True)

data = data.set_index(data['Date_Time'].dt.floor('min')).tz_localize('Asia/Kolkata')

# need to drop col used as index separately here:
data = data.drop(['Date_Time'], axis=1)
FObersteiner
  • 22,500
  • 8
  • 42
  • 72
1

add the following to your code after read csv:

df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')

oreopot
  • 3,392
  • 2
  • 19
  • 28