1

I want to combine date and time data in multiple columns of a Pandas Dataframe to create a DatetimeIndex. I have this:

In:  
import pandas as pd
df = pd.DataFrame({'the_date':['2020-03-26', '2020-03-26', '2020-03-25','2020-03-25'],
                   'hour': [1,2,1,2],
                   'data': [4,5,6,7]})
df

Out:    
the_date    hour    data
0   2020-03-26  1   4
1   2020-03-26  2   5
2   2020-03-25  1   6
3   2020-03-25  2   7


df['ts'] = df.apply(lambda row: pd.to_datetime(row['the_date'] + " " + str(row['hour']) +":00:00"))
df = df.set_index('ts')

But I'm getting this error:

KeyError: ('the_date', 'occurred at index the_date') What am I doing wrong?

doctorer
  • 1,672
  • 5
  • 27
  • 50

2 Answers2

1

You can avoid loops (apply are loops under the hood) by to_timedelta:

df['the_date'] = pd.to_timedelta(df['hour'], unit='H') + pd.to_datetime(df['the_date'])  
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
1

This is quite a common mistake, actually!

The default axis of pandas.DataFrame.apply is 0, i.e. the lambda function is applied to each column (and therefore, you can't take the values of other columns into account for your calculations). What you want is to change the axis to 1, i.e. apply it to each row:

df['ts'] = df.apply(lambda row: pd.to_datetime(row['the_date'] + " " + str(row['hour']) +":00:00"), axis=1)
hmhmmm
  • 333
  • 1
  • 4