I have a dataframe with 3 columns, one for hour, one for minute, and one for second, like this:

df = pd.DataFrame({'hour': [9.0, 9.0, 9.0, 10.0],
                   'min': [12.0, 13.0, 55.0, 2.0],
                   'sec': [42.0, 30.0, 12.0, 5.0]})

>>> df
   hour   min   sec
0   9.0  12.0  42.0
1   9.0  13.0  30.0
2   9.0  55.0  12.0
3  10.0   2.0   5.0

I'm trying to combine the three columns into a new column made up of a datetime series. The goal would be to have this dataframe:

   hour   min   sec      time
0   9.0  12.0  42.0   9:12:42
1   9.0  13.0  30.0   9:13:30
2   9.0  55.0  12.0   9:55:12
3  10.0   2.0   5.0  10:02:05

So far I'm trying to use pd.to_datetime, as such:

df['time'] = pd.to_datetime(df[['hour', 'min', 'sec']],
                        format = '%H:%M:S')

But I'm getting the following error: ValueError: to assemble mappings requires at least that [year, month, day] be specified: [day,month,year] is missing.

I was trying to avoid this by including the format argument with only hour minute second, but apparently that doesn't work.

A similar question was asked here, but the solutions proposed there do not seem to work in this case; I'm still getting this ValueError.

Any ideas to solve this would be appreciated!

Thanks!

[EDIT]: I also needed the method to be able to deal with NaNs, so a dataframe such as this:

df = pd.DataFrame({'hour': [9.0, 9.0, 9.0, 10.0, np.nan],
                   'min': [12.0, 13.0, 55.0, 2.0, np.nan],
                   'sec': [42.0, 30.0, 12.0, 5.0, np.nan]})

The solution proposed by @piRSquared works for this case.

sacuL

2 Answers

Not sure if there is a more direct way but this works

df['time'] = pd.to_datetime(
    df['hour'].astype(int).astype(str) + ':'
    + df['min'].astype(int).astype(str) + ':'
    + df['sec'].astype(int).astype(str),
    format='%H:%M:%S'
).dt.time


    hour    min     sec     time
0   9.0     12.0    42.0    09:12:42
1   9.0     13.0    30.0    09:13:30
2   9.0     55.0    12.0    09:55:12
3   10.0    2.0     5.0     10:02:05
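
As for a more direct way: one option that skips the int and string conversions is to turn each column into a Timedelta with pd.to_timedelta and add them up. This is a different technique from the one above, offered only as a rough sketch. The name offset and the anchor date 2017-01-01 are arbitrary choices made for this illustration. Rows with NaN should come out as NaT instead of raising.

```python
import numpy as np
import pandas as pd

# Example frame from the question's edit, including a row of NaNs
df = pd.DataFrame({'hour': [9.0, 9.0, 9.0, 10.0, np.nan],
                   'min': [12.0, 13.0, 55.0, 2.0, np.nan],
                   'sec': [42.0, 30.0, 12.0, 5.0, np.nan]})

# Build one Timedelta per row; NaN inputs propagate as NaT
offset = (pd.to_timedelta(df['hour'], unit='h')
          + pd.to_timedelta(df['min'], unit='m')
          + pd.to_timedelta(df['sec'], unit='s'))

# Anchor on an arbitrary date (assumption) to extract time-of-day objects
df['time'] = (pd.Timestamp('2017-01-01') + offset).dt.time
```

If a duration since midnight is enough, offset can be stored directly and the anchor date is not needed.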
Vaishali
  • Thanks @Vaishali, Yeah, I was trying something similar, but your way works for this example. Seems like there must be a more efficient way without converting to int and back to string. The issue with this is if my full dataframe has NaNs (which it does), it can't be converted to int. But I suppose I could just get rid of them if needed – sacuL Jan 09 '18 at 21:31
  • pd.to_datetime(df.apply(lambda x: ":".join([str(int(v)) for v in x]), axis=1), format="%H:%M:%S").dt.time <- how about this? – Tai Jan 09 '18 at 21:43
  • @Tai, I wouldn't use apply for something that can be done using vectorized operations – Vaishali Jan 09 '18 at 21:46
  • @Vaishali Thanks for the comments! Appreciate it. – Tai Jan 09 '18 at 21:47

We can use pd.to_datetime on a dataframe with the requisite column names to create a series of datetimes.

However, the OP's initial dataframe has a 'min' column that needs to be renamed 'minute' and a 'sec' column that needs to be renamed 'second'.

In addition, I'll add the missing columns 'year', 'month', and 'day' using pd.DataFrame.assign.

Finally, I'll add the 'time' column with pd.DataFrame.assign again.

new = dict(year=2017, month=1, day=1)
rnm = dict(min='minute', sec='second')
df.assign(
    time=pd.to_datetime(
        df.rename(columns=rnm).assign(**new)
    ).dt.time
)

   hour   min   sec      time
0   9.0  12.0  42.0  09:12:42
1   9.0  13.0  30.0  09:13:30
2   9.0  55.0  12.0  09:55:12
3  10.0   2.0   5.0  10:02:05
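
To check the NaN case from the question's edit, the same steps can be run on the five-row dataframe given there. This is only a small sketch. The intermediate names assembled and result are introduced here for illustration. The row of NaNs is expected to come through as NaT.

```python
import numpy as np
import pandas as pd

# Five-row frame from the question's edit; the last row is all NaN
df = pd.DataFrame({'hour': [9.0, 9.0, 9.0, 10.0, np.nan],
                   'min': [12.0, 13.0, 55.0, 2.0, np.nan],
                   'sec': [42.0, 30.0, 12.0, 5.0, np.nan]})

new = dict(year=2017, month=1, day=1)    # placeholder date parts
rnm = dict(min='minute', sec='second')   # names pd.to_datetime recognizes

# Assemble full datetimes, then keep only the time-of-day part
assembled = pd.to_datetime(df.rename(columns=rnm).assign(**new))
result = df.assign(time=assembled.dt.time)
```

The original df is left unchanged, since both rename and assign return new dataframes.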
piRSquared
  • Thanks @piRSquared ! This works, and deals fine with NaNs (which I have in my actual dataframe). – sacuL Jan 09 '18 at 21:38