0

I am trying to

  1. get the average wakeup time from a list of wake up times in a df column and
  2. compare them to a goal wake up time (earlier or later)
import pandas as pd

#convert to datetime object
df['wakeup_time_date']=pd.to_datetime(df['wakeup_time_date'], infer_datetime_format=True)

#extract time
df['wakeup_time']=df['wakeup_time_date'].dt.time

#below doesn't work as the object is no longer a datetime object
df['wakeup_time'].mean()

And lastly, I would like to compare the average wakeup time to 06:30 am and determine it if it earlier or later.

Soner Gönül
  • 97,193
  • 102
  • 206
  • 364
5tanczak
  • 161
  • 1
  • 2
  • 8
  • 1
    don't strip the date by calling `.dt.time`, just ignore it as shown here: [Mean of time component only of pandas datetime column](https://stackoverflow.com/questions/50356337/mean-of-time-component-only-of-pandas-datetime-column) – FObersteiner Jan 08 '22 at 18:46
  • That gets the mean time. However I'm still unsure how to solve the second part of the task: compare that time to a goal time. – 5tanczak Jan 08 '22 at 18:58

2 Answers2

1

As MrFuppes suggested, probably the easiest way would be converting to datetime.timedeltas, instead of datetime.times:

from datetime import datetime, timedelta, time
df = pd.DataFrame({"wakeup_time_date": ["2019/03/04 07:08:58", "2019/03/05 08:08:48", "2019/03/06 10:00:12", "2019/03/10 6:35:32"]})

df['wakeup_time_date']=pd.to_datetime(df['wakeup_time_date'])
df['time'] = df['wakeup_time_date'] - df['wakeup_time_date'].dt.normalize()
mean_td = df['time'].mean()

#Timedelta('0 days 07:58:22.500000')

Now, if you don't mind operating on timedeltas, you might compare it straight away:

goal_td = timedelta(hours=6, minutes=30) 
is_mean_later_than_goal = mean_td > goal_td
# True

or convert it to datetime.time and do the comparison this way:

mean_time = (datetime.min + mean_td).time()
goal_time = time(hour=6, minute=30)
is_mean_later_than_goal = mean_time > goal_time
# True
Daniel Wlazło
  • 1,105
  • 1
  • 8
  • 17
1

Here is a toy example using np.where() in which True rows have greater than 390 minutes (6:30 AM) and less than 720 (noon, arbitrability chosen for example). First, a minutes column is created using dt attributes.

It's hard to say without seeing your data, but if you have "a list of wake up times in a df column" I would likely set a DatetimeIndex and then do a groupby aggregated by mean time on it before the np.where() step.

from pandas import Timestamp
import pandas as pd

toy_dict = {'wakeup_time_date': {0: Timestamp('2021-11-11 13:04:44.342843'),
  1: Timestamp('2021-11-11 15:43:40.654980'),
  2: Timestamp('2021-11-11 17:31:07.409101'),
  3: Timestamp('2021-11-11 22:19:14.394725'),
  4: Timestamp('2021-11-12 09:03:41.957619'),
  5: Timestamp('2021-11-12 09:46:24.849156'),
  6: Timestamp('2021-11-12 10:45:07.950917'),
  7: Timestamp('2021-11-12 12:27:48.189281'),
  8: Timestamp('2021-11-12 12:57:27.112832'),
  9: Timestamp('2021-11-12 14:06:17.432817')}}


df = pd.DataFrame(toy_dict)

df['minutes'] = df['wakeup_time_date'].dt.hour * 60 + df['wakeup_time_date'].dt.minute + df['wakeup_time_date'].dt.second/60

df['past_six_thirty'] = np.where((df['minutes'] > 390) & (df['minutes'] < 720), True, False)

print(df)

    wakeup_time_date            minutes     past_six_thirty
0   2021-11-11 13:04:44.342843  784.733333  False
1   2021-11-11 15:43:40.654980  943.666667  False
2   2021-11-11 17:31:07.409101  1051.116667 False
3   2021-11-11 22:19:14.394725  1339.233333 False
4   2021-11-12 09:03:41.957619  543.683333  True
5   2021-11-12 09:46:24.849156  586.400000  True
6   2021-11-12 10:45:07.950917  645.116667  True
7   2021-11-12 12:27:48.189281  747.800000  False
8   2021-11-12 12:57:27.112832  777.450000  False
9   2021-11-12 14:06:17.432817  846.283333  False
Matthew Borish
  • 3,016
  • 2
  • 13
  • 25