Pandas has some promising sounding functions:
pandas.DataFrame.between_time and pandas.DatetimeIndex.indexer_between_time
Unfortunately, between_time
returns a DataFrame, and not a boolean Series. So it's not so convenient to use with .loc
.
And indexer_between_time
returns an integer array of index positions. Works with .iloc
, but that is quite uncomfortable to us.
Furthermore both of them require the index to be of DateTime type.
First, some example data:
df = pd.DataFrame(pd.date_range(start = "2020-11-19 00:00",
end = "2020-11-19 23:59",
periods = 15),
columns = ["start"])
start
0 2020-11-19 00:00:00
1 2020-11-19 01:42:47
2 2020-11-19 03:25:34
3 2020-11-19 05:08:21
4 2020-11-19 06:51:08
5 2020-11-19 08:33:55
6 2020-11-19 10:16:42
7 2020-11-19 11:59:30
8 2020-11-19 13:42:17
9 2020-11-19 15:25:04
10 2020-11-19 17:07:51
11 2020-11-19 18:50:38
12 2020-11-19 20:33:25
13 2020-11-19 22:16:12
14 2020-11-19 23:59:00
Adding a new column, which will show with True / False if a row is Daytime
or not.
df["Daytime"] = False
Setting the index to start, the DateTime column:
df = df.set_index("start")
Daytime
start
2020-11-19 00:00:00 False
2020-11-19 01:42:47 False
2020-11-19 03:25:34 False
2020-11-19 05:08:21 False
2020-11-19 06:51:08 False
2020-11-19 08:33:55 False
2020-11-19 10:16:42 False
2020-11-19 11:59:30 False
2020-11-19 13:42:17 False
2020-11-19 15:25:04 False
2020-11-19 17:07:51 False
2020-11-19 18:50:38 False
2020-11-19 20:33:25 False
2020-11-19 22:16:12 False
2020-11-19 23:59:00 False
Which are your boundaries for a timestamp to be DayTime?
DayStart = "06:30:00"
DayEnd = "18:00:00"
Creating the integer array of matching rows. You can also set include_start
and include_end
to have open or closed intervals.
DayTime = df.index.indexer_between_time(DayStart, DayEnd)
What do we get in return? A list of integers that match the index position.
>>> array([ 4, 5, 6, 7, 8, 9, 10])
We can now use that to set the 0th column to True
:
df.iloc[DayTime,0] = True
Daytime
start
2020-11-19 00:00:00 False
2020-11-19 01:42:47 False
2020-11-19 03:25:34 False
2020-11-19 05:08:21 False
2020-11-19 06:51:08 True
2020-11-19 08:33:55 True
2020-11-19 10:16:42 True
2020-11-19 11:59:30 True
2020-11-19 13:42:17 True
2020-11-19 15:25:04 True
2020-11-19 17:07:51 True
2020-11-19 18:50:38 False
2020-11-19 20:33:25 False
2020-11-19 22:16:12 False
2020-11-19 23:59:00 False
Using the between_time
function returns a DataFrame matching the criterion:
df_DayFilter = df.between_time(DayStart, DayEnd)
Daytime
start
2020-11-19 06:51:08 True
2020-11-19 08:33:55 True
2020-11-19 10:16:42 True
2020-11-19 11:59:30 True
2020-11-19 13:42:17 True
2020-11-19 15:25:04 True
2020-11-19 17:07:51 True
I'd really like to know as well if there is a more elegant way to use between_time
!