I have a Dataframe "timeseries" which has datetimes as its index and I have a PeriodIndex "on":
import numpy as np
import pandas as pd
timeseries = pd.DataFrame(
index=pd.DatetimeIndex(
[
"2000-01-01 12:00:00Z",
"2000-01-01 13:00:00Z",
"2000-01-01 14:00:00Z",
"2000-01-02 13:00:00Z",
"2000-01-02 18:00:00Z",
"2000-01-03 14:00:00Z",
"2000-01-03 20:00:00Z",
"2000-01-04 13:00:00Z",
]
),
data={
"value1": [6.0, 5.0, 3.0, 7.0, 4.0, 4.0, 5.0, 3.0],
},
)
on = pd.PeriodIndex(
["2000-01-01", "2000-01-02", "2000-01-04", "2000-01-05"], freq="D"
)
I would like to add a column to "timeseries" that contains the period in "on" that each respective datetime is in:
value1 period
2000-01-01 12:00:00+00:00 6.0 2000-01-01
2000-01-01 13:00:00+00:00 5.0 2000-01-01
2000-01-01 14:00:00+00:00 3.0 2000-01-01
2000-01-02 13:00:00+00:00 7.0 2000-01-02
2000-01-02 18:00:00+00:00 4.0 2000-01-02
2000-01-03 14:00:00+00:00 4.0 NaN
2000-01-03 20:00:00+00:00 5.0 NaN
2000-01-04 13:00:00+00:00 3.0 2000-01-04
So far I have achieved this with a for-loop:
timeseries["period"] = np.NaN
for period in on:
datetimes_in_period = timeseries.index[
(timeseries.index >= period.start_time.tz_localize("UTC"))
& (timeseries.index <= period.end_time.tz_localize("UTC"))
]
timeseries["period"].loc[datetimes_in_period] = period
For efficiency's sake I want to avoid loops in Python. How can I vectorize this code?