I have been fiddling about with pandas.DataFrame.rolling
for some time now and I haven't been able to achieve the result that I am looking for, so before I write a custom windowing function I figured I would ask if I'm missing something.
I have postgresql data with a composite index of (time, node)
that has been read into a pandas.DataFrame
, where time
is a certain hour on a certain date. I need to create windows that contain all entries within the last two calendar dates (or any arbitrary number of days), for example, beginning at 2022-12-26 00:00:00
and ending on 2022-12-27 23:00:00
, and then perform operations on that window to return a new, resultant DataFrame. The window should then move forward an entire calendar date, which is where I am failing.
| time | node | value |
| --------------------- | ----- | ------ |
| 2022-12-26 00:00:00 | 123 | low |
| 2022-12-26 01:00:00 | 123 | med |
| 2022-12-26 02:00:00 | 123 | low |
| 2022-12-26 03:00:00 | 123 | high |
| ... | ... | ... |
| 2022-12-26 00:00:00 | 999 | low |
| 2022-12-26 01:00:00 | 999 | low |
| 2022-12-26 02:00:00 | 999 | low |
| 2022-12-26 03:00:00 | 999 | med |
| ... | ... | ... |
| 2022-12-27 00:00:00 | 123 | low |
| 2022-12-27 01:00:00 | 123 | med |
| 2022-12-27 02:00:00 | 123 | low |
| 2022-12-27 03:00:00 | 123 | high |
When I use something akin to df.rolling(window=pd.Timedelta('2days')
, the windows move forward hour-by-hour, as opposed to beginning on the next calendar date.
I've played around with using min_periods
, but it doesn't seem to work with my data, nor would it be acceptable in the long run because the number of expected observations per window is not fixed regardless. The step
parameter also appears to be useless in this case because I am using an offset versus an integer for the window anyways.
Is the behaviour I am looking for doable with pandas.DataFrame.rolling
or must I look elsewhere/write my own windowing function?
Any guidance would be appreciated. Thanks!