Offsets can be either offset aliases (strings) or pd.Timedelta objects. Internally, both are converted into an offset by the pd.tseries.frequencies.to_offset() function.
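As a quick check of that conversion, the sketch below passes a string and an equivalent pd.Timedelta through to_offset() and compares the results. The 90-minute window and the variable names are only illustrative choices, not something taken from the pandas internals.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

# The same 90-minute window, written as a string and as a Timedelta
from_string = to_offset('90min')
from_timedelta = to_offset(pd.Timedelta(minutes=90))

# Both are fixed-frequency offsets of the same length
print(from_string, from_timedelta)
print(from_string == from_timedelta)
print(from_string.nanos)  # window length in nanoseconds
```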
The basic idea behind a rolling window computation with an offset is that, for each index value, the offset is subtracted from it to get the start of a slice of the column (i.e. a window), and the function (e.g. max() below) is called on that window. By default, the left edge of the window is excluded from the computation (closed='right').
import pandas as pd

df = pd.DataFrame({'A': [1, 4, 3, 2]},
                  index=pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-02', '2020-01-04']))
df['A'].rolling('2D').max()
2020-01-01 1.0
2020-01-02 4.0
2020-01-02 4.0
2020-01-04 2.0
Name: A, dtype: float64
In the example above, the calculations are made in the following windows:
[2020-01-01] = max(1) # `min_periods` defaults to 1 for offset-based windows
[2020-01-01, 2020-01-02] = max(1, 4) # only the first value on 2020-01-02 is considered because the second is not seen yet
[2020-01-01, 2020-01-02] = max(1, 4, 3)
[2020-01-03, 2020-01-04] = max(2) # there is no data on 2020-01-03
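The windows listed above can be rebuilt by hand. The following is a minimal sketch, not how pandas implements it internally: manual_rolling_max is a hypothetical helper that, for each row, slices the rows seen so far and keeps those that fall inside the window. Its include_left flag is an assumption added here to mimic closed='both'.

```python
import pandas as pd

s = pd.Series(
    [1, 4, 3, 2],
    index=pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-02', '2020-01-04']),
)

def manual_rolling_max(series, offset, include_left=False):
    """Hypothetical helper: rebuild each window by explicit slicing."""
    results = []
    for i, t in enumerate(series.index):
        seen = series.iloc[: i + 1]   # rows up to and including row i
        start = t - offset            # left edge of the window
        if include_left:
            mask = seen.index >= start
        else:
            mask = seen.index > start  # default: left edge excluded
        results.append(float(seen[mask].max()))
    return results

two_days = pd.Timedelta(days=2)
default_result = manual_rolling_max(s, two_days)
both_result = manual_rolling_max(s, two_days, include_left=True)

# Compare with pandas
pandas_default = s.rolling('2D').max().tolist()
pandas_both = s.rolling('2D', closed='both').max().tolist()
print(default_result, pandas_default)
print(both_result, pandas_both)
```

With closed='both', the window for 2020-01-04 also reaches back to the two rows on 2020-01-02, so its maximum changes from 2 to 4.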
All possible offsets are in the pd.offsets module. Among those, only the ones with a fixed frequency are valid offsets [1]. They are:
Day (D)
Hour (H)
Minute (T)
Second (S)
Milli (L)
Micro (U)
Nano (N)
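To illustrate the fixed-frequency requirement, the sketch below passes offset objects directly as the window: a fixed-length pd.offsets.Hour works, while pd.offsets.MonthEnd, whose length varies from month to month, is expected to be rejected with a ValueError. The sample data and variable names are made up for the illustration.

```python
import pandas as pd

s = pd.Series(range(4), index=pd.date_range('2020-01-01', periods=4, freq='D'))

# Fixed-length offset: a 36-hour window covers the current and previous day
fixed_ok = s.rolling(pd.offsets.Hour(36)).sum()
print(fixed_ok.tolist())

# Non-fixed offset: months differ in length, so this cannot define a window
try:
    s.rolling(pd.offsets.MonthEnd()).sum()
    month_end_rejected = False
except ValueError as err:
    month_end_rejected = True
    print('rejected:', err)
```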
Also, the index can be a DatetimeIndex, TimedeltaIndex or PeriodIndex. In fact, the rolling window doesn't even have to be on the index; it can be on a column of datetime, timedelta or period dtype.
So, for example, to use a rolling window of 3 microseconds, use 3U (from pandas 2.2 on, the U alias is deprecated in favor of us, so the same window is written 3us).
df = pd.DataFrame({
    'time': pd.date_range('2020-01-01 12:00:00', '2020-01-01 12:00:01', periods=10**6),
    'value': 1}).head()
# check that each step is indeed 1 microsecond
df['time'].diff().dropna().dt.microseconds.eq(1).all() # True
df.rolling('3U', on='time')['value'].sum()
0 1.0
1 2.0
2 3.0
3 3.0
4 3.0
Name: value, dtype: float64
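The same mechanism should carry over to a TimedeltaIndex, with the window given as a pd.Timedelta object instead of a string. The sketch below uses made-up data with an uneven gap to show that the window is defined by elapsed time rather than by row count.

```python
import pandas as pd

# Observations at 0s, 1s, 2s and 5s (note the gap before the last one)
s = pd.Series([1, 2, 3, 4], index=pd.to_timedelta([0, 1, 2, 5], unit='s'))

# 2-second window given as a Timedelta; the left edge is excluded by default
result = s.rolling(pd.Timedelta(seconds=2)).sum()
print(result.tolist())
```

The last row only sums itself because no other observation lies within the 2 seconds before it.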
[1] The following code returns these offsets.
offsets = {
    name: obj().name for name in dir(pd.offsets)
    if hasattr((obj := getattr(pd.offsets, name)), '_nanos_inc')
}