Struggling to understand the logic behind rolling window functions that use 'D' as part of the input

Question

As an aspiring data scientist, I am currently learning to work with time series and just finished learning window functions. It is clear to me that rolling window functions help compute a moving metric, such as average or sum, of time series data. However, I am struggling to understand the computational logic behind rolling window functions that use 'D' as part of the input. Below is the example:

I have the following dataset:

import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/Arsik36/StO/master/yahoo.csv',
                 parse_dates = ['date'], index_col = 'date')

df.head()

From the output in your environment, you will see that the dataset contains a date column as the index, followed by the corresponding values. The logic is perfectly clear to me when I set window = 5, as below:

df['window_5'] = df['price'].rolling(window = 5).mean()
df

The new column starts with several NaN rows and then holds the mean of the last 5 rows, which is crystal clear. However, when I specify the window argument as '5D' (5 calendar days), the new column does not produce NaN values at the beginning.

df['window_5D'] = df['price'].rolling(window = '5D').mean()
df

Through my own analysis, I realize that the value in the first row of the 'window_5D' column is the mean of the first row of 'price', the value in the second row of 'window_5D' is the mean of the first 2 rows of 'price', and so on. What I don't understand is why the computations are done this way if I specify a window of size '5D'.

The dataset I included contains Yahoo stock prices. On weekends, the price remains the same. So, in my mind, '5D' should create the same first several NaN values as window = 5, but unlike window = 5, window = '5D' would also assume that on weekends the price stayed the same as on Friday, and would take that into account when computing the mean.

The window = '5D' concept is what I am confused about, and I thank you in advance for helping me understand the logic behind this computation given my confusion with the scenario above.

score 1 · Answer 1 · answered Aug 22 '20 at 18:12


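It is because window = '5D' is an offset, and the window argument produces different results depending on whether its value is an int or an offset. With an int, each window is a fixed number of rows and min_periods defaults to the window size, so the first rows are NaN. With an offset, each window covers a span of time ending at the current row, so the number of rows in it varies, and min_periods defaults to 1, so the first row already gets a value.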
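To make the difference concrete, here is a minimal sketch on made-up data (five trading days with a weekend gap; this is not the CSV from the question, and the window is 3 rather than 5 only to keep the example short). It assumes the default closed='right' behaviour for offset windows, where the window at time t covers (t - 3 days, t].

```python
import pandas as pd

# Hypothetical toy data: Thu, Fri, then Mon, Tue, Wed (the weekend is missing).
idx = pd.to_datetime(
    ["2020-01-02", "2020-01-03", "2020-01-06", "2020-01-07", "2020-01-08"]
)
price = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0], index=idx, name="price")

# Integer window: exactly 3 rows are needed, so min_periods defaults to 3
# and the first two results are NaN.
by_rows = price.rolling(window=3).mean()

# Offset window: uses every row whose timestamp falls in (t - 3 days, t].
# min_periods defaults to 1, so the very first row already has a value.
by_time = price.rolling(window="3D").mean()

# Setting min_periods explicitly brings NaN back wherever the time window
# holds fewer than 2 rows (here: the first row and the Monday row).
by_time_strict = price.rolling(window="3D", min_periods=2).mean()

print(
    pd.DataFrame(
        {
            "price": price,
            "rows_3": by_rows,
            "time_3D": by_time,
            "time_3D_min2": by_time_strict,
        }
    )
)
```

Note that on Monday the '3D' window contains only Monday's own row, because Friday lies exactly 3 days back and the left edge of the window is open.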

check out the documentation

Also here for more clarity
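If the behaviour described in the question is what you want (leading NaN values, with Friday's price carried over the weekend), one possible approach is to fill in the missing calendar days first and then use an integer window. This is only a sketch on made-up data, and it assumes the weekend rows are absent from the index; if your file already contains them, the asfreq step changes nothing.

```python
import pandas as pd

# Hypothetical toy data: Thu, Fri, then Mon, Tue, Wed (the weekend is missing).
idx = pd.to_datetime(
    ["2020-01-02", "2020-01-03", "2020-01-06", "2020-01-07", "2020-01-08"]
)
price = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0], index=idx, name="price")

# Add one row per calendar day and carry the last known price forward,
# so Saturday and Sunday repeat Friday's value.
daily = price.asfreq("D").ffill()

# Integer window over 5 calendar-day rows: the first 4 rows are NaN.
mean_5 = daily.rolling(window=5).mean()

# Keep only the original trading days in the final result.
result = mean_5.reindex(price.index)

print(result)
```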

answered Aug 22 '20 at 18:12

izbid


Thank you for this info! I am still confused, even after reading documentation and the link you included, so will study more on this topic. – Arsik36 Aug 23 '20 at 00:06
