DataFrame: Rolling Year-to-Date cumulative STD (Time Series)

Question

I have a DataFrame that looks something like this:

            daily_return      year month day
date                                        
2018-12-27           NaN      2018    12  27
2018-12-28      1.020245      2018    12  28
2018-12-31      1.000650      2018    12  31
2019-01-02      1.020473      2019    01  02
2019-01-03      1.009129      2019    01  03
...                  ...       ...   ...  ..
2023-01-20      1.001087      2023    01  20
2023-01-23      0.998455      2023    01  23
2023-01-24      1.002491      2023    01  24
2023-01-25      1.004482      2023    01  25
2023-01-26      1.004953      2023    01  26

I want to calculate, for each date (each row), the year-to-date standard deviation of the daily_return column. Examples:

For the row '2018-12-31', the result would be the standard deviation of daily_return from '2018-01-01' up until '2018-12-31': [NaN, 1.020245, 1.000650].
For the row '2019-01-03', we are referring to the year '2019'. Therefore, the standard deviation should only consider values from '2019-01-01' up until the current row's date '2019-01-03': [1.020473, 1.009129].
For the row '2023-01-23', the result should use all daily_return values from '2023-01-01' up until '2023-01-23'.
For the row '2023-01-24', the result should use all daily_return values from '2023-01-01' up until '2023-01-24', and so on.

I managed to make something similar work for Month-to-Date cumprod by using:

df.groupby(['month', 'year'])['daily_return'].cumprod()

date
2018-12-27         NaN
2018-12-28    1.020245
2018-12-31    1.020909
2019-01-02    1.020473
2019-01-03    1.029789
                ...   
2023-01-20    0.999914
2023-01-23    0.998369
2023-01-24    1.000856
2023-01-25    1.005342
2023-01-26    1.010322

You can see it is working properly since, for every date, it gives me the Month-to-Date cumprod.

However, whenever I try grouping by year and calculating the std, it will only calculate the std for the entire year, and not for the Year-to-Date for each row:

df.groupby(['year'])['daily_return'].std()

year
2018    0.013856
2019    0.007128
2020    0.021325
2021    0.010358
2022    0.009854
2023    0.006444

Is the YTD standard deviation the same as the [Annualized Standard Deviation](https://financetrain.com/calculate-annualized-standard-deviation)? I also found related [post1](https://stackoverflow.com/a/72252135/10452700) & [post2](https://stackoverflow.com/a/71352450/10452700). — Mario, Jan 29 '23 at 20:55
@Mario sort of, but not quite. post1 uses a rolling window of a yearly period, which is not quite what I was looking for. I was looking for periods starting on the first day of the year, and not -252 days back. — Pedro Rossi, Jan 30 '23 at 01:01
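To make the distinction in this comment concrete, here is a minimal sketch contrasting a trailing 252-row window with a window that restarts on the first row of each year. The data is synthetic (random returns on business days) and the names demo, rolling_252 and ytd are made up for illustration; none of it comes from the original series or from the linked posts.

```python
import numpy as np
import pandas as pd

# Hypothetical data: synthetic returns on business days, not the asker's series
days = pd.bdate_range("2022-01-03", "2023-01-31")
rng = np.random.default_rng(0)
demo = pd.DataFrame({"daily_return": 1 + rng.normal(0, 0.01, len(days))}, index=days)

# Trailing window: always looks back 252 rows, so it crosses the year boundary
demo["rolling_252"] = demo["daily_return"].rolling(252).std()

# Year-to-date: the window restarts on the first row of each calendar year
demo["ytd"] = (
    demo.groupby(demo.index.year)["daily_return"].expanding().std().droplevel(0)
)

# First trading row of 2023: the rolling value is defined, the YTD value is NaN
first_2023 = demo.loc["2023"].iloc[0]
print(first_2023)
```

On the first row of 2023 the trailing window still contains 2022 data, while the year-to-date window holds a single observation and therefore has no sample standard deviation yet.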

score 1 · Accepted Answer · answered Jan 29 '23 at 19:32

If you first sort the index so that the rows are in chronological order, you can group by year, use an expanding window within each group, and apply the .std() function:

import pandas as pd

# may need to set index as datetime if it isn't already
df.index = pd.to_datetime(df.index, format="%Y-%m-%d")

# sort the index
df.sort_index(inplace=True)
# calculate YTD standard deviation
df["YTD std"] = df.groupby(df.index.year)['daily_return'].expanding().std().droplevel(0)

df
#Out: 
#            daily_return  year  month  day   YTD std
#date                                                
#2018-12-27           NaN  2018     12   27       NaN
#2018-12-28      1.020245  2018     12   28       NaN
#2018-12-31      1.000650  2018     12   31  0.013856
#2019-01-02      1.020473  2019      1    2       NaN
#2019-01-03      1.009129  2019      1    3  0.008021
#2023-01-20      1.001087  2023      1   20       NaN
#2023-01-23      0.998455  2023      1   23  0.001861
#2023-01-24      1.002491  2023      1   24  0.002049
#2023-01-25      1.004482  2023      1   25  0.002533
#2023-01-26      1.004953  2023      1   26  0.002650
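For anyone who wants to run this without the original file, below is a self-contained sketch that rebuilds only the ten rows printed in the question and applies the same groupby/expanding approach. The variable names sample, ytd_std and manual are just for illustration, and the hand check at the end is an extra sanity test rather than part of the method.

```python
import numpy as np
import pandas as pd

# Only the ten rows shown in the question; the elided middle rows are not included
dates = pd.to_datetime([
    "2018-12-27", "2018-12-28", "2018-12-31",
    "2019-01-02", "2019-01-03",
    "2023-01-20", "2023-01-23", "2023-01-24", "2023-01-25", "2023-01-26",
])
returns = [np.nan, 1.020245, 1.000650, 1.020473, 1.009129,
           1.001087, 0.998455, 1.002491, 1.004482, 1.004953]
sample = pd.DataFrame(
    {"daily_return": returns}, index=pd.DatetimeIndex(dates, name="date")
).sort_index()

# Expanding (cumulative) sample std within each calendar year;
# droplevel(0) removes the year level so the result aligns with the date index
ytd_std = (
    sample.groupby(sample.index.year)["daily_return"]
    .expanding()
    .std()
    .droplevel(0)
)
sample["YTD std"] = ytd_std

# Hand check for one row: std of the 2023 values up to and including 2023-01-24
manual = sample.loc["2023-01-20":"2023-01-24", "daily_return"].std()
print(sample)
print(manual)
```

The NaN entries are expected: .std() uses the sample standard deviation (ddof=1), so it needs at least two non-missing values in the year before it returns a number.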

Yes! This ended up working as expected! By the way, it doesn't seem to be necessary to sort the index. Thanks! — Pedro Rossi, Jan 30 '23 at 00:58
No problem! It may not seem necessary if the index is already sorted, but if the dates are in a different order (say, for example, that 2023-01-01 is after 2023-03-01) then the grouping doesn't sort these and the YTD standard deviation for 2023-01-01 will actually include the data from 2023-03-01 because it takes all values from the same year that appear in rows before, and including, it. Of course, if the data is retrieved in the correct order in the first place, no need to worry. — Rawson, Jan 30 '23 at 20:44
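A small sketch of the ordering issue described in this comment, using four made-up rows where a March date appears before two January dates. The frame shuffled and the helper ytd_std are hypothetical names introduced only for this illustration.

```python
import pandas as pd

# Hypothetical, deliberately out-of-order rows: 2023-03-01 sits before 2023-01-03
idx = pd.to_datetime(["2023-01-02", "2023-03-01", "2023-01-03", "2023-01-04"])
shuffled = pd.DataFrame({"daily_return": [1.01, 1.05, 0.99, 1.02]}, index=idx)


def ytd_std(frame):
    """Expanding std per calendar year, in whatever row order the frame has."""
    return (
        frame.groupby(frame.index.year)["daily_return"]
        .expanding()
        .std()
        .droplevel(0)
    )


unsorted_result = ytd_std(shuffled)
sorted_result = ytd_std(shuffled.sort_index())

day = pd.Timestamp("2023-01-03")
print(unsorted_result.loc[day])  # uses 1.01, 1.05, 0.99 (includes the March row)
print(sorted_result.loc[day])    # uses 1.01, 0.99 only
```

Grouping keeps the rows of each year in their existing order, so the expanding window follows row position rather than date; sorting first is what makes "rows before" mean "earlier dates".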
