I have a DataFrame that looks something like this:
daily_return year month day
date
2018-12-27 NaN 2018 12 27
2018-12-28 1.020245 2018 12 28
2018-12-31 1.000650 2018 12 31
2019-01-02 1.020473 2019 01 02
2019-01-03 1.009129 2019 01 03
... ... ... ... ..
2023-01-20 1.001087 2023 01 20
2023-01-23 0.998455 2023 01 23
2023-01-24 1.002491 2023 01 24
2023-01-25 1.004482 2023 01 25
2023-01-26 1.004953 2023 01 26
I want to calculate, for each date (each row), the year-to-date standard deviation of the daily_return column.
Examples:
- For the row '2018-12-31', the result would be the standard deviation of daily_return from '2018-12-01' up until '2018-12-31': [NaN, 1.020245, 1.000650].
- For the row '2019-01-03', we are referring to the year '2019'. Therefore, the standard deviation should only consider values from '2019-01-01' up until the current row's date, '2019-01-03': [1.020473, 1.009129].
- For the row '2023-01-23', the result should be the standard deviation of all daily_return values from '2023-01-01' up until '2023-01-23'.
- For the row '2023-01-24', the result should be the standard deviation of all daily_return values from '2023-01-01' up until '2023-01-24'.
...
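To pin down the expected numbers, the first two examples can be checked by hand. This is only a minimal sketch: it rebuilds just the sample rows shown in the question, `ytd_std` is a hypothetical helper name, and it assumes the sample standard deviation (ddof=1, the pandas default) with NaN skipped is what is wanted.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question; the elided middle rows are not reproduced.
returns = pd.Series(
    [np.nan, 1.020245, 1.000650, 1.020473, 1.009129],
    index=pd.to_datetime(
        ["2018-12-27", "2018-12-28", "2018-12-31", "2019-01-02", "2019-01-03"]
    ),
    name="daily_return",
)


def ytd_std(series, date):
    """Sample std (ddof=1) of the values from Jan 1 of date's year up to date.

    Hypothetical helper for illustration; NaN values are skipped by Series.std.
    """
    date = pd.Timestamp(date)
    start = pd.Timestamp(year=date.year, month=1, day=1)
    # Label slicing on a sorted DatetimeIndex includes both endpoints.
    return series.loc[start:date].std()


expected_2018_12_31 = ytd_std(returns, "2018-12-31")  # std of [NaN, 1.020245, 1.000650]
expected_2019_01_03 = ytd_std(returns, "2019-01-03")  # std of [1.020473, 1.009129]
```

Calling this helper once per row would be slow on a long history, so it is meant as a reference definition rather than a solution.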
I managed to make something similar work for a month-to-date cumprod by using:
df.groupby(['month', 'year'])['daily_return'].cumprod()
date
2018-12-27 NaN
2018-12-28 1.020245
2018-12-31 1.020909
2019-01-02 1.020473
2019-01-03 1.029789
...
2023-01-20 0.999914
2023-01-23 0.998369
2023-01-24 1.000856
2023-01-25 1.005342
2023-01-26 1.010322
You can see it is working properly since, for every date, it gives me the month-to-date cumprod.
However, whenever I try grouping by year and calculating the std, it only calculates the std for the entire year, and not the year-to-date value for each row:
df.groupby(['year'])['daily_return'].std()
year
2018 0.013856
2019 0.007128
2020 0.021325
2021 0.010358
2022 0.009854
2023 0.006444
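The difference from the cumprod case is that std() is an aggregation (one value per group), whereas cumprod() is cumulative (one value per row). One possible way to get a per-row running value is to combine groupby with an expanding window. The sketch below rebuilds only the sample rows shown above; it assumes the pandas default sample standard deviation (ddof=1), and `ytd_std` is a hypothetical column name.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question; the elided middle rows are not reproduced.
df = pd.DataFrame(
    {"daily_return": [np.nan, 1.020245, 1.000650, 1.020473, 1.009129]},
    index=pd.to_datetime(
        ["2018-12-27", "2018-12-28", "2018-12-31", "2019-01-02", "2019-01-03"]
    ),
)
df.index.name = "date"
df["year"] = df.index.year

# expanding() grows the window one row at a time inside each year group,
# so every row sees only the same-year rows up to and including itself.
ytd = df.groupby("year")["daily_return"].expanding().std()

# The result is indexed by (year, date); drop the year level so it
# aligns with the original date index.
df["ytd_std"] = ytd.droplevel(0)
```

Because a sample standard deviation needs at least two observations, the first valid row of each year comes out as NaN, and the last row of a complete year should match the per-year std() figure.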