How can I calculate the yearly stock return of multiple accounts with multiple years?

Question

I am trying to calculate the annual volatility of daily stock returns for my data for which the daily observations run from Jan/1/2005 - Dec/31/2021. My data contains thousands of accounts and therefore I need a code that can calculate it without repeating the code for each account and year. Not all accounts are observed in every year.

What I see on the internet are examples to calculate the annualized volatility (standard deviation) of the daily stock returns of a single account, but I cannot figure out how to do this in one code for thousands of accounts. Can you please help me?

My data looks kind of like this, but then for many years and many accounts:

    ID Year Day Return
    1   1    1     a 
    1   1    2     b
    1   2    1     c
    1   2    2     d
    1   3    1     e
    1   3    2     f
    2   1    1     g
    2   1    2     h
    2   2    1     i
    2   2    2     j
    2   3    1     k
    2   3    2     l

To conclude, I need a column in which I have the annualized standard deviation over daily stock returns, per each company, per year.

This is a short overview of my data:

structure(list(Date = structure(c(14613, 14614, 14615, 14616, 
14617, 14620, 14621, 14622, 14623, 14624, 14627, 14628, 14629, 
14630, 14631, 14634, 14635, 14636, 14637, 14638), class = "Date"), 
    `Price - Close - Daily` = c(17.88, 18.025, 18.27, 17.84, 
    18.485, 17.945, 17.35, 17.365, 17.5, 17.535, 17.67, 18.14, 
    18.38, 18.52, 17.985, 17.375, 17.12, 16.795, 17.085, 16.69
    ), ID = c("NL0000334118", "NL0000334118", "NL0000334118", 
    "NL0000334118", "NL0000334118", "NL0000334118", "NL0000334118", 
    "NL0000334118", "NL0000334118", "NL0000334118", "NL0000334118", 
    "NL0000334118", "NL0000334118", "NL0000334118", "NL0000334118", 
    "NL0000334118", "NL0000334118", "NL0000334118", "NL0000334118", 
    "NL0000334118")), row.names = c(NA, -20L), class = c("tbl_df", 
"tbl", "data.frame"))

The date variable is formatted as date in R and shows correctly by Year-Month-Day, e.g. 2010-01-04.

welcome to SO, please share the result of `dput(head(your_data, 100))` so that we can help — gaut, Jul 20 '22 at 11:38
We need that you share a reproducible example, including some input and (expected) output tables. — Wencheng Lau-Medrano, Jul 20 '22 at 11:46
`Return` is not a character column, if you have to keep your data private, please use `rnorm` or another pseudo-RNG to create a returns column. — Rui Barradas, Jul 20 '22 at 11:49
Sorry for the delay in answering to your questions, I am still figuring out how this works with attaching an example, input, and expected output tables — Amber, Jul 20 '22 at 12:02
Run `dput(head(your_data, 20))` in a R session, copy its output, [edit the question](https://stackoverflow.com/posts/73050923/edit) and paste the command's output. — Rui Barradas, Jul 20 '22 at 12:11

Rui Barradas · Answer 1 · 2022-07-22T19:09:26.747

Here is a simple tidyverse way.

If the name of your data's second column is as complicated as posted, first run

names(prices)[2] <- 'Close'

to simplify the code below.

library(tidyverse)

prices %>%
  mutate(Year = year(Date)) %>%
  group_by(ID, Year) %>%
  summarise(Volatility = 100 * sqrt(n()) * sd(log(lag(Close)) - log(Close), na.rm = TRUE), .groups = "drop")

Test data

This code is inspired in the code by Jonathan Regenstein in this R Views - RStudio post.

library(tidyverse)
library(tidyquant)

symbols <- c("IBM", "MSFT", "GOOG", "FB")

prices <- getSymbols(
  symbols, src = 'yahoo', 
  from = "2013-01-01", 
  auto.assign = TRUE
) %>% 
  map(~ Cl(get(.))) %>%
  reduce(merge) %>% 
  as.data.frame() %>% 
  `names<-`(symbols) %>%
  mutate(Date = ymd(row.names(.))) %>% 
  pivot_longer(-Date, names_to = "ID", values_to = "Close")

Thank you Rui, your summarise code worked. – Amber Jul 21 '22 at 13:04 — Amber, Jul 21 '22 at 13:04

How can I calculate the yearly stock return of multiple accounts with multiple years?

1 Answers1

Test data