I want to calculate fiscal year returns and standard deviations from daily returns for a large number of firms. I am relatively new to R, having previously used SAS to calculate returns etc. However, I'd like to switch to R in the short/medium-term.
I have two files: (1) a daily return file (df.1) containing a firm identifier, dates, and daily returns, and (2) my sample (df.2), over which I'd like to aggregate the returns. The first file looks like this:
firm date ret
1 01/01/1992 0.024
1 02/01/1992 0.010
. . .
. . .
1 31/12/2014 0.002
2 01/01/1992 0.004
2 02/01/1992 0.012
The file is very large, about 1M rows.
The second file looks like this:
firm fiscal_year_start fiscal_year_end
1 01/01/1992 31/12/1992
1 01/01/1993 31/12/1993
1 01/01/1994 31/12/1994
I want to calculate fiscal-year returns and the annualised standard deviation. Both .csv files are loaded into R as data frames. I am unsure how best to treat the date variables and how to structure the for loop that goes through the daily return file.
Any help would be much appreciated.
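On the date question above, one option is to convert the text dates to R's Date class so that they can be compared with `>=` and `<=`. This is a minimal sketch, assuming the dates are stored as day/month/year strings as in the samples shown. The small data frames are made-up stand-ins for df.1 and df.2, and the column names follow the tables above.

```r
# Toy stand-ins for the two files (values are illustrative only)
df.1 <- data.frame(firm = c(1, 1, 2),
                   date = c("01/01/1992", "02/01/1992", "01/01/1992"),
                   ret  = c(0.024, 0.010, 0.004),
                   stringsAsFactors = FALSE)
df.2 <- data.frame(firm = 1,
                   fiscal_year_start = "01/01/1992",
                   fiscal_year_end   = "31/12/1992",
                   stringsAsFactors = FALSE)

# Parse day/month/year text into Date objects
df.1$date <- as.Date(df.1$date, format = "%d/%m/%Y")
df.2$fiscal_year_start <- as.Date(df.2$fiscal_year_start, format = "%d/%m/%Y")
df.2$fiscal_year_end   <- as.Date(df.2$fiscal_year_end,   format = "%d/%m/%Y")

# Dates now compare chronologically and support arithmetic
days_between <- as.numeric(df.1$date[2] - df.1$date[1])
```

Once the columns are Date objects, a logical test such as `df.1$date >= df.2$fiscal_year_start[1]` selects rows by calendar order rather than by string order.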
EDIT1
I am able to subset the big data frame using this function:
myfunc <- function(x, y, z) {df.1[df.1$date1 >= x & df.1$date1 < y & df.1$firm == z, ]}
firm1 <- df.2$firm[1]
start_date <- df.2$StartDate[1]
end_date <- df.2$EndDate[1]
Test <- myfunc(start_date,end_date, firm1)
For this subset I can then get the fiscal-year return and std:
# return
fiscal_year_ret <- sum(Test$ret)
# annualized variance: daily variance times the number of daily observations
daily_var <- var(Test$ret)
annualized_var <- daily_var * nrow(Test)
annualized_st.dev <- sqrt(annualized_var)
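A note on this step: summing daily returns gives the period return exactly only when `ret` holds log returns. For simple returns the period return is the compounded product. Separately, `length()` applied to a data frame counts its columns, whereas `nrow()` counts the daily observations. A small sketch with made-up numbers, where `Test` is a hypothetical three-day subset:

```r
# Hypothetical subset of daily simple returns for one firm-year
Test <- data.frame(firm = 1, ret = c(0.024, 0.010, -0.005))

# Compounded period return, appropriate for simple returns
ret_compound <- prod(1 + Test$ret) - 1

# Summed period return, exact only for log returns
ret_sum <- sum(Test$ret)

# Rows are trading days; length() on a data frame counts columns instead
n_days <- nrow(Test)
n_cols <- length(Test)

# Scale the daily standard deviation by the square root of the day count
annualized_sd <- sd(Test$ret) * sqrt(n_days)
```

Which return definition applies depends on how the daily returns in the file were computed, which the data shown here does not say.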
My big problem is embedding this into a loop that iterates over the firm identifiers and dates in df.2.
EDIT2
So I have something like this:
df.output <- data.frame(returns = numeric(),
                        std.deviation = numeric(),
                        stringsAsFactors = FALSE)
I would like to populate the above data frame with the results.
# define the subsetting function once, outside the loop
myfunc <- function(x, y, z) {df.1[df.1$date1 >= x & df.1$date1 < y & df.1$firm == z, ]}
for (i in seq_len(nrow(df.2))) {
  firm1 <- df.2$firm[i]
  start_date <- df.2$StartDate[i]
  end_date <- df.2$EndDate[i]
  firm_year <- myfunc(start_date, end_date, firm1)
  # return
  fiscal_year_ret <- sum(firm_year$ret)
  # variance
  annualized_var <- var(firm_year$ret) * nrow(firm_year)
  annualized_st.dev <- sqrt(annualized_var)
  # store both results in row i of the output
  df.output[i, ] <- c(fiscal_year_ret, annualized_st.dev)
}
Something like that.
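One possible way to avoid the explicit loop is to join the two data frames on the firm identifier, keep the days that fall inside each fiscal year, and aggregate per firm and fiscal year. The sketch below uses only base R (`merge` and `aggregate`). The toy data, the column names `date`, `fiscal_year_start` and `fiscal_year_end`, and the inclusive end date are assumptions taken from the tables above rather than from the real files.

```r
# Toy inputs; column names follow the tables above and are assumptions
df.1 <- data.frame(
  firm = c(1, 1, 1, 1, 2, 2),
  date = as.Date(c("1992-01-01", "1992-01-02", "1993-01-04",
                   "1993-01-05", "1992-01-01", "1992-01-02")),
  ret  = c(0.024, 0.010, 0.002, -0.004, 0.004, 0.012)
)
df.2 <- data.frame(
  firm = c(1, 1, 2),
  fiscal_year_start = as.Date(c("1992-01-01", "1993-01-01", "1992-01-01")),
  fiscal_year_end   = as.Date(c("1992-12-31", "1993-12-31", "1992-12-31"))
)

# Pair each daily row with every fiscal year of the same firm,
# then keep only the days that fall inside that fiscal year
m <- merge(df.1, df.2, by = "firm")
m <- m[m$date >= m$fiscal_year_start & m$date <= m$fiscal_year_end, ]

# One output row per firm and fiscal year
grp <- list(firm = m$firm, fiscal_year_end = m$fiscal_year_end)
df.output <- aggregate(m$ret, by = grp, FUN = sum)
names(df.output)[3] <- "returns"

# Same grouping, so the rows line up with the returns computed above
df.output$std.deviation <- aggregate(
  m$ret, by = grp,
  FUN = function(r) sd(r) * sqrt(length(r))
)$x
```

The intermediate merge holds one row per daily observation and fiscal year of the same firm, so with about 1M daily rows it can become large. Processing one firm at a time, or using a non-equi join from a package such as data.table, may be lighter on memory; neither is shown here.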