I want to calculate fiscal year returns and standard deviations from daily returns for a large number of firms. I am relatively new to R, having previously used SAS to calculate returns etc. However, I'd like to switch to R in the short/medium-term.

I have two files: 1) the daily data, containing a firm identifier, dates, and daily returns (df.1), and 2) my sample (df.2), which gives the fiscal-year windows over which I'd like to aggregate the returns.

firm      date        ret
   1   01/01/1992     0.024
   1   02/01/1992     0.010
   .        .           .
   .        .           .
   1   31/12/2014     0.002 
   2   01/01/1992     0.004
   2   02/01/1992     0.012

The file is very large, about 1M rows.

The second file looks like this:

firm  fiscal_year_start fiscal_year_end 
    1 01/01/1992        31/12/1992   
    1 01/01/1993        31/12/1993
    1 01/01/1994        31/12/1994

I want to calculate fiscal year returns and annualised standard deviation. Both .csv files are loaded into R as data frames. I am unsure how best to treat the date variables and how to structure the for loop that goes through the daily return file.
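On the date question, one possible approach (a sketch, assuming the dates arrive as day/month/year character strings as in the samples above) is to convert them with base R's as.Date() so that comparisons such as >= work chronologically. The toy data frames below are made up for illustration and reuse the column names from the tables above.

```r
# Toy stand-ins for the two files (values are illustrative only)
df.1 <- data.frame(
  firm = c(1, 1, 1),
  date = c("01/01/1992", "02/01/1992", "31/12/1992"),
  ret  = c(0.024, 0.010, 0.002),
  stringsAsFactors = FALSE
)
df.2 <- data.frame(
  firm = 1,
  fiscal_year_start = "01/01/1992",
  fiscal_year_end   = "31/12/1992",
  stringsAsFactors = FALSE
)

# Parse day/month/year strings into Date objects
df.1$date <- as.Date(df.1$date, format = "%d/%m/%Y")
df.2$fiscal_year_start <- as.Date(df.2$fiscal_year_start, format = "%d/%m/%Y")
df.2$fiscal_year_end   <- as.Date(df.2$fiscal_year_end,   format = "%d/%m/%Y")

# Date columns now compare chronologically rather than alphabetically
in_window <- df.1$date >= df.2$fiscal_year_start[1] &
             df.1$date <= df.2$fiscal_year_end[1]
```

Whether the end date should be included (<=) or excluded (<) depends on how the fiscal-year file is defined; the inclusive comparison here is an assumption.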

Any help would be much appreciated.

EDIT1

I am able to subset the big data frame using this function:

# subset df.1 to one firm (z) and one date window [x, y)
myfunc <- function(x, y, z) {
  df.1[df.1$date1 >= x & df.1$date1 < y & df.1$firm == z, ]
}
firm1 <- df.2$firm[1]
start_date <- df.2$StartDate[1]
end_date <- df.2$EndDate[1]
Test <- myfunc(start_date, end_date, firm1)

For this subset I can then get the fiscal-year return and std:

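# return (sum of the daily returns in the window)
fiscal_year_ret <- sum(Test$ret)

# annualized variance: daily variance times the number of daily observations
daily_var <- var(Test$ret)
# nrow() counts observations; length() on a data frame would count columns
annualized_var <- daily_var * nrow(Test)
annualized_st.dev <- sqrt(annualized_var)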

My main problem is embedding this in a loop that iterates over the firm identifiers and dates in df.2.

EDIT2

So I have something like this:

df.output <- data.frame(returns=as.numeric(),
             std.deviation=as.numeric(), 
             stringsAsFactors=FALSE) 

I would like to populate the above data frame with the results.

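# define the subsetting helper once, outside the loop
myfunc <- function(x, y, z) {
  df.1[df.1$date1 >= x & df.1$date1 < y & df.1$firm == z, ]
}

# one iteration per row (firm and fiscal year) of df.2
for (i in seq_len(nrow(df.2))) {
  firm1 <- df.2$firm[i]
  start_date <- df.2$StartDate[i]
  end_date <- df.2$EndDate[i]
  sub <- myfunc(start_date, end_date, firm1)

  # return
  fiscal_year_ret <- sum(sub$ret)
  # variance (nrow() counts observations; length() would count columns)
  daily_var <- var(sub$ret)
  annualized_var <- daily_var * nrow(sub)
  annualized_st.dev <- sqrt(annualized_var)

  # store both results in row i of the output data frame
  df.output[i, ] <- list(fiscal_year_ret, annualized_st.dev)
}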

Something like that.
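A minimal sketch of one way the loop could be structured so that each row of df.2 produces one result row, without filling a data frame element by element. The toy data, the helper name fiscal_stats, and the inclusive end date are assumptions made for illustration; the column names date1, StartDate and EndDate follow the code above.

```r
# Toy data (illustrative values); dates already stored as Date objects
df.1 <- data.frame(
  firm  = c(1, 1, 1, 1, 2, 2),
  date1 = as.Date(c("1992-01-02", "1992-06-30", "1993-01-04",
                    "1993-06-30", "1992-01-02", "1992-06-30")),
  ret   = c(0.01, 0.03, 0.02, -0.01, 0.00, 0.02)
)
df.2 <- data.frame(
  firm      = c(1, 1, 2),
  StartDate = as.Date(c("1992-01-01", "1993-01-01", "1992-01-01")),
  EndDate   = as.Date(c("1992-12-31", "1993-12-31", "1992-12-31"))
)

# Hypothetical helper: statistics for one firm and one fiscal-year window
fiscal_stats <- function(firm, start, end) {
  # assumption: the end date is included in the window
  r <- df.1$ret[df.1$firm == firm & df.1$date1 >= start & df.1$date1 <= end]
  data.frame(
    firm          = firm,
    StartDate     = start,
    returns       = sum(r),                   # summed daily returns, as in the question
    std.deviation = sqrt(var(r) * length(r))  # daily variance scaled by number of days
  )
}

# One result row per row of df.2, bound together at the end
df.output <- do.call(rbind, lapply(seq_len(nrow(df.2)), function(i) {
  fiscal_stats(df.2$firm[i], df.2$StartDate[i], df.2$EndDate[i])
}))
```

With around 1M rows, scanning all of df.1 for every window may be slow; splitting df.1 by firm once beforehand (for example with split()) is one option that might reduce the work per iteration.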

Fabian

1 Answer

Here is one way:

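library(dplyr)      # provides %>%, mutate(), group_by(), summarize()
library(lubridate)  # provides dmy() and floor_date()

data %>%
  mutate(year =
           date %>%
           dmy %>%   # dates in the question are day/month/year
           floor_date(unit = "year")) %>%
  # calendar year per firm, not a firm-specific fiscal year
  group_by(firm, year) %>%
  summarize(
    mean_return = mean(ret),
    sd_return = sd(ret))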
bramtayl
  • Well, it should be something like `cumprod(1+ret)` for each year, but as OP says, some firms start their financial years on different dates. See the package `PerformanceAnalytics`. But, I don't do finance stuff for free :) – jeremycg Dec 08 '15 at 00:07
  • Thanks for the reply. I need more help with the structure of the for loop: how to take the firm and the start and end dates from one data frame and use them to go through the big file. The calculations themselves are trivial. – Fabian Dec 08 '15 at 08:52
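The compounding point raised in the first comment can be illustrated with a small sketch. Summing daily returns ignores compounding, whereas chaining them as prod(1 + ret) - 1 gives the compounded return over the window; the last element of cumprod(1 + ret) - 1 is the same number. The returns below are made-up values, and treating them as simple (not log) returns is an assumption.

```r
# Made-up daily simple returns for one firm within one fiscal year
ret <- c(0.10, -0.05, 0.02)

# Sum of daily returns (what the question's code computes)
summed_ret <- sum(ret)

# Compounded return over the whole window
compound_ret <- prod(1 + ret) - 1

# Equivalent: last element of the running compounded series
running <- cumprod(1 + ret) - 1
final_running <- running[length(running)]
```

If the data hold log returns instead, summing them is the appropriate aggregation, so which form applies depends on how ret was computed.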