0

I am currently trying to develop a new function that calculates rolling statistics by groups within a matrix.

My dataset looks as follows:

     ID year    ROA       CAR
[1,]  1 2009 0.006954926 0.3933436
[2,]  1 2010 0.013286958 0.2892719
[3,]  1 2011 0.012334294 0.2402294
[4,]  1 2012 0.006843720 0.2088247
[5,]  1 2013 0.004888144 0.1757100
[6,]  2 2006 0.010172563 0.0511171

As is noticeable, the data is grouped by ID, which contains yearly observations for ROA and CAR. If you are interested, the data comes from banks, and represents the Return on Assets, and the Capital to Asset Ratio.

My goal is to create a function that estimates a standardised z-score specified as follows:

z = (mean(ROA) + mean(CAR)) / sd(ROA)

However, the score is based on rolling measures of the mean and the standard deviation for a window length of 3, which needs to be computed by an ID basis as the data is indexed by ID and year.

I am trying to specify my code and I have come up with something like this:

z <- rollapply(data, 3, function(x) x(((rollapply(data[,3], 3, mean))
             - (rollapply(data[,4], 3, mean))) / (rollapply(data[,3], 3, 
              sd)))

As is noticeable, I am using the rollapply function to compute the rolling means and the rolling standard deviations, however, I am not sure how to do this for an ID basis. It would be extremely helpful to know how I may do so....

amonk
  • 1,769
  • 2
  • 18
  • 27
  • Without trying to optimize what you did, you can just `split(data,data$ID)`, apply on each element the `z` line and then recombine together. For instance: `l<-split(data,data$ID);l<-lapply(l,yourprocedure);do.call(rbind,l)`. – nicola May 22 '17 at 15:28
  • Hi Nicola, thanks. This looks promising, however I am unfamiliar with these techniques. I tried spltting the data now, and go the following error: $ operator is invalid for atomic vectors I guess this means that I cannot turn my matrix into a list? –  May 22 '17 at 15:37
  • If `data` is a `matrix`, try `data2<-as.data.frame(data)` and apply the above to `data2`. – nicola May 22 '17 at 15:42
  • Many thanks, I will try this shortly. Sorry about the delayed response. –  Jun 22 '17 at 09:44

1 Answers1

0

For a given dataframe df (Monte Carlo: simulate your case)

  df<-data.frame(ID_year=as.numeric(paste0("20",rep(11:20))),ROA=runif(10),CAR=runif(10)*2)

 >df
  ID_year       ROA       CAR
1     2011 0.9999123 0.8441209
2     2012 0.5215255 1.4291197
3     2013 0.7282076 0.3001416
4     2014 0.5001218 1.4413248
5     2015 0.8472549 0.1272178
6     2016 0.1250983 1.4753719
7     2017 0.7133057 0.5369519
8     2018 0.2602237 0.9859258
9     2019 0.4741151 1.0675716
10    2020 0.2010236 0.7205617

I then tried:

incr<-2#time window in years
l1<-lapply(seq(from=1,to=nrow(df),by=incr),function(x1) {
  x2<-x1+incr;
  z<-(mean(df$ROA[x1:x2])+ mean(df$CAR[x1:x2]))/sd(df$ROA[x1:x2])
  return(z) 
})            
> zscore<-unlist(l1)
[1] 6.700638 7.453366 3.319165 5.938582       NA

PS:

I tried for the years:

 2011 2012 2013 
 2013 2014 2015 
 2015 2016 2017 
 2017 2018 2019 
 2019 2020 NA 

EDIT:

 df_split<-split( df , f = df$ID )#split based on `ID`

Then you can apply for every element of the list the above function

amonk
  • 1,769
  • 2
  • 18
  • 27
  • Looks like an alternate way to do something similar, yet the issue is that every ID has a set of yearly observations. For example: ID = 1 has year = 2000, 2001, ... , 2005 and ID = 2 has year = 2003, 2004, ... , 2008 This means that the calculations must be performed for different ID's and years simultaneously... I am sure there is some method to do this, but i am very new to R and hence, I am not sure of which library may be appropriate. –  May 22 '17 at 16:39
  • 1
    Now that I understand R better, I understand that this comment might have solved the issue. Many thanks. –  Jun 22 '17 at 09:42