I have a dataset like so:

 set.seed(242)
 df <- data.frame(month = order(seq(1, 20, 1), decreasing = TRUE),
                  psit = sample(1:100, 20, replace = TRUE),
                  var = sample(1:10, 20, replace = TRUE))

I wish to do a crude time lag analysis to see how lagged var data affects psit data. In this analysis, a lag is the var value from 1, 2, 3, etc. months before each psit data point (T-1, T-2, T-3, and so on).

To see how the prior month's var data affects psit data, I wish to make a timelag vector consisting of var data offset by one month from the psit variable. Then I'll cbind the timelag vector to the psit vector. Here are examples of the dataframes for a 1 month offset, 2 month offset and 3 month offset, respectively:

 set.seed(242)
 timelag1<- cbind(df[1:12,2], df[2:13,3]) #1 month time lag
 timelag2<- cbind(df[1:12,2], df[3:14,3]) #2 month time lag
 timelag3<- cbind(df[1:12,2], df[4:15,3]) #3 month time lag

For each dataframe, I want to regress var against psit data using the lm() function and output the R-squared value. This process would be repeated for each subsequent offset. Example below:

 model1<-lm(timelag1)
 summary(model1)$r.squared
 model2<-lm(timelag2)
 summary(model2)$r.squared
 model3<-lm(timelag3)
 summary(model3)$r.squared

I would like to write a loop that repeats this process over a large dataset of 240 months: build the dataframe for each offset, run lm() on it, and output the R-squared value.

Danielle

1 Answer

Try the dyn package which allows lm to process zoo and other time series objects:

library(dyn)

z <- read.zoo(df)
models <- lapply(1:3, function(i) dyn$lm(psit ~ lag(var, -i), tail(z, 12+i)))
sapply(models, function(x) summary(x)$r.squared)
## [1] 0.31209189 0.04923393 0.09995727
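For comparison, here is a minimal base R sketch of the same idea using the explicit row indexing from the question instead of dyn. It assumes the rows are ordered most recent month first, as in the question's `df`. The helper name `lag_r2` and the settings `n_obs` and `max_lag` are made up for illustration.

```r
# Base R sketch: regress psit on var lagged by k months, for k = 1..max_lag.
# Assumes rows are ordered most recent month first, as in the question's df.
set.seed(242)
df <- data.frame(month = 20:1,
                 psit = sample(1:100, 20, replace = TRUE),
                 var = sample(1:10, 20, replace = TRUE))

n_obs <- 12    # number of psit observations used in every model
max_lag <- 3   # largest lag to try; needs n_obs + max_lag <= nrow(df)

# lag_r2 is a hypothetical helper, not part of any package.
lag_r2 <- function(k, data, n) {
  d <- data.frame(psit = data$psit[1:n],
                  var_lag = data$var[(1 + k):(n + k)])
  summary(lm(psit ~ var_lag, data = d))$r.squared
}

# One R-squared value per lag, returned as a named numeric vector.
r2 <- sapply(seq_len(max_lag), lag_r2, data = df, n = n_obs)
names(r2) <- paste0("lag", seq_len(max_lag))
r2
```

For a 240-month dataset, raising `max_lag` and `n_obs` should be enough, as long as `n_obs + max_lag` does not exceed the number of rows.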

Note that a model using lag k typically also includes all smaller lags. In that case:

models <- lapply(1:3, function(i) dyn$lm(psit ~ lag(var, -(1:i)), tail(z, 12+i)))
do.call("anova", models)

giving:

Model 1: psit ~ lag(var, -(1:i))
Model 2: psit ~ lag(var, -(1:i))
Model 3: psit ~ lag(var, -(1:i))
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     10 8688.5                           
2      9 8221.7  1    466.73 0.4545 0.5192
3      8 8215.5  1      6.24 0.0061 0.9398
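The nested-lag comparison can also be sketched in base R with `embed()`, which builds a matrix of a series and its lags. This is an alternative to the dyn approach, not the method used above. It assumes the data are first sorted into chronological order, and the names `E`, `E12`, `p12`, `n_obs` and `max_lag` are illustrative only.

```r
set.seed(242)
df <- data.frame(month = 20:1,
                 psit = sample(1:100, 20, replace = TRUE),
                 var = sample(1:10, 20, replace = TRUE))

# Put the series in chronological order (month 1 first).
df <- df[order(df$month), ]

max_lag <- 3
n_obs <- 12

# embed() gives one row per time t with columns var[t], var[t-1], ..., var[t-max_lag].
E <- embed(df$var, max_lag + 1)
E12 <- tail(E, n_obs)         # rows for the 12 most recent months
p12 <- tail(df$psit, n_obs)   # psit for the same 12 months

# Model k uses lags 1..k (columns 2..k+1 of E12); every model shares the same response.
models <- lapply(seq_len(max_lag), function(k) {
  lm(p12 ~ E12[, 2:(k + 1), drop = FALSE])
})

# Sequential F-tests between the nested models.
do.call(anova, models)
```

Because every model is fitted to the same 12 psit observations, the models are nested and `anova` can compare them directly.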
G. Grothendieck
  • Thank you for your help @G. Grothendieck. I am getting an error for both options you present. `Error in lag(var, -(1:i)) : n must be a single positive integer ` . Are you able to annotate the code so I understand what the arguments do to help attain the outcome? – Danielle Sep 18 '17 at 01:35
  • I should note, when I read in df using `read.zoo()`, the `month` column was removed and df only consisted of `psit` and `var` variables. Should this have happened? – Danielle Sep 18 '17 at 01:38
  • Specifically, what does `tail(z, 12+i)` do in the `dyn` argument and what does `1:3` do in the `lapply` command? In my example I had 3 models to run the regression on, but if my dataset can go to a 12-month lag or more, is the 1:3 limiting the analysis to three lags and thus three models? – Danielle Sep 18 '17 at 01:52
  • You will need to read up on zoo; there are several vignettes. `tail(z, 12+i)` takes the 12+i most recent points. The `lapply` runs the anonymous function for each value of 1:3, so it is invoked three times: once for i=1, once for i=2 and once for i=3. With dyn and its dependencies installed, in a fresh R session, copying and pasting `df` from the question and then the code in the answer gives me no errors. – G. Grothendieck Sep 18 '17 at 02:12
  • Are you able to assist me with this updated version of my question? https://stackoverflow.com/questions/46395927/time-lag-analysis-on-list-of-imputed-datasets – Danielle Sep 25 '17 at 04:51
  • What if `var` in the example above is a number not an integer? I am getting an error with my real data: `Error in lag(var, -i) : n must be a single positive integer` , where variances `var` can be decimals – Danielle Sep 25 '17 at 18:55
  • I suspect you have some other package loaded that has redefined `lag`. Try again in a fresh R session. – G. Grothendieck Sep 25 '17 at 19:40
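The two points raised in the comments can be checked with a short sketch: what `lapply` and `tail` do for each `i`, and which `lag` function the session is actually using. dplyr is one package known to export its own `lag`, which masks the one in stats; whether that is the cause of the error here is an assumption.

```r
# 1. lapply() calls the function once per element of 1:3 and returns a list.
#    tail(x, 12 + i) keeps the last 12 + i elements of x.
window_sizes <- lapply(1:3, function(i) length(tail(1:20, 12 + i)))
unlist(window_sizes)   # 13 14 15

# 2. List the attached packages that define `lag`; the first one listed is used.
find("lag")

# Calling the stats version by its full name avoids any masking.
x <- ts(1:5)
shifted <- stats::lag(x, -1)   # same values, time index moved one step later
tsp(shifted)                   # start, end and frequency of the shifted series
```

If `find("lag")` lists anything ahead of `"package:stats"`, detaching that package or starting a fresh session, as suggested above, should restore the expected behaviour.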