I have a time series dataset with two columns: x is hourly continuous temperature data, and y is a periodically sampled response (samples taken at 5am, 2pm and 8pm every day) over a couple of weeks.
I would like to take two lag approaches to analysing the data:
1) Plot all my y data (held constant) vs increasingly lagged x data (shift x by 0-24 hours in 1-hour steps), i.e. x at 6pm vs y at 6pm; x at 5pm vs y at 6pm; ...; x at 5pm on the previous day vs y at 6pm.
2) The same as 1), but with cumulative shifts: a backward-in-time cumulative lag window of 0:24 with a step of 1 for the x data, tested against the y data, i.e. x at 6pm vs y at 6pm; average of x at 5pm and 6pm vs y at 6pm; ...; average of x from 6pm back to 5pm on the previous day vs y at 6pm.
For each lag scenario (0-24) I want to fit a linear model (lm) of y vs shifted x, and build a table with columns for the number of lags, the p-value of the lm, and the adjusted R^2 of the lm, so I can see which lag and which cumulative-average lag in x best explain the y data.
Essentially it is the same as the "cummean" or "rollapply" functions, but working in a backward direction, and I could not find anything in R that does this. Reversing the x data does not work, because the order of the data needs to be maintained: I need the lag in x for several y values.
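To illustrate the backward window I have in mind, here is a minimal sketch using zoo's rollapply with align = "right" (I'm not sure this is the right tool, but with width = k + 1 it averages the current value and the k preceding ones):

```r
library(zoo)

# tiny dummy series; with align = "right" the window ends at the current time,
# so each value is the mean of itself and the value one step back in time
x <- zoo(c(10, 11, 12, 13), as.Date(1:4))
rollapply(x, width = 2, FUN = mean, align = "right", fill = NA)
# first value is NA (no earlier observation), the rest are backward-looking means
```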
I would guess it requires a 'for' loop that runs through all the data at each lag, with "i" being the lag.
A single run with 0 lag would look like this:
# Creating dummy data
library(zoo)
x <- zoo(c(10, 10.5, 10.5, 11, 11.5, 12, 12.5, 12, 12, 12.5, 13, 12.5, 12, 12, 11.5, 10.5), as.Date(1:16))
y <- zoo(c(rep(NA, 3), 40, rep(NA, 3), 45, rep(NA, 3), 50, rep(NA, 3), 40), as.Date(1:16))  # real NA, not the string "NA", so y stays numeric
z <- merge(x, y, all = FALSE)
z
reslt <- lm(z$y ~ z$x)  # rows where y is NA are dropped automatically
a <- summary(reslt)$coefficients[2, 4]  # p-value of the slope
b <- summary(reslt)$adj.r.squared
ResltTable <- c(a, b)
names(ResltTable) <- c("p-value", "Adj. R^2")  # a plain vector takes names(), not colnames()
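What I imagine the loop for approach 1) might look like, as a sketch only: it reuses the dummy series above, uses zoo's lag() with a negative k to shift x backward in time, and assumes the indices of x and y line up (max_lag is kept small here because the dummy data is daily, not hourly; with real hourly data it would be 24):

```r
library(zoo)

# dummy data as above
x <- zoo(c(10, 10.5, 10.5, 11, 11.5, 12, 12.5, 12, 12, 12.5, 13, 12.5, 12, 12, 11.5, 10.5), as.Date(1:16))
y <- zoo(c(rep(NA, 3), 40, rep(NA, 3), 45, rep(NA, 3), 50, rep(NA, 3), 40), as.Date(1:16))

max_lag <- 3  # would be 24 for hourly data
ResltTable <- data.frame(lag = 0:max_lag, p.value = NA, adj.r2 = NA)

for (i in 0:max_lag) {
  x_shift <- lag(x, k = -i)            # value of x from i steps earlier, aligned to the current time
  z <- merge(x_shift, y, all = FALSE)  # keep only times present in both series
  fit <- lm(z$y ~ z$x_shift)           # NA rows in y are dropped by lm
  ResltTable$p.value[i + 1] <- summary(fit)$coefficients[2, 4]
  ResltTable$adj.r2[i + 1] <- summary(fit)$adj.r.squared
}

ResltTable
```

Approach 2) would presumably be the same loop with the lag() line swapped for a backward-looking window mean, e.g. x_shift <- rollapply(x, width = i + 1, FUN = mean, align = "right", fill = NA).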
Thanks!