
I have data like the sample below. I'm trying to forecast TiTa using arima with xreg predictors, and I'm looking for a good way to identify lagged predictors. Does anyone know of a method, package, or function for finding them? My current idea is to shift the DateTime variable repeatedly and, at each shift, check the correlation between TiTa and every other field in the data, with those other fields lagged. For example: subtract 30 minutes from DateTime and look for correlation with TiTa, then subtract an hour and look again, and so on. I'm wondering whether someone has already come up with a better way to do this.
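
A minimal sketch of the lag-and-correlate idea described above. It assumes the series have already been put on a regular time grid (the sample rows skip 01:00, so row shifts on the raw data would not correspond to fixed time shifts). `lagged_cor` is a hypothetical helper, and the data here are synthetic, not taken from the dataset in the question:

```r
# Hypothetical helper: correlation between y[t] and x[t - k] for k = 0..max_lag.
# Assumes x and y are numeric vectors sampled at the same regular interval.
lagged_cor <- function(y, x, max_lag = 6) {
  stopifnot(length(y) == length(x), max_lag < length(y) - 2)
  n <- length(y)
  sapply(0:max_lag, function(k) {
    # Pair y at time t with x at time t - k; incomplete pairs are dropped.
    cor(y[(k + 1):n], x[1:(n - k)], use = "complete.obs")
  })
}

# Synthetic example: y follows x with a delay of 2 steps, plus noise.
set.seed(1)
n <- 200
x <- rnorm(n)
y <- c(rnorm(2), x[1:(n - 2)]) + rnorm(n, sd = 0.2)

cors <- lagged_cor(y, x, max_lag = 6)
names(cors) <- 0:6

# Lag with the largest absolute correlation.
best_lag <- as.integer(names(cors)[which.max(abs(cors))])
print(round(cors, 2))
print(best_lag)
```

Shifting by rows rather than editing DateTime keeps the pairing explicit, but it only makes sense once missing timestamps have been filled in with NA rows.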

Sample Data:

dput(droplevels(dataset[1:5,]))
structure(list(DateTime = structure(1:5, .Label = c("2013-01-01 00:00:00", 
"2013-01-01 02:00:00", "2013-01-01 03:00:00", "2013-01-01 04:00:00", 
"2013-01-01 05:00:00"), class = "factor"), CustCount = c(3, 
1, 4, 1, 3), TiTa = structure(c(2L, 1L, 3L, 4L, 
2L), .Label = c("11", "2", "3", "39"), class = "factor"), IIP = c(26, 
153, 134.5, 195, 120), ToTa = structure(c(3L, 1L, 2L, 1L, 1L), .Label = c("", 
"493", "565"), class = "factor"), RtD = structure(c(2L, 
4L, 3L, 1L, 5L), .Label = c("", "16.5", "42.5", "43", "62.5"), class = "factor"), 
ItD = structure(c(1L, 4L, 2L, 5L, 3L), .Label = c("111", 
"210", "250", "253", "356"), class = "factor"), ToTd = structure(c(1L, 
3L, 2L, 5L, 4L), .Label = c("205", "255", "296", "343", "375"
), class = "factor"), TTR = c(41, 99, 89, 169, 124.5
), Dd = structure(c(3L, 4L, 2L, 1L, 5L), .Label = c("19", 
"22", "29", "43", "93"), class = "factor"), Da = structure(c(3L, 
1L, 2L, 1L, 1L), .Label = c("", "409", "544"), class = "factor")), .Names = c("DateTime", 
"CustCount", "TiTa", "IIP", 
"TATA", "RtD", "ItD", "TATD", "TTR", 
"Dd", "Da"), na.action = structure(c(2L, 12L, 28L, 31L, 
32L, 53L, 54L, 70L, 72L, 74L, 75L, 76L, 77L, 78L, 88L, 101L
), class = "omit"), row.names = c(1L, 3L, 4L, 5L, 6L), class = "data.frame")
modLmakur
  • Look into using the cross-correlation function. You have to format each variable as a time series object and decide how you want to handle NAs; then you could do something like: ccf(x, y, lag.max = 5, na.action = na.contiguous). See this: http://www.inside-r.org/r-doc/stats/acf – Wyldsoul Mar 09 '16 at 19:49
  • Thanks for getting back to me on this. I found a really good post here with some functions using ccf that just about do what I'm looking for: http://stackoverflow.com/questions/10369109/finding-lag-at-which-cross-correlation-is-maximum-ccf – modLmakur Mar 10 '16 at 01:28
  • I'm not sure what to do about the NAs. Some of the fields in my data are pretty sparse. I was thinking about using na.omit, but I think dropping rows would defeat the purpose of looking for lags. – modLmakur Mar 10 '16 at 01:29
  • This is a good resource for some of the theory and application behind time series analysis and ccf: https://onlinecourses.science.psu.edu/stat510/node/74 – Wyldsoul Mar 10 '16 at 13:38
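
The ccf suggestion in the comments could be sketched roughly as follows. This is one possible approach, not a definitive answer: it uses `na.action = na.pass` (so missing values stay in place and the time alignment is preserved) instead of the `na.contiguous` mentioned above, and the lagged series is synthetic. Only the small `tita` factor mirrors the TiTa column from the sample data:

```r
# Several columns in the sample are factors. Convert via as.character() first;
# as.numeric() on a factor returns the internal level codes, not the values.
tita <- factor(c("2", "11", "3", "39", "2"))
tita_num <- as.numeric(as.character(tita))

# Synthetic example (assumption): y follows x with a delay of 3 steps.
set.seed(42)
n <- 300
x <- rnorm(n)
y <- c(rnorm(3), x[1:(n - 3)]) + rnorm(n, sd = 0.3)
x[sample(n, 30)] <- NA  # make the predictor sparse, as in the question

# na.pass keeps NA positions so rows are not shifted against each other;
# the correlations are then computed from the complete cases.
cc <- ccf(x, y, lag.max = 10, plot = FALSE, na.action = na.pass)

# ccf(x, y) at lag k estimates cor(x[t + k], y[t]),
# so a negative peak lag means x leads y.
peak <- which.max(abs(cc$acf))
best_lag <- cc$lag[peak]
print(best_lag)
```

A predictor that peaks at a negative lag is a candidate for a lagged xreg column; a peak at a positive lag would mean the "predictor" trails the response and is of little use for forecasting.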

0 Answers