16

I have two time series and am using ccf to compute the cross-correlation between them. ccf(ts1, ts2) lists the cross-correlations for all time lags. How can I find the lag that gives the maximum correlation without manually inspecting the output?

flodel
tan
  • Ok found the answer here http://r.789695.n4.nabble.com/ccf-function-td2288257.html – tan Apr 29 '12 at 02:17
  • Why don't you put that as an answer and credit the posters from the R help mailing list? – Roman Luštrik Apr 29 '12 at 06:49
  • yes I would have done so, but I do not have enough reputation points to answer my own question. – tan Apr 29 '12 at 16:12
  • Revisit the question when you have. :) – Roman Luštrik Apr 30 '12 at 07:03
  • @tan You can also mark your own answer as the correct one. And, as well as the link, I personally think it is nice to summarize what the answer was, to save Stackoverflowers an extra click. (I've edited your answer to show what I mean; no offence taken if you want to edit it back :-) – Darren Cook Aug 08 '12 at 03:47

4 Answers

22

Posting the answer found at http://r.789695.n4.nabble.com/ccf-function-td2288257.html:

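Find_Max_CCF <- function(a, b)
{
 # Cross-correlation at every lag, without drawing the plot
 d <- ccf(a, b, plot = FALSE)
 cor = d$acf[,,1]
 lag = d$lag[,,1]
 res = data.frame(cor,lag)
 # Keep the row with the largest (signed) correlation
 res_max = res[which.max(res$cor),]
 return(res_max)
}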
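A quick way to check a helper like this is to feed it two series with a known offset. The sketch below is illustrative only: the simulated data, the seed, and the 3-step shift are assumptions that are not part of the original answer, and the function is repeated so that the block runs on its own.

```r
# Same logic as Find_Max_CCF above, repeated so the example is self-contained.
Find_Max_CCF <- function(a, b) {
  d <- ccf(a, b, plot = FALSE)
  res <- data.frame(cor = d$acf[, , 1], lag = d$lag[, , 1])
  res[which.max(res$cor), ]
}

# Hypothetical test data: white noise and a copy of it shifted by 3 steps.
set.seed(42)
n <- 200
x <- rnorm(n + 3)
ts1 <- x[1:n]          # ts1[t] = x[t]
ts2 <- x[4:(n + 3)]    # ts2[t] = x[t + 3]

res_max <- Find_Max_CCF(ts1, ts2)
print(res_max)
```

According to ?ccf, the lag k value estimates the correlation between x[t+k] and y[t], so for this construction the reported lag should be 3, with a correlation close to 1.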
Darren Cook
tan
12

I thought I'd redo the above function so that it finds the correlation with the largest absolute value but returns it with its original sign (positive or negative). I also (nearly) maxed out the number of lags.

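Find_Abs_Max_CCF <- function(a, b)
{
 # Use almost every available lag instead of ccf's default lag.max
 d <- ccf(a, b, plot = FALSE, lag.max = length(a)-5)
 cor = d$acf[,,1]
 abscor = abs(d$acf[,,1])
 lag = d$lag[,,1]
 res = data.frame(cor,lag)
 absres = data.frame(abscor,lag)
 # Locate the largest |cor|, but return the row holding the signed value
 absres_max = res[which.max(absres$abscor),]
 return(absres_max)
}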
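The same idea can be written more compactly by indexing the signed correlations with the position of the largest absolute value. This is only a sketch: the name find_abs_max_ccf, the optional lag.max argument, and the simulated data are assumptions made for illustration, not part of the answer above.

```r
# Hypothetical compact variant: pick the position of the largest |cor|,
# then report the signed correlation and lag at that position.
find_abs_max_ccf <- function(a, b, lag.max = NULL) {
  d <- ccf(a, b, plot = FALSE, lag.max = lag.max)
  i <- which.max(abs(d$acf[, 1, 1]))
  data.frame(cor = d$acf[i, 1, 1], lag = d$lag[i, 1, 1])
}

# Made-up data: b is an inverted copy of a, shifted by 3 steps.
set.seed(1)
n <- 200
x <- rnorm(n + 3)
a <- x[1:n]
b <- -x[4:(n + 3)]

strongest <- find_abs_max_ccf(a, b)
print(strongest)
```

With an inverted copy, a plain which.max on the signed values would miss the relationship, whereas this version should report lag 3 with a strongly negative correlation.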
nvogen
3

Because 3 is more than 4, I also had a stab at modifying this function, this time by implementing an idea from here:

ccfmax <- function(a, b, e=0)
{
 # e is a relative tolerance around the largest absolute correlation
 d <- ccf(a, b, plot = FALSE, lag.max = length(a)/2)
 cor = d$acf[,,1]
 abscor = abs(d$acf[,,1])
 lag = d$lag[,,1]
 res = data.frame(cor, lag)
 absres = data.frame(abscor, lag)
 maxcor = max(absres$abscor)
 # Keep every lag whose |cor| lies within maxcor*e of the maximum
 absres_max = res[which(absres$abscor >= maxcor-maxcor*e &
                        absres$abscor <= maxcor+maxcor*e),]
 return(absres_max)
}

Essentially, an "error" term e (a relative tolerance) is added, so that if several values are close to the maximum, they all get returned, e.g.:

ayy <- jitter(cos((1:360)/5), 100)
bee <- jitter(sin((1:360)/5), 100)

ccfmax(ayy, bee, 0.02)
           cor lag
348  0.9778319  -8
349  0.9670333  -7
363 -0.9650827   7
364 -0.9763180   8

If no value for e is given, it defaults to zero, and the function behaves just like the one nvogen posted, apart from the smaller lag.max.
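Since no absolute correlation can exceed the maximum, the two-sided condition in ccfmax reduces to a one-sided relative tolerance. The sketch below shows that simplified filter on its own; the helper name near_max_ccf, its lag.max argument, and the noise-free test series are assumptions made for illustration.

```r
# Hypothetical simplified filter: keep every lag whose |cor| is within
# a relative tolerance e of the largest |cor|.
near_max_ccf <- function(a, b, e = 0, lag.max = length(a) / 2) {
  d <- ccf(a, b, plot = FALSE, lag.max = lag.max)
  res <- data.frame(cor = d$acf[, 1, 1], lag = d$lag[, 1, 1])
  abscor <- abs(res$cor)
  res[abscor >= max(abscor) * (1 - e), ]
}

# Noise-free versions of the series used above, so the run is reproducible.
a <- cos((1:360) / 5)
b <- sin((1:360) / 5)

strict <- near_max_ccf(a, b)            # e = 0: only the maximum itself
loose  <- near_max_ccf(a, b, e = 0.02)  # everything within 2% of the maximum
print(strict)
print(loose)
```

Every row returned with e = 0 is also returned with a larger e, so widening the tolerance only ever adds candidate lags.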

Community
AkselA
1

I've modified the original solution as well, so that it can be called in a loop and returns each result together with the label x of the series it belongs to (x comes from a character vector of column names):

abs.max.ccf <- function(x,a,b) {
  d <- ccf(a, b, plot=FALSE, lag.max=length(a)-5)
  abscor <- abs(d$acf[,,1])
  lag <- d$lag[,,1]
  # Position of the largest absolute correlation
  i <- which.max(abscor)
  abs.cor.max <- abscor[i]
  abs.cor.max.lag <- lag[i]
  # Note: c() coerces the numbers to character because x is a label
  return(c(x, abs.cor.max, abs.cor.max.lag))
}

I removed the data.frame part within the function, as it is unnecessarily slow. To loop over each column in a data.frame and return the results to a new data.frame, I use this method:

max.ccf <- lapply(colnames(df), function(x) unlist(abs.max.ccf(x, df$y, df[x])))
max.ccf <- data.frame(do.call(rbind, max.ccf))
colnames(max.ccf) <- c('Index','Cor','Lag')
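One side effect of returning c(x, abs.cor.max, abs.cor.max.lag) is that the numbers are coerced to character, so the Cor and Lag columns of the result are not numeric. A possible variant, sketched here with a hypothetical helper name and made-up data, returns a one-row data.frame per column and binds the rows. This keeps the column types at the cost of reintroducing some data.frame overhead. Skipping the reference column y itself is an assumption about the intended use.

```r
# Hypothetical helper: one row per series, with numeric Cor and Lag columns.
abs_max_ccf_row <- function(name, a, b) {
  d <- ccf(a, b, plot = FALSE, lag.max = length(a) - 5)
  i <- which.max(abs(d$acf[, 1, 1]))
  data.frame(Index = name,
             Cor = abs(d$acf[i, 1, 1]),
             Lag = d$lag[i, 1, 1],
             stringsAsFactors = FALSE)
}

# Made-up data: x1 is y delayed by 2 steps, x2 is unrelated noise.
set.seed(1)
df <- data.frame(y = rnorm(100))
df$x1 <- c(rnorm(2), df$y[1:98])
df$x2 <- rnorm(100)

# Compare y with every other column; df[[x]] extracts a plain numeric vector.
predictors <- setdiff(colnames(df), "y")
max.ccf <- do.call(rbind, lapply(predictors, function(x) {
  abs_max_ccf_row(x, df$y, df[[x]])
}))
print(max.ccf)
```

Because ccf(x, y) at lag k relates x[t+k] to y[t], the delayed copy x1 should show up with a lag of -2 here.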
Adam Erickson