I have two time series and I am using ccf to find the cross-correlation between them. ccf(ts1, ts2) lists the cross-correlations for all time lags. How can I find the lag that gives the maximum correlation without manually looking at the data?
-
Ok found the answer here http://r.789695.n4.nabble.com/ccf-function-td2288257.html – tan Apr 29 '12 at 02:17
-
Why don't you put that as an answer and credit the posters from the R help mailing list? – Roman Luštrik Apr 29 '12 at 06:49
-
yes I would have done so, but I do not have enough reputation points to answer my own question. – tan Apr 29 '12 at 16:12
-
Revisit the question when you have. :) – Roman Luštrik Apr 30 '12 at 07:03
-
@tan You can also mark your own answer as the correct one. And, as well as the link, I personally think it is nice to summarize what the answer was, to save Stackoverflowers an extra click. (I've edited your answer to show what I mean; no offence taken if you want to edit it back :-) – Darren Cook Aug 08 '12 at 03:47
4 Answers
Posting the answer from http://r.789695.n4.nabble.com/ccf-function-td2288257.html:
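Find_Max_CCF <- function(a, b)
{
    d <- ccf(a, b, plot = FALSE)
    cor <- d$acf[,,1]   # correlations, dropped from a 3-d array to a vector
    lag <- d$lag[,,1]   # matching lags
    res <- data.frame(cor, lag)
    res_max <- res[which.max(res$cor),]   # row with the largest correlation
    return(res_max)
}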
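
As a quick sanity check, here is a minimal sketch on synthetic data where the true shift is known. The series, the seed, and the shift of 3 are made up for illustration. It relies on the convention documented for ccf that lag k of ccf(x, y) estimates the correlation between x[t+k] and y[t], and it picks the maximum directly from the ccf result rather than calling the function above.

```r
set.seed(42)                      # arbitrary seed, only for reproducibility
n <- 200
x <- rnorm(n + 3)                 # made-up white-noise data
a <- x[4:(n + 3)]                 # a[t] = x[t + 3]
b <- x[1:n]                       # b[t] = x[t], i.e. b[t] = a[t - 3]

d <- ccf(a, b, plot = FALSE)
i <- which.max(d$acf)             # position of the largest correlation
best <- data.frame(cor = d$acf[i], lag = d$lag[i])
best                              # the lag column should be -3 for this construction
```

Because b is a copy of a delayed by 3 steps, the peak sits at lag -3 under that sign convention; swapping the two arguments would flip the sign of the lag.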

I thought I'd redo the above function so that it finds the correlation with the largest absolute value and returns it with its original sign (positive or negative). I also (nearly) maxed out the number of lags.
Find_Abs_Max_CCF <- function(a, b)
{
    d <- ccf(a, b, plot = FALSE, lag.max = length(a) - 5)
    cor <- d$acf[,,1]
    abscor <- abs(d$acf[,,1])
    lag <- d$lag[,,1]
    res <- data.frame(cor, lag)
    absres <- data.frame(abscor, lag)
    # pick the row of res (signed values) where the absolute correlation peaks
    absres_max <- res[which.max(absres$abscor),]
    return(absres_max)
}
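
To see why the absolute value matters, here is a small sketch with two made-up series that are strongly but negatively related at a known shift. The data, seed, and shift are assumptions for illustration only, and the default lag.max is used to keep the example short.

```r
set.seed(1)                       # arbitrary seed, only for reproducibility
n <- 200
x <- rnorm(n + 3)                 # made-up white-noise data
a <- x[4:(n + 3)]                 # a[t] = x[t + 3]
b <- -x[1:n]                      # delayed copy of a with the sign flipped

d <- ccf(a, b, plot = FALSE)
i_max <- which.max(d$acf)         # largest signed correlation
i_abs <- which.max(abs(d$acf))    # largest correlation in absolute value

signed_pick <- c(cor = d$acf[i_max], lag = d$lag[i_max])
abs_pick    <- c(cor = d$acf[i_abs], lag = d$lag[i_abs])
signed_pick                       # a weak positive value at some unrelated lag
abs_pick                          # strongly negative correlation at lag -3
```

Taking the plain maximum only finds the largest of the small noise correlations, while the absolute-value version recovers the real (negative) relationship and keeps its sign.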

-
Can you tell why two commas are being used when extracting acf d$acf[,,1] ? Thanks. – Anusha Oct 01 '14 at 19:34
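
On the two commas: as far as I can tell, ccf returns its acf and lag components as three-dimensional arrays (number of lags x 1 x 1), so [,,1] takes every lag from the single slice and drops the result to a plain vector. A short sketch on made-up random data shows the shapes:

```r
set.seed(1)                               # arbitrary seed
d <- ccf(rnorm(50), rnorm(50), plot = FALSE)

dim(d$acf)                                # three dimensions: lags x 1 x 1
v <- d$acf[, , 1]                         # all lags, first (only) slice
is.null(dim(v))                           # TRUE: v is an ordinary numeric vector
length(v) == dim(d$acf)[1]                # TRUE: one value per lag
```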
Because 3 is more than 4, I also had a stab at modifying this function, this time by implementing an idea from here:
ccfmax <- function(a, b, e = 0)
{
    d <- ccf(a, b, plot = FALSE, lag.max = length(a)/2)
    cor <- d$acf[,,1]
    abscor <- abs(d$acf[,,1])
    lag <- d$lag[,,1]
    res <- data.frame(cor, lag)
    absres <- data.frame(abscor, lag)
    maxcor <- max(absres$abscor)
    # keep every row whose absolute correlation is within a fraction e of the maximum
    absres_max <- res[which(absres$abscor >= maxcor - maxcor*e &
                            absres$abscor <= maxcor + maxcor*e),]
    return(absres_max)
}
Essentially an "error" term is added, so that if there are several values close to the maximum, they all get returned, e.g.:
ayy <- jitter(cos((1:360)/5), 100)
bee <- jitter(sin((1:360)/5), 100)
ccfmax(ayy, bee, 0.02)
           cor lag
348  0.9778319  -8
349  0.9670333  -7
363 -0.9650827   7
364 -0.9763180   8
If no value for e is given it is taken to be zero, and the function behaves just like the one nvogen posted.
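
The selection rule itself can be tried on a plain vector, without any time series. This is only a sketch with made-up numbers; it uses the one-sided form of the test, which should select the same rows for e >= 0, because maxcor is already the largest value and the upper bound can never exclude anything.

```r
abscor <- c(0.10, 0.97, 0.98, 0.50)      # made-up absolute correlations
maxcor <- max(abscor)

e <- 0.02
near_max <- which(abscor >= maxcor * (1 - e))   # within 2% of the maximum
near_max                                        # positions 2 and 3

only_max <- which(abscor >= maxcor * (1 - 0))   # e = 0: just the maximum
only_max                                        # position 3
```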
I've modified the original solution as well, in order to loop over the function and output the values corresponding to a character vector of indices (x):
abs.max.ccf <- function(x, a, b) {
    d <- ccf(a, b, plot = FALSE, lag.max = length(a) - 5)
    cor <- d$acf[,,1]
    abscor <- abs(d$acf[,,1])
    lag <- d$lag[,,1]
    abs.cor.max <- abscor[which.max(abscor)]
    abs.cor.max.lag <- lag[which.max(abscor)]
    # c() coerces the numbers to character because x is a character label
    return(c(x, abs.cor.max, abs.cor.max.lag))
}
I removed the data.frame part within the function, as it is unnecessarily slow. To loop over each column in a data.frame and return the results to a new data.frame, I use this method:
max.ccf <- lapply(colnames(df), function(x) unlist(abs.max.ccf(x, df$y, df[x])))
max.ccf <- data.frame(do.call(rbind, max.ccf))
colnames(max.ccf) <- c('Index','Cor','Lag')
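
One thing to watch: because c(x, abs.cor.max, abs.cor.max.lag) mixes a character label with numbers, the Cor and Lag columns of the result end up as text. If numeric columns are wanted, a variation along these lines might work. It is a sketch only: the data frame df, its column names, and the seed are made up, it skips the reference column y itself, it uses the default lag.max, and it keeps the sign of the correlation.

```r
set.seed(7)                                   # arbitrary seed
df <- data.frame(y = rnorm(100),              # made-up reference series
                 u = rnorm(100),              # made-up series to compare
                 v = rnorm(100))

cols <- setdiff(colnames(df), "y")            # every column except the reference
rows <- lapply(cols, function(nm) {
  d <- ccf(df$y, df[[nm]], plot = FALSE)
  i <- which.max(abs(d$acf))                  # lag with the largest |correlation|
  data.frame(Index = nm, Cor = d$acf[i], Lag = d$lag[i])
})
max.ccf <- do.call(rbind, rows)               # one row per column, numeric Cor/Lag
max.ccf
```

Building a one-row data.frame per column costs a little speed, but the labels stay character and the values stay numeric, so the result can be sorted or filtered directly.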
