
I am trying to estimate the lagged covariance between two vectors. I have used gsignal::xcov, forecast::Ccf, and stats::ccf. My code has several nested loops, so computing time is piling up; gsignal::xcov is the fastest of the three but still extremely slow. The very same code runs 10+ times faster in MATLAB (using xcov). I also tried lapply, but it was even slower. I know I could use parallel computing to speed it up, but for reasons beyond the scope of this question I cannot do that.

Thanks!

Edit: I included an example of my code in R and MATLAB. R took 47.8 seconds, MATLAB only 1.4 seconds.

R Code

rm(list=ls())
starting_time <- Sys.time()
library(pracma)   # zeros, polyfit, polyval
library(gsignal)  # xcov
set.seed(1)
a <- rnorm(10000)
b <- rnorm(10000)
c <- rnorm(10000)
d <- rnorm(10000)
e <- rnorm(10000)
f <- rnorm(10000)
data <- cbind(a,b,c,d,e,f)
channels_number <- dim(data)[2]
combinations <- combn(1:channels_number,2)  # all channel pairs, as columns
scales <- seq(15,100,5)                     # window lengths
for (scale in scales){
    finish <- 0
    for (window in 1:floor(dim(data)[1]/scale)){
        print(paste0("Starting scale: ",scale, " window: ", window))
        start <- finish + 1
        finish <- start + scale-1
        tmp <- data[start:finish,]
        # remove a quadratic trend from each channel of the window
        detrended <- zeros(dim(tmp)[1],dim(tmp)[2])
        for (channel in 1:channels_number){
            fit <- polyfit(1:dim(tmp)[1],tmp[,channel],2)
            fit <- polyval(fit,1:dim(tmp)[1])
            detrended[,channel] <- tmp[,channel] - fit
        }
        tmp <- detrended
        ##### Resource Heavy Part #####
        for (combo in 1:dim(combinations)[2]){
            ch1 <- combinations[1,combo]
            ch2 <- combinations[2,combo]
            signal_1 <- tmp[,ch1]
            signal_2 <- tmp[,ch2]
            xcov_results <- xcov(signal_1,signal_2,scale = 'biased')
        }
    }
}
ending_time <- Sys.time()
ending_time - starting_time

MATLAB Code

clear all
clc
tic
a = rand(10000,1);
b = rand(10000,1);
c = rand(10000,1);
d = rand(10000,1);
e = rand(10000,1);
f = rand(10000,1);
data = [a,b,c,d,e,f];
channels_number = size(data,2);
combinations = nchoosek(1:channels_number,2)'; % transposed so pairs are columns, like combn in R
scales = 15:5:100;
for s = 1:size(scales,2)
    scale = scales(s);
    finish = 0;
    for window = 1:floor(size(data,1)/scale)
        disp("Starting scale: " + scale + " window: " + window)
        start = finish + 1;
        finish = start + scale-1;
        tmp = data(start:finish,:);
        tmp = detrend(tmp,2);
        %%%%% Resource Heavy Part %%%%%
        for combo = 1:size(combinations,2)
            ch1 = combinations(1,combo);
            ch2 = combinations(2,combo);
            signal_1 = tmp(:,ch1);
            signal_2 = tmp(:,ch2);
            xcov_results = xcov(signal_1,signal_2,'biased');
        end
    end
end
toc
  • Have you tried https://rdrr.io/cran/Rfast/man/cova.html? – user2974951 May 29 '23 at 11:18
  • This looks like an interesting library, but unfortunately it won't work in my case. I am interested in lagged covariance. – Orestis May 29 '23 at 12:28
  • Additional information would be helpful here. What sizes are the vectors? What lags are being used? If the nested loops are not too complicated, consider sharing how you are processing the lagged covariance in context. A representative example would go a long way. – jblood94 May 30 '23 at 12:20
  • @jblood94 I included examples of the code. The lag is the whole length of the vector, but on different occasions the vectors have different lengths. – Orestis Jun 07 '23 at 10:04

1 Answer


A couple of things to speed this up:

First, the detrending can be done in one shot (as in MATLAB) by passing matrices to qr.solve and multiplying the design matrix by the result. That removes the first inner for loop.
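
As a minimal sketch (reusing the tmp window matrix and the quadratic trend from the question), the per-channel polyfit/polyval loop collapses to:

x <- outer(1:nrow(tmp), 2:0, "^")  # design matrix: columns t^2, t, 1
# qr.solve fits one quadratic per column of tmp (least squares via QR);
# the matrix product then evaluates every fitted trend at once
detrended <- tmp - x %*% qr.solve(x, tmp)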

Second, the sum of the residuals after detrending will be 0, so the lagged covariance is the same as the lagged cross-correlation of the pairwise columns (xcov just removes the means before computing the cross-correlation). The second inner for loop can therefore be replaced with a function optimized to do pairwise lagged correlation (with the desired scaling):

xcorrPaired <- function(x) {
  d <- dim(x)              # d[1] samples, d[2] channels
  dd <- d[2] - 1L
  N <- 2L*d[1] - 1L        # number of lags: -(d[1] - 1) to d[1] - 1
  M <- 2^ceiling(log2(N))  # FFT length (next power of 2)
  xx <- matrix(0, M, d[2])
  xx[d[1]:N,] <- x         # zero-padded copy, delayed by d[1] - 1 samples
  pre <- mvfft(xx)
  post <- Conj(mvfft(rbind(x, matrix(0, M - d[1], d[2]))))
  # the column indices pair the channels in the same order as combn();
  # the inverse FFT of the products yields all cross-correlations, scaled
  # by 1/d[1] ('biased') and 1/M (R's inverse FFT is unnormalized)
  Re(mvfft(pre[,rep.int(1:dd, dd:1)]*post[,sequence(dd:1, 2:d[2])], inverse = TRUE))[1:N,]/d[1]/M
}
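
As a quick sanity check (a sketch with an arbitrary small matrix m, assuming gsignal is installed): column 1 of the result corresponds to channel pair (1, 2), in the same order combn would give. xcov removes the means itself, so center m first for the two to agree:

library(gsignal)
set.seed(1)
m <- scale(matrix(rnorm(60), 20, 3), scale = FALSE)  # mean-center each column
# c() drops any dim attributes from the returned covariance vector;
# this should return TRUE, mirroring the all.equal check at the end of this answer
all.equal(xcorrPaired(m)[, 1], c(xcov(m[, 1], m[, 2], scale = "biased")$C))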

The simplified code is as follows (with timing):

set.seed(1)
data <- matrix(rnorm(6e4), 1e4, 6)
scales <- seq(15L, 100L, 5L)

system.time({
  n <- nrow(data)
  x <- outer(1:n, 2:0, "^")  # quadratic design matrix, reused for every window
  xcov_results2 <- vector("list", length(scales))
  i <- 0L

  for (scale in scales) {
    for (start in seq(1L, by = scale, length.out = n%/%scale)) {
      idx <- start:(start + scale - 1L)
      # detrend all channels in one shot, then compute all pairwise lagged covariances
      xcov_results2[[i <- i + 1L]] <- xcorrPaired(data[idx,] - x[idx,]%*%qr.solve(x[idx,], data[idx,]))
    }
  }
})
#>    user  system elapsed 
#>    0.58    0.17    0.75

The timing is now comparable to the MATLAB version (I get about 0.3 seconds in MATLAB after removing the disp line), and no external packages are needed.

Compare to the original version (slightly modified to store its results):

library(pracma)
library(gsignal)

set.seed(1)
a <- rnorm(10000)
b <- rnorm(10000)
c <- rnorm(10000)
d <- rnorm(10000)
e <- rnorm(10000)
f <- rnorm(10000)
data <- cbind(a,b,c,d,e,f)
scales <- seq(15,100,5)

system.time({
  channels_number <- dim(data)[2]
  combinations <- combn(1:channels_number,2)
  xcov_results <- vector("list", length(scales))
  i <- 0L
  
  for (scale in scales){
    finish <- 0
    for (window in 1:floor(dim(data)[1]/scale)){
      xcov_results[[i <- i + 1L]] <- matrix(0, 2*scale - 1, ncol(combinations))
      start <- finish + 1
      finish <- start + scale-1
      tmp <- data[start:finish,]
      detrended <- zeros(dim(tmp)[1],dim(tmp)[2])
      for (channel in 1:channels_number){
        fit <- polyfit(1:dim(tmp)[1],tmp[,channel],2)
        fit <- polyval(fit,1:dim(tmp)[1])
        detrended[,channel] <- tmp[,channel] - fit
      }
      tmp <- detrended
      ##### Resource Heavy Part #####
      for (combo in 1:dim(combinations)[2]){
        ch1 <- combinations[1,combo]
        ch2 <- combinations[2,combo]
        signal_1 <- tmp[,ch1]
        signal_2 <- tmp[,ch2]
        xcov_results[[i]][,combo] <- xcov(signal_1,signal_2,scale = 'biased')$C
      }
    }
  }
})
#>    user  system elapsed 
#>   22.74    0.19   23.00

Check that the results are the same:

all.equal(xcov_results, xcov_results2)
#> [1] TRUE
  • Hi. This is indeed very fast! Thanks for all the effort. But for some reason I don't get exactly the same values. I also tried to run your code step by step, but I don't really understand what some of the lines are doing. – Orestis Jun 12 '23 at 09:25
  • I updated the answer to show the results are the same. – jblood94 Jun 12 '23 at 11:20
  • `data[idx,] - x[idx,]%*%qr.solve(x[idx,], data[idx,])` does the detrending across all channels, and `xcorrPaired` does the job of `xcov` for all combinations of channels. The internals of `xcorrPaired` might be recognizable if you were to step through `gsignal::xcorr`, (via e.g., `debugOnce(gsignal::xcorr)`), which is called by `gsignal::xcov`. – jblood94 Jun 12 '23 at 11:24
  • I put all your code together and ran it. But one output is a matrix and the other is a list. https://pastebin.com/WyJ9PChH – Orestis Jun 13 '23 at 09:36
  • Sorry. I had forgotten to update the block of code generating `xcov_results2`. It should be correct now. – jblood94 Jun 13 '23 at 11:08