
I'd like to parallelize a period.apply() call in R. I'm trying to use doParallel with foreach, but I don't know how to implement it. The data I'm using is an xts object with a date-time index and the values of one variable, and what I'm trying to do is compute the mean of the data every 5 seconds:

                                     VAR
2018-01-01 00:00:00                1945.054
2018-01-01 00:00:02                1944.940
2018-01-01 00:00:05                1945.061
2018-01-01 00:00:07                1945.255
2018-01-01 00:00:10                1945.007
2018-01-01 00:00:12                1944.995

Here is a sample of the code I wrote, but it doesn't work:

library(xts)
library(doParallel)
library(foreach)

cores <- detectCores()
cluster <- makeCluster(cores, type = "PSOCK")
registerDoParallel(cluster)

ends <- endpoints(x,"secs",5)
m <- foreach(i = 1:length(index(x))) %dopar% period.apply(x,ends,mean)
index(m) <- foreach(m) %dopar% trunc(index(m),"secs")
stopCluster()

The code below works, but for a much bigger data set it takes too much time:

ends <- endpoints(x,"secs",5)
m <- period.apply(x, ends, mean)
index(m) <- trunc(index(m),"secs")

Is there a way of doing this more efficiently?

Thanks in advance.

Random Cotija
  • It's not clear to reader that `period.apply()` is a function in 'xts' - please update with `library(xts)`. – HenrikB Apr 27 '18 at 19:19
  • Regarding "doesn't work", do you get an error message, or it gives you the wrong result, or it just doesn't go any quicker? – Darren Cook Apr 28 '18 at 19:07
  • It takes too much time to execute, and I have to stop the execution. – Random Cotija Apr 28 '18 at 19:12
  • @Riverarodrigoa Your approach is to split the task into (roughly) N/5 jobs, each processing just a few rows, where N is the number of data rows you have. More efficient is to set up, say, 8 jobs, each processing N/8 rows. I see Ralf's answer https://stackoverflow.com/a/50090842/841830 is doing exactly this. – Darren Cook Apr 30 '18 at 10:18

2 Answers


Have you tried your code on a small data set? Because once I got it to run, it did all the work multiple times (once for each line in x). Also, if you try to parallelise work, it is usually a good idea to let the workers do as much work as possible before sending the data back. In your code you have two successive foreach calls, which leads to additional communication overhead.
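To see why the work is repeated: the loop body never uses i, so each of the length(index(x)) iterations computes the complete result. A minimal illustration of the same pattern with toy numbers (using the serial %do% for clarity):

library(foreach)

# Each of the 3 iterations ignores i and computes the full sum, so the
# same work is done 3 times and a list of identical results comes back.
foreach(i = 1:3) %do% sum(1:10)
#> a list of three identical elements, each equal to 55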

My approach is like this:

  1. Split the xts object into N chunks, making sure to split at one of the 5-second intervals.
  2. Let each worker do all the work for one chunk.
  3. Combine the results.

How to choose N?

Since split.xts is used for step one, each chunk will have the same number of 5-second intervals. However, the amount of work to be done probably depends more on the number of data points than on the number of 5-second intervals. So if the distribution of points among these chunks is uneven, it might make sense to use a larger number of chunks together with some load balancing, as sketched below. If the distribution of points is even, it makes sense to make N as small as possible to minimise the communication overhead. Here I take the latter approach, i.e. setting N equal to the number of cores.
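For illustration, a minimal sketch of the load-balanced variant: create more chunks than cores, so that workers that finish early pick up the remaining chunks. The oversubscription factor 4 is an arbitrary choice of mine, and I am assuming that the doParallel backend hands foreach tasks to workers as they become free; x is the xts object from the question.

library(xts)
library(parallel)

# More chunks than cores: a worker that finishes a light chunk early
# picks up one of the remaining chunks, evening out uneven work.
cores <- detectCores()
N <- 4 * cores  # arbitrary oversubscription factor
k <- as.integer(ceiling(difftime(max(index(x)), min(index(x)),
                                 units = "secs") / (5 * N)))
split_x <- split(x, f = "secs", k = 5 * k)
# ... then run the same foreach loop as below over split_x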

Now let's generate some sample data and apply your working solution:

library(xts)
x <- xts(x = runif(100),
         order.by = as.POSIXct("2018-01-01") + 0:99)

ends <- endpoints(x,"secs",5)
m <- period.apply(x, ends, mean)
index(m) <- trunc(index(m),"secs")

Next we set up the parallel cluster:

library(doParallel)
library(foreach)

cores <- detectCores()
cluster <- makeCluster(cores, type = "PSOCK")
registerDoParallel(cluster)

Now we have to split the xts object. Here I first determine the time span of the entire object and, from that, how many 5-second intervals each of the N chunks has to cover.

N <- cores
k <- as.integer(ceiling(difftime(max(index(x)), min(index(x)), units = "secs") / (5 * N)))

Next I split the xts object into a list of xts objects, each of which covers about the same time span:

split_x <- split(x, f = "secs", k = 5 * k)

Now I let foreach iterate over these chunks and combine the results:

m2 <- foreach(x = split_x, .packages = c("xts"), .combine = c) %dopar% {
    ends <- endpoints(x,"secs",5)
    m <- period.apply(x, ends, mean)
    index(m) <- trunc(index(m),"secs")
    m
}
stopCluster(cluster)

Hooray, results are equal:

all.equal(m, m2)
#> [1] TRUE
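
As a quick sanity check on whether the parallelisation pays off for a given data size, both variants can be wrapped in system.time(). A minimal sketch with a purely illustrative sample size (not a measured benchmark); on the tiny 100-point example above, the parallel version will likely lose due to communication overhead:

# Larger sample, one observation per second (size purely illustrative)
x <- xts(runif(1e6), order.by = as.POSIXct("2018-01-01") + 0:(1e6 - 1))

system.time({  # serial baseline
  ends <- endpoints(x, "secs", 5)
  m <- period.apply(x, ends, mean)
  index(m) <- trunc(index(m), "secs")
})
# ... and wrap the split/foreach block above in system.time() the same way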
Ralf Stubner
  • Thank you very much! This solves my problem. Now it runs 3 times faster: without parallelisation it took 27s, now it takes 8s. Do you know if this time can be reduced further? – Random Cotija Apr 30 '18 at 11:29
  • @Riverarodrigoa How many (real) cores do you have? How large is the actual data set? Are all cores at 100% during these 8s, or are there (longer) stretches where fewer cores are active? – Ralf Stubner Apr 30 '18 at 12:31
  • Sorry for the delay. The dataset I have to work with has approximately 120,000,000 observations; for now I am working on a sample of 1,028,279 obs (the 8s correspond to this sample). Yes, all the cores of my computer are at 100% during these 8s. – Random Cotija May 02 '18 at 08:32
  • @Riverarodrigoa And how many cores do you have? Anyway, you could give [RcppRoll](https://cran.r-project.org/package=RcppRoll) a try. For simplicity first without parallelism. – Ralf Stubner May 02 '18 at 09:20
  • While RcppRoll is a very nice package, I believe it only does rolling (overlapping) windows. `period.apply()` is for non-overlapping windows. – Joshua Ulrich May 12 '18 at 15:06
  • @JoshuaUlrich With the `by` argument one can get non-overlapping windows (though one currently needs the development version, c.f. https://github.com/kevinushey/RcppRoll/issues/14). However, what is missing is the time-based instead of index-based windows offered by xts. – Ralf Stubner May 12 '18 at 16:37

I was really depressed by the performance of period.apply() illustrated in this question. My depression became an obsession to make it faster. So I rewrote it in C. Here's an example that uses it and shows the performance improvement.

library(xts)  # need the GitHub development version
period_apply <- xts:::period_apply  # not exported

set.seed(21)
x <- .xts(rnorm(1e7), 1:1e7)
e <- endpoints(x, "seconds", 5)

system.time(y <- period.apply(x, e, sum))  # current version
#    user  system elapsed 
#  77.904   0.368  78.462 
system.time(z <- period_apply(x, e, sum))  # new C version
#    user  system elapsed 
#  15.468   0.232  15.741
all.equal(y, z)
# [1] TRUE

So it's ~5x faster for this example. There are still a few things that could make it even faster, but 5x was a good place to stop and show it could be better. Check out the latest development version if you want (and are brave enough) to try it.
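
For reference, a minimal sketch of installing the development version, assuming the canonical GitHub repository name joshuaulrich/xts and the remotes package; check the project page for the authoritative instructions:

# Install the development version of xts from GitHub (repo name assumed)
# install.packages("remotes")
remotes::install_github("joshuaulrich/xts")

library(xts)
period_apply <- xts:::period_apply  # not exported, as noted above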

Joshua Ulrich