Optimize performance of a formula spanning three consecutive indices, with wraparound

Question

I want to optimize the implementation of this formula.

Here is the formula:

re[i] = x[i]^2 - x[i-1] * x[i+1]

x is an array of values, and i goes from 1 to N, where N > 2400000. The indices wrap around: for i = 1, i-1 refers to the last element, and for i = N, i+1 refers to the first element. Here is the code I have written:

    x <- as.numeric(1:2400000)  # numeric, so the products do not overflow the integer range
    re <- array(data = NA, dim = NROW(x))
    lastIndex <- NROW(x)
    for (i in 1:lastIndex) {
      if (i == 1) {
        re[i] <- x[i]*x[i] - x[lastIndex]*x[i+1]
      } else if (i == lastIndex) {
        re[i] <- x[i]*x[i] - x[i-1]*x[1]
      } else {
        re[i] <- x[i]*x[i] - x[i-1]*x[i+1]
      }
    }

Can it be done by apply in R?

"Optimize it" for what? compactness (see my answer) performance? (use `data.table`) — smci, Apr 26 '17 at 00:43
Chetan, then you have to edit the title to say that. 'Optimize' is inherently ambiguous. — smci, Apr 27 '17 at 13:56

score 4 · Accepted Answer · answered Apr 27 '17 at 14:43

We can use direct vectorization for this:

# Make fake data
x <- 1:10
n <- length(x)
# create vectors for the plus/minus indices
xminus1 <- c(x[n], x[-n])
xplus1 <- c(x[-1], x[1])

# Use direct vectorization to get re
re <- x^2 - xminus1*xplus1
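
As a sanity check, here is a minimal sketch that wraps the same shifting idea in a function and compares it with values worked out by hand. The function name `wrap_formula_vec` and the four-element test vector are assumptions made for illustration. The `as.numeric()` call is also an addition: for integer input such as `1:2400000`, the product `xminus1*xplus1` exceeds R's integer range and becomes `NA`.

```r
# Hypothetical helper: vectorized re[i] = x[i]^2 - x[i-1]*x[i+1] with wraparound
wrap_formula_vec <- function(x) {
  x <- as.numeric(x)              # avoid integer overflow in the product
  n <- length(x)
  xminus1 <- c(x[n], x[-n])       # x shifted right: element i holds x[i-1]
  xplus1  <- c(x[-1], x[1])       # x shifted left:  element i holds x[i+1]
  x^2 - xminus1 * xplus1
}

# Small example that can be checked by hand:
# i=1: 4 - 7*5, i=2: 25 - 2*3, i=3: 9 - 5*7, i=4: 49 - 3*2
small <- wrap_formula_vec(c(2, 5, 3, 7))
print(small)

# Same call on data of the size mentioned in the question
big <- wrap_formula_vec(1:2400000)
print(anyNA(big))
```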

Dason


Awesome! Thanks Dason :) – Globox Apr 27 '17 at 18:14
This is creating three copies of a very large vector/array. You could avoid the copies with the padding trick, then `x[2:(N+1)]^2 - x[1:N]*x[3:(N+2)]` – smci Apr 29 '17 at 00:00
@Chetan: this takes 3x the memory. If x is very large, then when you run out of memory, it will reduce the performance. – smci May 01 '17 at 22:32

jogo · Answer 2 · 2017-04-28T12:05:29.150

If each x[i] really is equal to i, then you can do a little math:
x[i]^2 - (x[i]-1)*(x[i]+1) = 1
so every element of the result is 1, except the first and the last, where the wraparound applies.
The result is:

c(1-2*N, rep(1, N-2), N*N-(N-1))
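
A quick way to convince yourself of the closed form is to compare it with the general vectorized expression on a small input. This is only a sketch; `N <- 10` and the variable names `closed_form` and `general` are chosen here for illustration.

```r
N <- 10
x <- as.numeric(1:N)   # the special case x[i] == i

# Closed form: first and last elements differ because of the wraparound
closed_form <- c(1 - 2*N, rep(1, N - 2), N*N - (N - 1))

# General expression, valid for arbitrary x
general <- x*x - c(x[N], x[-N]) * c(x[-1], x[1])

print(all.equal(closed_form, general))
```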

In the general case (arbitrary values in x) you can do (as in the answer from Dason):

x*x - c(x[N], x[-N])*c(x[-1], x[1])

Here is a solution with rollapply() from zoo:

library("zoo")
rollapply(c(x[length(x)],x, x[1]), width=3, function(x) x[2]^2 - x[1]*x[3]) # or:
rollapply(c(tail(x,1), x, x[1]), width=3, function(x) x[2]^2 - x[1]*x[3])

Here is the benchmark:

library("microbenchmark")
library("zoo")

N <- 10000
x <- 1:N

microbenchmark(
  math=c(1-2*N, rep(1, N-2), N*N-(N-1)), # for the data from the question
  vect.i=x*x - c(x[N], x[-N])*c(x[-1], x[1]), # general data
  roll.i=rollapply(c(x[length(x)],x, x[1]), width=3, function(x) x[2]^2 - x[1]*x[3]), # or:
  roll.tail=rollapply(c(tail(x,1), x, x[1]), width=3, function(x) x[2]^2 - x[1]*x[3])
)
# Unit: microseconds
#      expr       min         lq        mean     median         uq        max neval cld
#      math    33.613    34.4950    76.18809    36.9130    38.0355   2002.152   100  a 
#    vect.i   188.928   192.5315   732.50725   197.1955   198.5245  51649.652   100  a 
#    roll.i 56748.920 62217.2550 67666.66315 68195.5085 71214.9785 109195.049   100   b
# roll.tail 57661.835 63855.7060 68815.91001 67315.5425 71339.6045 119428.718   100   b

score 0 · Answer 3 · answered Apr 26 '17 at 00:33

An lapply implementation of your formula would look like this:

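x <- as.numeric(1:2400000) # numeric, so the products do not overflow the integer range
last <- length(x)

# iterate over the indices, not the values, so this also works for arbitrary x
re <- lapply(seq_along(x), function(i) {
    if(i == 1) {
        x[i]*x[i] - x[last]*x[i+1]
    } else if (i == last) {
        x[i]*x[i] - x[i-1]*x[1]
    } else {
        x[i]*x[i] - x[i-1]*x[i+1]
    }
})

re <- unlist(re)

lapply returns a list, so the result is converted to a vector with unlist().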

andseven


Use `sapply` instead of `lapply`; it returns a vector/matrix rather than a list. Or even `vapply`, since you know the size and type of the output ahead of time. – Parfait Apr 26 '17 at 00:48
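
A minimal sketch of the `vapply` variant suggested in the comment above, assuming a short numeric vector made up for illustration. The names `prev_i` and `next_i` are introduced here and are not from the answer. `numeric(1)` tells `vapply` that each call returns exactly one number, so no `unlist()` is needed.

```r
x <- c(3, 1, 4, 1, 5, 9, 2, 6)   # arbitrary illustrative data
n <- length(x)

re <- vapply(seq_len(n), function(i) {
  prev_i <- if (i == 1) n else i - 1   # wrap to the last element
  next_i <- if (i == n) 1 else i + 1   # wrap to the first element
  x[i]^2 - x[prev_i] * x[next_i]
}, numeric(1))

print(re)
```

This is still an R-level loop over every index, so it would be expected to run much slower than the directly vectorized expression on millions of elements.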

score 0 · Answer 4 · edited May 23 '17 at 12:10

1) You can avoid all the special-casing in the computation by padding the start and end of x with copies of the last and first elements; something like this:

N <- NROW(x)
x <- c(x[N], x, x[1]) # pad start and end to give wraparound; x now has length N+2

re <- lapply(2:(N+1), function(i) { x[i]*x[i] - x[i-1]*x[i+1] } )
#re <- unlist(re) as andseven wrote

# and remember not to use all of x, just x[2:(N+1)], elsewhere

2) Directly vectorize, as in @Dason's answer:

# Do the padding trick on x (so x has length N+2), then
# note the parentheses: 1:N-1 would parse as (1:N)-1
x[2:(N+1)]^2 - x[1:N]*x[3:(N+2)]

3) If performance matters, I suspect using data.table or else a for-loop on i will be faster, since it references three consecutive rows.

4) For more performance, use byte-compiling.

5) If you need even more speed, use the Rcpp extension (C++ under the hood); see "How to use Rcpp to speed up a for loop?"

See those questions I cited for good examples of using lineprof and microbenchmark to figure out where your bottleneck is.
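
For point 4, a minimal sketch of byte-compiling a loop version with `cmpfun()` from the base `compiler` package. The function name `wrap_formula_loop` and the test data are assumptions made for illustration. Note that since R 3.4.0 the JIT byte-compiles closures by default, so the explicit call may make little measurable difference on current versions.

```r
library(compiler)

# Hypothetical helper: the loop from the question, with the wraparound
# handled by picking the neighbour indices explicitly
wrap_formula_loop <- function(x) {
  n <- length(x)
  re <- numeric(n)
  for (i in seq_len(n)) {
    prev_i <- if (i == 1) n else i - 1
    next_i <- if (i == n) 1 else i + 1
    re[i] <- x[i]^2 - x[prev_i] * x[next_i]
  }
  re
}

# Byte-compiled copy of the same function
wrap_formula_cmp <- cmpfun(wrap_formula_loop)

set.seed(1)
x <- runif(1000)
res_loop <- wrap_formula_loop(x)
res_cmp  <- wrap_formula_cmp(x)
print(all.equal(res_loop, res_cmp))
```

Timing both versions with microbenchmark on your own data is the only reliable way to tell whether this step is worth it.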

answered Apr 26 '17 at 00:41

smci


Inside `lapply` shouldn't it be `2:N` instead of `x[2:N]`? Also, this is not performance efficient, takes a lot of time to run. – Globox Apr 26 '17 at 18:17
I liked the padding part. Smart move :) – Globox Apr 26 '17 at 18:21
@Chetan: add some random-seeded data to your question details so we can actually run an apples-to-apples comparison. *"It takes a lot of time to run"* is not specific, nor can any of the rest of us verify it. As to N>2.4 million, pick an actual value. I assume you're not blowing out your memory-limit; if you are, all bets are off. – smci Apr 27 '17 at 13:58
Sure, 2:N, instead of x[2:N], whatever, the intent of the code is clear. – smci Apr 27 '17 at 14:17
To the downvoter: a lot of work went into this, so tell me what you think needs improving. – smci May 01 '17 at 22:34
