
I am struggling to understand why this bit of code (adapted from the R Benchmark 2.5) becomes slower and slower (on average) as the number of iterations increases.

require(Matrix)
c <- 0
for (i in 1:100) {
  a <- new("dgeMatrix", x = rnorm(3250 * 3250), Dim = as.integer(c(3250, 3250)))
  b <- as.double(1:3250)

  invisible(gc())  # collect garbage now so it does not pollute the timing
  timing <- system.time({
    c <- solve(crossprod(a), crossprod(a, b))  # least-squares solve via the normal equations
  })
  print(timing)

  rm(a, b, c)
}

Here is a sample output, which varies slightly from one run to the next.

As I understand it, nothing should be saved from one iteration to the next, yet the timing slowly increases from 1 second in the first few loops to more than 4 seconds in the later loops. Do you have any idea what is causing this, and how I could fix it?

Switching the for loop to an *apply function seems to yield similar results.
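For reference, the *apply variant looks roughly like this (a sketch of what I tried, not the exact code):

require(Matrix)  # as above
invisible(lapply(1:100, function(i) {
  a <- new("dgeMatrix", x = rnorm(3250 * 3250), Dim = as.integer(c(3250, 3250)))
  b <- as.double(1:3250)
  invisible(gc())
  # same solve as in the for loop version
  timing <- system.time({
    c <- solve(crossprod(a), crossprod(a, b))
  })
  print(timing)
}))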

I know the code is not optimised, but it comes from a widely used benchmark, and depending on what causes this behaviour, it could indicate a serious bias in the benchmark's results (it only runs 3 iterations by default).

I'm running R version 3.0.1 (x86_64) on Mac OS 10.8.4 with 16 GB RAM (plenty of which is free). The BLAS is OpenBLAS.

Hong Ooi
RenéR
  • I am unable to replicate this using a very similar setup. How many iterations does it take to start slowing down? – GSee Jun 23 '13 at 22:55
  • On my computer, each iteration takes ~30 seconds. I ran 10 and they were all about 30-35 seconds. – Roman Luštrik Jun 23 '13 at 22:58
  • PC here. A flat 9 seconds per iteration. I am using the ATLAS library, which I know makes this kind of operation much faster, at least compared to the BLAS DLL that comes by default. The OP's Mac is still much faster... Anyway, quite a few of us are now unable to reproduce the problem. Maybe the OP should consider printing the whole `system.time` output; *elapsed time* (`[3]`) does not always tell the full story, and *user time* may be a better pick (a sketch of this appears after the comments). – flodel Jun 23 '13 at 23:42
  • R 3.0.0 mac os 10.8: ~30 sec per run, no slow down over the first 15 or so – Ian Fellows Jun 24 '13 at 00:22
  • OP: Are you really getting ~1sec speeds on that `3250-by-3250` matrix? Is your computer on steroids? Or were you testing with smaller dimensions? Another important question: is this deterioration temporary (within the evaluation of that `for` loop) or permanent? To find that out, run a few extra iterations manually after the `for` loop finishes and tell us if they are slow (4sec) or fast (1sec). – flodel Jun 24 '13 at 01:11
  • R 3.0.0 64 @win7. No slowdown perceived for 100 iterations using 1000 instead of 3250. – Ferdinand.kraft Jun 24 '13 at 01:17
  • @GSee Slowdown is progressive, 1.1 seconds after 20 iterations, 2.4 seconds after 40 iterations, etc. – RenéR Jun 24 '13 at 06:51
  • @flodel I have a MacBook Pro with an Intel Core i7 at 2.5 GHz. I'm using OpenBLAS, and it is pretty fast indeed! Do you think it's worth trying with a different BLAS? No doubt the default BLAS will be much slower. I'm going to try running a few iterations manually after the loop finishes, and also look at user time, and report back. – RenéR Jun 24 '13 at 06:56
  • @flodel A few manual extra iterations all run in 1 second. I've added a sample output in a gist for the whole loop. – RenéR Jun 24 '13 at 07:35
  • This sounds more like a BLAS (specifically, OpenBLAS) question, rather than an R question. Retagging.... – Hong Ooi Jun 24 '13 at 08:04
  • @HongOoi Here are the results I get for the [default BLAS, a 1000*1000 matrix with 500 iterations](https://gist.github.com/renerien/5848795). As others reported, nothing visible after 100 iterations, but it looks to me like something is happening later on (although less dramatic than in my initial example). So apart from obvious speed differences, I'm not sure it's BLAS-specific. – RenéR Jun 24 '13 at 09:22
  • One more data point: i7, Windows7, R3.0. I see no change in either User or System time per cycle. CPU usage and RAM allocation to R do not change at all. Maybe your Sanitation Workers Union is on strike? (cheap joke about gc() not functioning) – Carl Witthoft Jun 24 '13 at 12:37
  • @CarlWitthoft Thanks for the data point. I probably need to try on a different OS. CPU does change quite a bit, but RAM does not (and I've got a lot more available). I guess it could be a garbage collection problem, but then I would see RAM filling up, wouldn't I? – RenéR Jun 27 '13 at 08:46
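Following flodel's suggestion in the comments, here is a minimal sketch that records the full `system.time` breakdown for every iteration, so user time can be compared with elapsed time (same dimensions as in the question):

require(Matrix)
# collect user.self, sys.self and elapsed for every iteration
timings <- t(sapply(1:100, function(i) {
  a <- new("dgeMatrix", x = rnorm(3250 * 3250), Dim = as.integer(c(3250, 3250)))
  b <- as.double(1:3250)
  invisible(gc())
  system.time(solve(crossprod(a), crossprod(a, b)))[1:3]
}))
plot(timings[, "elapsed"], type = "l", xlab = "iteration", ylab = "seconds")
lines(timings[, "user.self"], lty = 2)  # dashed line: user time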

2 Answers


One solution would be to use the compiler package to compile your code into byte code. This should eliminate the odd timing behaviour, since the same compiled code is called on every iteration, and it should also make your code faster. To enable JIT compilation for your code, include the two lines below:

library(compiler)
enableJIT(3)

If compiling the code does not eliminate the issue, at least the set of suspects will have been narrowed down.
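For instance, a minimal sketch applying this to the code in the question (the `one_iter` helper is an illustrative name, not part of the original benchmark):

library(compiler)
library(Matrix)

enableJIT(3)  # highest JIT level: closures and top-level loops are byte-compiled

# one benchmark iteration, explicitly byte-compiled with cmpfun
one_iter <- cmpfun(function() {
  a <- new("dgeMatrix", x = rnorm(3250 * 3250), Dim = as.integer(c(3250, 3250)))
  b <- as.double(1:3250)
  system.time(solve(crossprod(a), crossprod(a, b)))
})

for (i in 1:100) print(one_iter())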

johneric

Perhaps you could try making the body of the for loop into a function. That way, each run works in its own environment, so there is really no way one run can affect another. It also removes the messiness caused by the repeated rm() and gc() calls.

require(Matrix)

NewFun <- function() {
    a <- new("dgeMatrix", x = rnorm(3250 * 3250), Dim = as.integer(c(3250, 3250)))
    b <- as.double(1:3250)
    timing <- system.time({
        c <- solve(crossprod(a), crossprod(a, b))
    })
    print(timing)
}

for (i in 1:100) {
    NewFun()
}
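If you would rather compare the timings afterwards than just print them, a small variation of the same idea (a sketch; `NewFunTimed` is an illustrative name) returns the timing instead:

NewFunTimed <- function() {
    a <- new("dgeMatrix", x = rnorm(3250 * 3250), Dim = as.integer(c(3250, 3250)))
    b <- as.double(1:3250)
    system.time(solve(crossprod(a), crossprod(a, b)))
}

elapsed <- sapply(1:100, function(i) NewFunTimed()[["elapsed"]])
summary(elapsed)  # does the distribution drift upwards over the run?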
Will Beason