
So I think I don't quite understand how memory works in R. I've been running into problems where the same piece of code gets slower later in the week (using the same R session, sometimes even when I clear the workspace). I've tried to develop a toy problem that I think reproduces the "slowing down" effect I have been observing when working with large objects. Note that the code below is somewhat memory intensive (don't blindly run it without adjusting `n` and `N` to match what your setup can handle). Also note that it will likely take about 5-10 minutes before you start to see the slowing-down pattern (possibly even longer).

N=4e7 #number of simulation runs
n=2e5 #number of simulation runs between calculating time elapsed
meanStorer=rep(0,N);   # pre-allocate the results vector
toc=rep(0,N/n);        # pre-allocate the per-block timings
x=rep(0,50);           # pre-allocate x (filled in place inside the loop)

for (i in 1:N){
  if(i%%n == 1){tic=proc.time()[3]}  # start the timer at the beginning of each block of n iterations
  x[]=runif(50);                     # fill the pre-allocated x in place
  meanStorer[i] = mean(x);
  if(i%%n == 0){toc[i/n]=proc.time()[3]-tic; print(toc[i/n])}  # record and print the elapsed time for this block
}
plot(toc)

`meanStorer` is certainly large, but it is pre-allocated, so I am not sure why the loop slows down as time goes on. If I clear my workspace and run this code again, it starts off just as slow as the last few blocks of the previous run! I am using RStudio (in case that matters). Also, here is some of my system information:

  • OS: Windows 7
  • System Type: 64-bit
  • RAM: 8 GB
  • R version: 2.15.1 (`R.version$platform` yields "x86_64-pc-mingw32")
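
For reference, a quick way to check how much memory the session is actually holding before and after clearing the workspace (`memory.size()` and `memory.limit()` are Windows-only, which matches my setup):

print(object.size(meanStorer), units = "Mb")  # the pre-allocated results vector alone is ~305 Mb at N=4e7
memory.size()    # Mb currently used by this R session (Windows only)
memory.limit()   # Mb available to this R session (Windows only)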

Here is a plot of `toc`, prior to using pre-allocation for `x` (i.e. using `x=runif(50)` in the loop):

[plot of toc]

Here is a plot of `toc`, after using pre-allocation for `x` (i.e. using `x[]=runif(50)` in the loop):

[plot of toc]
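
(For completeness, here is a minimal timing sketch contrasting the two assignment forms; the 2e5 iteration count is arbitrary, just enough to compare them.)

x = rep(0, 50)
system.time(for (i in 1:2e5) x = runif(50))    # binds a freshly allocated vector on every iteration
system.time(for (i in 1:2e5) x[] = runif(50))  # fills the pre-allocated x in place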

Is `rm` not doing what I think it's doing? What's going on under the hood when I clear the workspace?
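
My working assumption (which may be exactly what I have wrong) is that `rm()` only removes the binding from the workspace, and the memory is only reclaimed at the next garbage collection, along these lines:

big = numeric(1e8)   # roughly 760 MB of doubles
rm(big)              # removes the name from the workspace
gc()                 # triggers a garbage collection and reports the session's memory usage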

Update: with the newest version of R (3.1.0), the problem no longer persists, even when increasing `N` to `N=3e8` (note that R doesn't allow vectors much larger than this).

[plot of toc]
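
For scale, some rough arithmetic on what `N=3e8` means on this machine (doubles are 8 bytes each):

N = 3e8
N * 8 / 2^30           # about 2.2 GiB just for the pre-allocated meanStorer, before any copies
.Machine$integer.max   # 2147483647, the hard per-vector length cap prior to R 3.0.0's long vectors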

It is quite unsatisfying that the fix is just updating R to the newest version, though, because I can't seem to figure out why there were problems in version 2.15.1 in the first place. It would still be nice to know what caused them, so I am going to leave this question open.

  • Have you used any of the profiling tools to see where the slowdown might actually be occurring? [This SO thread](http://stackoverflow.com/questions/3650862/how-to-efficiently-use-rprof-in-r) has some profiling tips for R 3.0.0+ – hrbrmstr May 28 '14 at 00:11
  • I know little about memory usage, but I know R has this concept of contiguous space. It could be that after many iterations and in spite of cleaning, R finds it more and more difficult to find a contiguous memory space for `x`. One thing I would consider testing is to also pre-allocate space for `x` and `tic`. And maybe explicitly delete (`rm`) `x` and `tic` after each iteration, although I'm less positive about that one. – flodel May 28 '14 at 00:16
  • I didn't observe any such slow down when running your code. There was the odd spike here and there (certainly nothing like your obvious stepped increases in run time), but overall it was pretty steady around 1.93 (SD = 0.04), right up until processing had completed. I'm on Win 8.1 x64 with 16GB RAM, running R 3.1.0 from RStudio. – jbaums May 28 '14 at 00:17
  • @jbaums Could it be that your setup has more space/memory than mine? If you increased `N`, could you possibly duplicate the slowdown? – WetlabStudent May 28 '14 at 00:20
  • @flodel Now I might be incredibly naive here, but isn't the first generation of `x` effectively pre-allocating it? Also, when clearing the workspace using `rm` and rerunning the code, the calculations for `i=1` are as slow as when it left off last time. – WetlabStudent May 28 '14 at 00:34
  • No, `x` is reallocated every time you do `x <- runif(50)`. Pre-allocating it would be 1) `x <- rep(0, 50)` outside the loop and 2) `x[] <- runif(50)` inside the loop. – flodel May 28 '14 at 00:38
  • I've just finished running 8e7 iterations, and once again it was steady around 1.93±0.04. I noticed spikes to ~2 if I was web-browsing during processing. I assume you're just leaving R to do its thing, without using the computer for anything else simultaneously? – jbaums May 28 '14 at 00:41
  • @jbaums Yep, nothing but one web browser on this question page and the single R session. The staircase result is pretty consistent. And every time I run the code again, even after clearing the workspace (using `rm` or pressing clear in the environment pane), the staircase starts where it left off, so the code is getting slower and slower every time. I just tried pre-allocating for `x` and that also didn't help. – WetlabStudent May 28 '14 at 00:47
  • Does it return to the "bottom step" after `gc()`? – jbaums May 28 '14 at 00:50
  • @jbaums No, it continues at the top step. – WetlabStudent May 28 '14 at 00:54
  • Your code is still showing `x=rep(0,50);`. It should be `x[]=rep(0,50);` if you want to avoid re-allocating. To check if it helps, you would have to restart R. – flodel May 28 '14 at 01:00
  • @flodel Thanks for putting up with my mistake there. Unfortunately, changing it to `x[]` didn't resolve the problem, but at least I now understand how to pre-allocate; apparently I've been doing that wrong for a while. Restarting R did cause the step function to restart at the lowest step, but the problem persists even with the pre-allocation. – WetlabStudent May 28 '14 at 01:12
  • @flodel Whoa... now this is strange. Look at the two plots, one generated using `x[] =` and one using `x =`. – WetlabStudent May 28 '14 at 01:24
  • Maybe you should update your first plot after restarting R, so we really know the pre-allocation is the only variable. – flodel May 28 '14 at 01:28
  • On 64-bit Windows R-2.15.0 I saw a slight but consistent slowdown over time; with R-3.1.0 (just a download away... might as well work with the current version!) speed was constant and up to 10% faster. On Linux there was a considerable performance consequence when the OS migrated two replicates onto separate cores of the same processor. The time also seemed to slow down as the processor on my laptop got hotter. What does the Windows task manager have to say about other processes? – Martin Morgan May 28 '14 at 01:35
  • ...and writing your test case as a function and using `compiler::cmpfun(f)` increased speed by about 15%. – Martin Morgan May 28 '14 at 01:40
  • @flodel just did that, and now it looks linear – WetlabStudent May 28 '14 at 01:44
  • @MartinMorgan The Windows resource manager gives about 12-14% CPU total (all other processes two orders of magnitude less). My machine has 8 cores and it appears R might be auto-parallelizing, using 3-4 of the CPUs at around 50% usage on average. Interesting that using the older R gave you a slowdown; I am using an older version. Let me update and see if I still have the same problem. – WetlabStudent May 28 '14 at 01:47
  • OK... just throwing out more ideas of things I would try if I were you. What is `memory.size() / memory.limit()` at the beginning and end of your simulation (constant?). How dependent is your simulation on the length of `x` (e.g. try 5000)? – flodel May 28 '14 at 01:58
  • Just to add another data point, I actually found a slight increase in calculation speed over time, going from about 5 sec per iteration near the beginning to 4.8 sec near the end. This is on a 2011 Macbook Pro and R 3.1.0 64-bit. – eipi10 May 28 '14 at 02:12

1 Answer


As you state in your updated question, the high-level answer is that you are using an old version of R with a bug: with the newest version of R (3.1.0), the problem no longer persists.
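
For example, to confirm which build a given session is actually running (and that an upgrade has taken effect):

R.version.string   # e.g. "R version 3.1.0 ..."
sessionInfo()      # R version, platform, and loaded packages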

– JustinJDavies