
How can I clear objects (and the memory they occupy) created via rpy?

import rpy2.robjects as r
a = r.r('a = matrix(NA, 2000000, 50)')
del a  # if I do this, the amount of memory used does not change
r.r('rm(list=ls(all=TRUE))')  # same here: the objects disappear, but the memory is still in use

As a result, memory usage in my application keeps growing until it runs out, and then the application crashes. From the rpy2 docs:

The object itself remains available, and protected from R’s garbage collection until foo is deleted from Python
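The quoted rule can be modeled in plain Python. The sketch below is only an illustration under stated assumptions: `FakeRObject` and the `preserved` set are hypothetical stand-ins (rpy2 does this bookkeeping inside its C extension, not like this), and it relies on CPython freeing an object as soon as its last reference goes away. It shows that `del` removes a single name, so the protected object survives while any other reference remains. Note also that in the question's code the matrix is bound to `a` inside R itself (by the `a = ...` in the R string), which is a separate reference that `del a` in Python does not touch.

```python
import weakref

# Hypothetical stand-in for R's protection list (assumption, not rpy2's API).
preserved = set()

class FakeRObject:
    """Hypothetical Python proxy for an object living in R's memory."""

    def __init__(self, name):
        self.name = name
        preserved.add(name)  # "protect" the R-side object while the proxy lives
        # When the proxy is reclaimed, "unprotect" the R-side object.
        # The callback must not reference self, or the proxy would never die.
        weakref.finalize(self, preserved.discard, name)

a = FakeRObject('a')
b = a                    # a second Python reference to the same proxy
del a                    # removes one name only; the proxy is still alive via b
print('a' in preserved)  # True
del b                    # last reference gone: CPython reclaims the proxy now
print('a' in preserved)  # False
```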

but even doing:

import rpy2.robjects as r
a = r.r('a = matrix(NA, 2000000, 50)')
r.r.rm('a')
del a
r.r.gc()

does not free the memory used...

EDIT: rpy2 2.0, Win XP, R 2.12.0

Benjamin

1 Answer


There is a paragraph in the rpy2 docs hinting that you may need to run the Python garbage collector explicitly, and often, when deleting or overwriting large objects:

R objects live in the R memory space, their size unbeknown to Python, and because of that it seems that Python does not always garbage collect often enough when large objects are involved. This is sometimes leading to transient increased memory usage when large objects are overwritten in loops, and although reaching a system’s memory limit appears to trigger garbage collection, one may wish to explicitly trigger the collection.
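One way to see why collection can lag: CPython frees most objects as soon as their reference count drops to zero, but objects caught in a reference cycle wait for the cyclic collector, and that collector is triggered by counts of allocations, not by bytes. A tiny Python wrapper around a huge R matrix therefore adds no extra pressure to run it. The sketch below uses a hypothetical `Proxy` class to show a cycle surviving `del` until `gc.collect()` runs; it is a model of the mechanism only and makes no claim about where rpy2's own objects sit in cycles.

```python
import gc
import weakref

class Proxy:
    """Hypothetical small Python wrapper standing in for a large R object."""

gc.disable()  # keep the automatic collector out of the way during the demo
try:
    p = Proxy()
    p.self_ref = p              # a reference cycle: refcounting alone can't free it
    alive = weakref.ref(p)      # lets us observe whether p has been reclaimed
    del p
    print(alive() is not None)  # True: the cycle keeps the proxy alive
    gc.collect()                # an explicit full collection finds the cycle
    print(alive() is None)      # True: the proxy has now been reclaimed
finally:
    gc.enable()

# The automatic thresholds are allocation counts, not memory sizes.
print(gc.get_threshold())
```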

I was able to force rpy2 to free that large matrix by calling gc.collect() immediately after creating the matrix, and again right after deleting it and running R's own gc() function. I ran this in a loop with a sleep between passes; use top to watch the memory usage rise and fall.

I ran this under Python 2.6 on Ubuntu 10.04 with python-rpy version 2.0.8, linked against R version 2.10.1. I hope this helps you make some progress:

import gc
import time

import rpy2.robjects as R

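for i in range(5):
    print('pass %d' % i)
    R.r('a = matrix(NA, 1000000, 50)')  # allocate a large matrix in R
    gc.collect()                        # collect on the Python side
    R.r('rm(a)')                        # drop the R binding
    R.r('gc()')                         # run R's garbage collector
    gc.collect()                        # collect on the Python side again

    print('sleeping..')
    time.sleep(5)                       # pause so the change shows up in top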
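The four cleanup calls in the loop can be bundled into a small helper. This is a hypothetical sketch, not part of rpy2: `free_r_object` is a made-up name, and it takes the R evaluator as a parameter `r` (any callable that evaluates an R expression string, such as `rpy2.robjects.r`) so the call sequence can be checked without R installed. It mirrors the order used in the loop above; whether that order is optimal is an assumption carried over from the answer.

```python
import gc

def free_r_object(r, name):
    """Hypothetical helper: remove an R binding and prod both collectors.

    `r` is assumed to be a callable that evaluates an R expression string,
    e.g. the `rpy2.robjects.r` instance used in the loop above.
    """
    gc.collect()             # collect pending Python-side garbage first
    r('rm("%s")' % name)     # remove the binding from R's global environment
    r('gc()')                # run R's own garbage collector
    gc.collect()             # collect again on the Python side
```

With rpy2 installed, `free_r_object(R.r, 'a')` would replace the four cleanup lines inside the loop.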
samplebias
  • That helps, thanks. If I run it line by line in IDLE, however, I need to run gc.collect() more than once for it to work. Any idea why? – Benjamin Mar 07 '11 at 14:46
  • I believe it has to do with the fact that gc.collect() will free up memory for use within the Python process for other allocations (return blocks to the internal pool), not necessarily release it back to the operating system immediately. So the multiple calls to collect may be prodding it to do so sooner. – samplebias Mar 07 '11 at 15:12
  • Is it possible to do this if I do not give an R name to the object, as in `a = R.r('matrix(NA, 1000000, 50)')`? – highBandWidth Jun 15 '11 at 23:57
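On needing more than one gc.collect() call: the thread does not settle why a second call helps in IDLE. One known factor is that finalizers run during a collection can leave new garbage behind that only a later pass finds. A common pattern is simply to keep collecting until a pass reports nothing unreachable. The sketch below assumes only the standard `gc` module; `collect_until_stable` is a hypothetical name, and the pass cap is an arbitrary safety limit.

```python
import gc

def collect_until_stable(max_passes=10):
    """Call gc.collect() until a pass finds no unreachable objects.

    Returns the number of passes made; capped so it cannot loop forever.
    """
    for n in range(1, max_passes + 1):
        if gc.collect() == 0:  # gc.collect() returns the unreachable-object count
            return n
    return max_passes
```

Calling `collect_until_stable()` in place of a single `gc.collect()` in the loop above is one way to apply it.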