
I have a small part in my code that's similar to this one (of course with real matrices instead of the zero-filled ones):

x = [rinterface.FloatSexpVector([0]*(1000**2)) for i in xrange(20)]
y = robjects.r('list')(x)

and it looks like it's causing memory leaks.

When running the following code:

for i in xrange(10):
    x = [rinterface.FloatSexpVector([0]*(1000**2)) for i in xrange(20)]
    y = robjects.r('list')(x)
    del x
    del y
    robjects.r('gc(verbose=TRUE)')

I get:

Error: cannot allocate vector of size 7.6 Mb
In addition: Warning messages:
1: Reached total allocation of 2047Mb: see help(memory.size)
2: Reached total allocation of 2047Mb: see help(memory.size)
3: Reached total allocation of 2047Mb: see help(memory.size)
4: Reached total allocation of 2047Mb: see help(memory.size)

Is this a bug, or is there something else I should do? I've also tried making the variables named by putting them into robjects.globalenv and then rm()-ing them before the gc(), but it doesn't seem to work.

I should mention that I'm running rpy2 2.3dev on Windows, but this also happens on Linux with rpy2 2.2.6 (though since the Linux machine runs 64-bit versions rather than the 32-bit ones on the Windows machine, the memory just grows and I don't get the 2047Mb error).

EDIT: It seems that adding gc.collect() before the R gc() resolves the issue with the first code example. However, this didn't solve my problem. Digging deeper into my code, I found that the line causing the problem is the one assigning a value into .names, similar to this:

x = [rinterface.FloatSexpVector([0]*(1000**2)) for i in xrange(20)]
y = robjects.r('list')(x)[0]
y.names = rinterface.StrSexpVector(['a']*len(y))

Setting the names to rinterface.NULL before cleaning doesn't help either. Any suggestions?

itai
  • I missed the part about Windows when answering. Two things: 1) Windows is not really supported (so let's stick to Linux for now), 2) parading with a Windows setup makes me wonder why we are not flooded with contributions from you for rpy2 on Windows. – lgautier Sep 05 '12 at 16:04
  • Well, basically the current Windows port is good enough. I was actually starting to work on it, but then someone else released a newer version which just works for me... maybe I will work on it again when I hit the next Windows-specific problem. – itai Sep 06 '12 at 07:57

2 Answers


It might be because Python is unaware of the amount of memory allocated by the embedded R, and therefore does not know that garbage should be collected.

There is a bit about memory usage in the rpy2 documentation, and an earlier question on SO.

Your edit suggests there might be something going on. The best course is to file a bug report on the rpy2 Bitbucket page and continue troubleshooting there rather than here.
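As an aside, one reason an explicit gc.collect() can help (as found in the question's edit) is that CPython's reference counting alone cannot free objects caught in reference cycles; the cycle collector has to run. A minimal stdlib sketch, independent of rpy2:

```python
import gc
import weakref


class Node(object):
    """A small object that can take part in a reference cycle."""
    pass


gc.disable()            # keep the automatic collector out of the way
a, b = Node(), Node()
a.partner = b           # a -> b
b.partner = a           # b -> a: a reference cycle

probe = weakref.ref(a)  # lets us observe whether `a` is really freed

del a
del b
assert probe() is not None   # refcounting alone cannot break the cycle

gc.collect()                 # the cycle collector can
assert probe() is None       # the memory is now actually released
gc.enable()
```

If the large R vectors are kept alive by Python wrappers stuck in such a cycle, R's gc() cannot reclaim them until Python's collector has released the wrappers first.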

lgautier
  • Thanks, that helped. But I investigated the problem further and found another problem... see my edit. – itai Sep 05 '12 at 12:22

I don't think it's a memory leak. Let me offer the following perspective.

Try this example in a Python shell:

l = range(32 * 1024 * 1024)  # Python 2: builds a list of 32M int objects

This asks the interpreter to allocate a list of 32M integers, roughly 128 MB of contiguous memory for the list's pointer array alone on a 32-bit build. It works, but takes ~7 seconds on my machine.

You can play with various values (it still works for 256 MB), but with N = 128 * 1024 * 1024 my machine simply hangs. If I were patient enough I would probably get my machine back after several minutes. The point is that the interpreter cannot easily allocate big chunks of contiguous memory.

It's worth noting that I can allocate 1 GB of memory in C++ in the same way in under 1 second on the same machine (an i7 with 8 GB RAM; I tried both Windows 8 and CentOS 6). The same holds in Java.

I didn't spend time investigating why the Python heap allocator behaves like this. I can only speculate that the rpy2 authors tried to discourage or limit very large allocations, putting a lower limit in place so that nothing bad happens; in practice you could use several smaller arrays holding reference objects (each of which occupies more than 4 bytes) instead.
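The per-element overhead is easy to measure with the stdlib. The numbers below assume a 64-bit CPython 3 build; on the 32-bit Python 2 setup above, pointers are 4 bytes and small ints are smaller, but the ratio is similar:

```python
import sys

N = 1024 * 1024  # one million elements, small enough to allocate instantly

ints = list(range(N))

# The list's own buffer is just N pointers...
pointer_bytes = sys.getsizeof(ints)
# ...and each small int is a full heap object on top of that.
per_int = sys.getsizeof(0)

print(pointer_bytes)  # ~8 MB of pointers on a 64-bit build
print(per_int)        # ~28 bytes per int object, not 4
```

So a "list of N ints" costs roughly N times (pointer plus object header) bytes, which is why large allocations in pure Python are so much heavier than a C++, Java, or R array of the same length.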

Paul Ianas