I've noticed some oddities in the memory usage of my program when run under PyPy versus CPython. Under PyPy the program not only starts with substantially more memory than under CPython, but its memory usage also grows dramatically over time. By the end of the run it is using around 170MB under PyPy, compared to 14MB under CPython.
I found a user with the exact same problem, albeit on a smaller scale (see "pypy memory usage grows forever?"), but the solutions that worked for him helped only slightly with my program. The two things I tried were setting the environment variables PYPY_GC_MAX=100MB and PYPY_GC_GROWTH=1.1, and manually calling gc.collect() at each generation.
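For reference, the two attempts look roughly like the sketch below. The names run_generations and step are hypothetical stand-ins for the real evolution loop, and the script name in the comment is a placeholder.

```python
import gc

# The GC environment variables are set when launching the interpreter, e.g.:
#   PYPY_GC_MAX=100MB PYPY_GC_GROWTH=1.1 pypy my_program.py
# (my_program.py is a placeholder name.)

def run_generations(generations, step):
    """Run `step(gen)` once per generation, forcing a full collection after each.

    `step` is a hypothetical callable standing in for one generation of evolution.
    """
    for gen in range(generations):
        step(gen)
        # Ask the garbage collector for a full collection before the next generation.
        gc.collect()
```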
I'm determining the memory usage with
resource.getrusage(resource.RUSAGE_SELF).ru_maxrss/1000
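A minimal sketch of that measurement wrapped in a helper, in case the units matter. The function name peak_rss_mb is made up for illustration. Note that ru_maxrss is the peak resident set size, so it never decreases during a run, and that it is reported in kilobytes on Linux but in bytes on macOS.

```python
import resource
import sys

def peak_rss_mb():
    """Return the peak resident set size of this process in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        # macOS reports ru_maxrss in bytes.
        return peak / (1024.0 * 1024.0)
    # Linux reports ru_maxrss in kilobytes.
    return peak / 1024.0
```

Because this is a high-water mark, a value printed at the end of the run reflects the largest footprint reached at any point, not necessarily the footprint at that moment.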
Here's the runtime and memory usage under different conditions:
Version: time taken, memory used at end of run
PyPy 2.5.0: 100s, 173MB
PyPy with PYPY_GC_MAX = 100MB and PYPY_GC_GROWTH = 1.1: 102s, 178MB
PyPy with gc.collect(): 108s, 131MB
Python 2.7.3: 167s, 14MB
As you can see, the program is much quicker under PyPy than under CPython, which is why I moved to it in the first place, but at the cost of a more than 10-fold increase in memory.
The program is an implementation of Genetic Programming, where I'm evolving an arithmetic binary tree over 100 generations, with 200 individuals in the population. Each node in the tree has a reference to its two children, and these trees can grow in size, although for this experiment they stay relatively stable. Depending on the application, the program can run for anywhere from 10 minutes to a few hours, but for the results here I've used a smaller dataset to highlight the issue.
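A minimal sketch of the kind of tree described above. The class and method names are hypothetical and this is not the actual program: internal nodes hold a binary operator plus references to two children, and leaves hold a constant.

```python
import operator

class Node(object):
    """One node of an arithmetic binary tree (illustrative only)."""

    def __init__(self, value, left=None, right=None):
        self.value = value   # a binary function for internal nodes, a number for leaves
        self.left = left     # reference to the first child, or None for a leaf
        self.right = right   # reference to the second child, or None for a leaf

    def is_leaf(self):
        return self.left is None and self.right is None

    def evaluate(self):
        """Recursively evaluate the expression rooted at this node."""
        if self.is_leaf():
            return self.value
        return self.value(self.left.evaluate(), self.right.evaluate())

    def size(self):
        """Count the nodes in the subtree rooted at this node."""
        if self.is_leaf():
            return 1
        return 1 + self.left.size() + self.right.size()

# Example: the expression 2 + (3 * 4)
tree = Node(operator.add, Node(2), Node(operator.mul, Node(3), Node(4)))
```

With 200 such trees per generation, the number of live nodes is roughly the population size times the average tree size, which gives a baseline for how much memory the trees themselves should need.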
Does anyone have any idea a) what could be causing this, and b) if it's possible to limit the memory usage to somewhat more respectable levels?