I know this is a recurrent question, but I cannot deallocate memory efficiently in my case using any of the suggested solutions. So here is my code:

import gc

from ete2 import Tree

for i in i_iminus1_pool_dict.keys():
    Assignment_Tree = Tree()
    Root = Assignment_Tree.get_tree_root()
    Root.add_feature("name", i)
    populate_tree() # this function extends the branches of the Tree and adds leaves
    for leaf in Assignment_Tree.iter_leaves():
        chain = []
        score = leaf.dist
        chain.append(leaf.name)
        for ancestor in leaf.get_ancestors():
            chain.append(ancestor.name)
        del chain
        del ancestor
        del leaf
    del Assignment_Tree
    gc.collect()

The Tree() object comes from the ete2 package and, once populated with branches and leaves, consumes a lot of memory. As you can see, I must create a new Tree() many times; however, deletion and garbage collection do not seem to release the memory. Can anyone suggest what else I could do to effectively delete the Tree object at the end of each iteration of the for loop?

tevang
  • *"does not seem to release memory"* - according to what? Have you used a Python memory profiler? Which version of Python are you using (see e.g. http://stackoverflow.com/q/3916553/3001761)? – jonrsharpe Apr 30 '15 at 10:08
  • Python will almost never release memory that it's allocated. But that rarely matters; as long as you're not retaining garbage, it'll reuse that memory for the next large object. (And this is a good thing—if it kept `free`ing and `malloc`ing over and over, that would do nothing but slow everything down.) – abarnert Apr 30 '15 at 10:13
  • Anyway, `del Assignment_Tree` doesn't actually delete anything, it just removes that variable's reference to the tree. If there are any other references to the tree—e.g., one of those objects you're storing has a reference back to its tree node, and the node has a reference back to the whole tree—then the tree is still alive. – abarnert Apr 30 '15 at 10:16
  • Actually, you don't `del Root`, and I'll bet `Root` has a reference either to `Assignment_Tree` or to most of the data within it. So, after the `del Assignment_Tree`, the tree is still alive. So `gc.collect()` does nothing. But it still doesn't matter—next time through the loop, you reassign `Assignment_Tree` and `Root` right off the bat, so at worst you have (part of) two trees alive at any given time, not many. – abarnert Apr 30 '15 at 10:19
  • I use python 2.7 and monitor memory consumption through "htop" linux command. – tevang Apr 30 '15 at 10:20

1 Answer

First, I'm not convinced you actually have a problem. But let's assume you do.

Can anyone suggest what else I could do to effectively delete the Tree object at the end of each iteration of the for loop?

You could try to figure out what is keeping a reference to it alive, and del that too. I notice that you missed Root; I'll bet it holds a reference to either the Tree object or most of its data.
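To make the point about lingering references concrete, here is a minimal sketch. The Node class is a hypothetical stand-in with parent and child links, not the ete2 API, and the variable names are made up for illustration. A weak reference is used only to observe whether the object is still alive.

```python
import gc
import weakref


class Node(object):
    """Hypothetical stand-in for a tree node; this is not the ete2 API."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


tree = Node("root")
leaf = Node("leaf", parent=tree)  # leaf.parent -> tree, tree.children -> leaf
probe = weakref.ref(tree)         # observes the object without keeping it alive

root = tree   # a second name for the same object, like Root in the question
del tree      # removes one name only; the object itself is untouched
gc.collect()
alive_after_first_del = probe() is not None  # root still refers to it

del root, leaf  # drop the remaining names
gc.collect()    # parent/child links form a cycle, so the cycle collector frees it
alive_after_all_dels = probe() is not None
```

If you need to hunt for whatever is still holding on to an object, gc.get_referrers(obj) returns the objects that refer to it, which is usually enough to spot the culprit.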

But the simple way to do it is to use scopes. Just refactor the loop body into a function, and all those variables created inside the loop become local variables inside the function, and they all go away when the function returns:

def do_tree_stuff(i):
    Assignment_Tree = Tree()
    Root = Assignment_Tree.get_tree_root()
    # ...
    Root.add_feature("name", i)
    populate_tree() # this function extends the branches of the Tree and adds leaves
    for leaf in Assignment_Tree.iter_leaves():
        chain = []
        score = leaf.dist
        chain.append(leaf.name)
        for ancestor in leaf.get_ancestors():
            chain.append(ancestor.name)

for i in i_iminus1_pool_dict.keys():
    do_tree_stuff(i)

As long as the function doesn't mutate any globals or closure cells, it can't possibly leave anything behind in its caller's locals. So you don't need to try to figure out what locals might have gotten modified and del them; you know none of them got modified, and you don't have to do anything.
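Here is a small sketch of that claim, again with a hypothetical Node class standing in for the real tree (not the ete2 API). The function returns only a weak reference, so the caller can check afterwards whether each tree survived without keeping it alive.

```python
import gc
import weakref


class Node(object):
    """Hypothetical stand-in for a tree node; this is not the ete2 API."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def do_tree_stuff(i):
    # Everything created here is local to this call.
    root = Node(i)
    for k in range(3):
        Node("%s-%d" % (i, k), parent=root)
    # A weak reference lets the caller observe root without keeping it alive.
    return weakref.ref(root)


probes = [do_tree_stuff(i) for i in ["a", "b", "c"]]
gc.collect()  # the parent/child cycles need the cycle collector
survivors = [p for p in probes if p() is not None]
print(len(survivors))
```

No del statements are needed anywhere: once each call returns, nothing outside the function refers to its tree.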

And if you want to refactor the inner loop into another function, go for it.


If you're retaining data that you shouldn't be—i.e., something in that loop is mutating something that lives outside the loop that has a reference to a leaf that has a reference to the root that has a reference to the whole tree—then that actually is a problem, and you need to fix it. But I can't see anything in your posted code that could be doing that.


But meanwhile, this still won't actually release memory to the OS. Once Python's allocated memory, it generally keeps it. But it will reuse it. If the first tree is garbage when you create the second tree, it'll put the second tree in the same memory as the first one. This is generally a much better thing to do than calling malloc and free all over the place—but, even in the rare cases when it isn't, you can't stop Python from doing it.
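Since htop only shows what the process holds from the OS, it can't tell you whether the old tree became garbage. One way to check is to track Python-level allocations instead. This is a rough sketch assuming Python 3.4 or later, where tracemalloc is in the standard library (it is not available in stock 2.7); build_tree is a hypothetical placeholder for the real tree construction.

```python
import gc
import tracemalloc


def build_tree(n):
    """Hypothetical placeholder: builds a throwaway structure with parent links."""
    root = {"parent": None, "children": []}
    for _ in range(n):
        root["children"].append({"parent": root, "children": []})
    return len(root["children"])


tracemalloc.start()
baseline, _ = tracemalloc.get_traced_memory()

readings = []
for _ in range(5):
    build_tree(20000)
    gc.collect()
    current, _peak = tracemalloc.get_traced_memory()
    # Bytes still allocated by Python code after this iteration, relative to the start.
    readings.append(current - baseline)

_current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(readings, peak - baseline)
```

If the readings stay flat from one iteration to the next while the peak is large, each tree is being freed and its memory is available for the next one, whatever htop reports.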

If you really do need to allocate and free memory repeatedly, you can always take that function you refactored and spin it off into a child process, using multiprocessing. When a process goes away, all of its memory goes away. But most likely, that will just add overhead for no benefit.
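A rough sketch of what that could look like, assuming the loop body has already been refactored into a module-level function. Here do_tree_stuff is a hypothetical placeholder that just builds a large list, and run_each_in_fresh_process is a made-up helper name. Pool's maxtasksperchild=1 retires the worker after every task, so each item is handled by a new process.

```python
import multiprocessing
import os


def do_tree_stuff(i):
    """Hypothetical placeholder for the refactored loop body."""
    big = [[i] * 100 for _ in range(10000)]  # stands in for a large tree
    # Return only small results; the big structure dies with the worker.
    return i, len(big), os.getpid()


def run_each_in_fresh_process(items):
    # maxtasksperchild=1 plus chunksize=1 means one item per worker process;
    # the worker exits afterwards and its memory goes back to the OS.
    pool = multiprocessing.Pool(processes=1, maxtasksperchild=1)
    try:
        return pool.map(do_tree_stuff, items, chunksize=1)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    for name, size, pid in run_each_in_fresh_process(["a", "b", "c"]):
        print(name, size, pid)
```

The results have to be pickled to cross the process boundary, so this only pays off when the child returns something much smaller than the tree it built.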

abarnert