18

I have a problem with my Python application, and I think it's related to the Python garbage collection, though I'm not sure...

The problem is that my application takes a long time to exit and to switch from one function to the next.

In my application I handle very large dictionaries, containing thousands of large objects which are instantiated from wrapped C++ classes.

I put some timestamp outputs in my program, and I saw that at the end of each function, when the objects created inside it should go out of scope, the interpreter spends a lot of time before calling the next function. I observe the same problem at the end of the application, when the program should exit: a lot of time (~ hours!) passes between the last timestamp on screen and the appearance of a fresh prompt.

The memory usage is stable, so I don't really have memory leaks.

Any suggestions?

Can the garbage collection of thousands of large C++ objects really be that slow?

Is there a way to speed it up?

UPDATE:

Thanks a lot for all your answers, you gave me a lot of hints to debug my code :-)

I use Python 2.6.5 on Scientific Linux 5, a customized distribution based on Red Hat Enterprise 5. And actually I'm not using SWIG for the Python bindings to our C++ code, but the Reflex/PyROOT framework. I know, it's not very well known outside particle physics (though it is open source and freely available), and I have to use it because it's the default for our main framework.

And in this context `del` from the Python side does not work; I had already tried it. `del` only deletes the Python variable bound to the C++ object, not the object itself in memory, which is still owned by the C++ side...

...I know, it's non-standard, I guess, and a bit complicated, sorry :-P
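
To illustrate what I mean (a minimal, generic sketch with a plain Python class, not one of our Reflex-wrapped ones): `del` only removes a Python name, and the object itself is destroyed only when its last Python reference goes away -- and for our wrapped objects not even then, since the C++ side still owns them.

    import sys

    class Dummy(object):
        def __del__(self):
            print "object destroyed"

    obj = Dummy()
    alias = obj                        # a second reference to the same object
    del obj                            # only removes the name 'obj'; 'alias' keeps the object alive
    print sys.getrefcount(alias) - 1   # 1 reference left (minus the temporary one from the call)
    del alias                          # last reference gone -> __del__ runs (for pure-Python objects)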

But following your hints, I'll profile my code and I'll come back to you with more details, as you suggested.

ADDITIONAL UPDATE:

OK, following your suggestions, I instrumented my code with cProfile, and I discovered that gc.collect() is actually the function taking most of the running time!!
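
For reference, this is roughly how I collected and inspected the profile (a minimal sketch; mainProgram.profile is the dump file and main() is the program's entry point):

    import cProfile
    import pstats

    # run the program under the profiler and dump the raw statistics to a file
    cProfile.run("main()", "mainProgram.profile")

    # load the dump and print the 20 most expensive entries
    p = pstats.Stats("mainProgram.profile")
    p.sort_stats("time").print_stats(20)
    p.sort_stats("cumulative").print_stats(20)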

Here is the output from cProfile + pstats print_stats():


>>> p.sort_stats("time").print_stats(20)
Wed Oct 20 17:46:02 2010    mainProgram.profile

         547303 function calls (542629 primitive calls) in 548.060 CPU seconds

   Ordered by: internal time
   List reduced from 727 to 20 due to restriction 

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        4  345.701   86.425  345.704   86.426 {gc.collect}
        1  167.115  167.115  200.946  200.946 PlotD3PD_v3.2.py:2041(PlotSamplesBranches)
       28   12.817    0.458   13.345    0.477 PlotROOTUtils.py:205(SaveItems)
     9900   10.425    0.001   10.426    0.001 PlotD3PD_v3.2.py:1973(HistoStyle)
     6622    5.188    0.001    5.278    0.001 PlotROOTUtils.py:403(__init__)
       57    0.625    0.011    0.625    0.011 {built-in method load}
      103    0.625    0.006    0.792    0.008 dbutils.py:41(DeadlockWrap)
       14    0.475    0.034    0.475    0.034 {method 'dump' of 'cPickle.Pickler' objects}
     6622    0.453    0.000    5.908    0.001 PlotROOTUtils.py:421(CreateCanvas)
    26455    0.434    0.000    0.508    0.000 /opt/root/lib/ROOT.py:215(__getattr__)
[...]

>>> p.sort_stats("cumulative").print_stats(20)
Wed Oct 20 17:46:02 2010    mainProgram.profile

         547303 function calls (542629 primitive calls) in 548.060 CPU seconds

   Ordered by: cumulative time
   List reduced from 727 to 20 due to restriction 

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001  548.068  548.068 PlotD3PD_v3.2.py:2492(main)
        4    0.000    0.000  346.756   86.689 /usr/lib//lib/python2.5/site-packages/guppy/heapy/Use.py:171(heap)
        4    0.005    0.001  346.752   86.688 /usr/lib//lib/python2.5/site-packages/guppy/heapy/View.py:344(heap)
        1    0.002    0.002  346.147  346.147 PlotD3PD_v3.2.py:2537(LogAndFinalize)
        4  345.701   86.425  345.704   86.426 {gc.collect}
        1  167.115  167.115  200.946  200.946 PlotD3PD_v3.2.py:2041(PlotBranches)
       28   12.817    0.458   13.345    0.477 PlotROOTUtils.py:205(SaveItems)
     9900   10.425    0.001   10.426    0.001 PlotD3PD_v3.2.py:1973(HistoStyle)
    13202    0.336    0.000    6.818    0.001 PlotROOTUtils.py:431(PlottingCanvases)
     6622    0.453    0.000    5.908    0.001 /root/svn_co/rbianchi/SoftwareDevelopment

[...]

>>>

So, in both outputs, sorted by "time" and by "cumulative" time respectively, gc.collect() is the function consuming most of my program's running time! :-P

And this is the output of the memory profiler Heapy, taken just before main() returns.
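
For completeness, the dump was taken roughly like this (a minimal sketch, assuming the guppy package is installed):

    from guppy import hpy

    h = hpy()
    # ...called at the very end of main(), just before returning:
    print h.heap()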

memory usage before return:
Partition of a set of 65901 objects. Total size = 4765572 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  25437  39  1452444  30   1452444  30 str
     1   6622  10   900592  19   2353036  49 dict of PlotROOTUtils.Canvas
     2    109   0   567016  12   2920052  61 dict of module
     3   7312  11   280644   6   3200696  67 tuple
     4   6622  10   238392   5   3439088  72 0xa4ab74c
     5   6622  10   185416   4   3624504  76 PlotROOTUtils.Canvas
     6   2024   3   137632   3   3762136  79 types.CodeType
     7    263   0   129080   3   3891216  82 dict (no owner)
     8    254   0   119024   2   4010240  84 dict of type
     9    254   0   109728   2   4119968  86 type
    10   1917   3   107352   2   4264012  88 function
    11   3647   5   102116   2   4366128  90 ROOT.MethodProxy
    12    148   0    80800   2   4446928  92 dict of class
    13   1109   2    39924   1   4486852  93 __builtin__.wrapper_descriptor
    14    239   0    23136   0   4509988  93 list
    15     87   0    22968   0   4532956  94 dict of guppy.etc.Glue.Interface
    16    644   1    20608   0   4553564  94 types.BuiltinFunctionType
    17    495   1    19800   0   4573364  94 __builtin__.weakref
    18     23   0    11960   0   4585324  95 dict of guppy.etc.Glue.Share
    19    367   1    11744   0   4597068  95 __builtin__.method_descriptor

Any idea why, or how to optimize the garbage collection?

Is there any more detailed check I can do?

  • "Any suggestions?". Use the profiler to get more facts on where the time is being spent. Post the results as an update to your question. – S.Lott Oct 12 '10 at 18:07
  • @nos: Actually, Python uses refcounting, so an unreferenced object *will* be collected. Python's GC is rather simple compared to the clever beasts in the good JVMs and in .NET. –  Oct 12 '10 at 21:59
  • @delnan to be precise, CPython's implementation has that behavior. I seem to recall some experimental versions with much fancier algorithms. – Paul McMillan Oct 12 '10 at 22:04
  • @Paul: True, IronPython and Jython use the respective VM's GC (and therefore no refcounting). But usually, this can be ignored ;) –  Oct 12 '10 at 22:07
  • @delnan Yep... being able to ignore GC makes my life better. This is actually part of what I really enjoy about Python - having so many implementations means that you can really choose the best one for your particular task. – Paul McMillan Oct 12 '10 at 22:09
  • Which OS and which version? So your memory usage is stable -- stably larger than the real memory available to your process? How much physical real memory do you have? How much is free before the process starts? Just before it exits? Have you tried running it with 1/4, 1/2, 3/4 of the normal number of objects, and what do you observe? Does it gradually become sluggish, or does it suddenly "hit the wall"? Consider telling us how many objects/minutes/hours instead of "a lot" and "thousands" etc. – John Machin Oct 13 '10 at 00:34
  • And if you have big memory leaks up front, and none afterwards, your memory usage would be stable, wouldn't it? Create a list of N of your objects. Check how much memory is used. Ditto with [None] * N. Does difference / N meet expectations? – John Machin Oct 13 '10 at 00:42
  • Thanks to all of you for the useful comments! I did as you suggested and profiled my program, and it actually seems that the gc.collect function takes most of the time... @John: I also posted the output of Heapy, but I still have to test against 1/4, 1/2, 3/4 of the usual number of my objects. I'll post those tests asap. Thanks again – rmbianchi Oct 21 '10 at 09:20
  • One last thing to check, that I don't think you've mentioned: Are you disk thrashing? You mention that your objects are large, make sure you're not hitting swap, as having to do disk I/O will always slow things down. – Thanatos Dec 11 '11 at 23:29

3 Answers

13

This is a known garbage collector issue in Python 2.6 that causes quadratic time for garbage collection when many objects are allocated without any of them being deallocated, e.g. while populating a large list.
There are two simple solutions:

  1. Either disable garbage collection before populating large lists and re-enable it afterwards:

    import gc

    l = []
    gc.disable()
    for x in xrange(10**6):
        l.append(x)
    gc.enable()
    
  2. Or upgrade to Python 2.7, where the issue has been fixed.

I prefer the second solution, but it's not always an option ;)
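
If you have to stay with the first workaround, wrapping it in try/finally makes sure collection is re-enabled even if the population code raises an exception (a minimal sketch; the loop body is just a placeholder for building your real objects):

    import gc

    def build_large_list(n):
        items = []
        gc.disable()
        try:
            for x in xrange(n):
                items.append(x)   # placeholder for creating the real (wrapped) objects
        finally:
            gc.enable()           # re-enabled even if an exception was raised
        return items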

Leszek
5

Yes, it could be garbage collection, but it could also be some synchronisation with the C++ code, or something completely different (hard to say without code).

Anyway, you should have a look at the SIG for development of Python/C++ integration to find known issues and ways to speed things up.

kriss
  • Hi kriss, apparently it is a garbage collector problem. I updated my question with the output of cProfile, and it seems that gc.collect takes most of the running time... I'll take a look at the Python/C++ SIG as you suggested, to see if it is a known problem in some contexts. – rmbianchi Oct 21 '10 at 09:10
  • Raw bet: the gc is probably in a case where refcounting doesn't work well (it has to use a more complex strategy to reclaim memory). Any loops in your data structures (cyclic references)? If that's the source of the problem, cutting them "by hand" could be a solution; it would make the gc happier (hence faster). – kriss Oct 21 '10 at 09:56
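
Following up on the cyclic-references point: a quick way to check whether cycles are actually involved (a rough sketch using only the standard gc module; gc.collect() returns the number of unreachable objects it found):

    import gc

    gc.set_debug(gc.DEBUG_STATS)   # print statistics every time a collection runs
    n = gc.collect()
    print "unreachable objects found:", n
    print "uncollectable objects:", len(gc.garbage)   # cycles involving __del__ end up here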
-5

If your problem really is the garbage collection, try explicitly freeing your objects when you're done with them using `del`.

In general, this doesn't sound like a garbage collection problem, unless we're talking about terabytes of memory.

I agree with S.Lott... profile your app, then bring code snippets and the results of that back and we can be much more helpful.

Paul McMillan
  • `del` doesn't free anything. It just removes a variable from the current scope, i.e. removes one reference. But Python is refcounted (a more sophisticated GC does exist and is run, but only on cyclic references) - it doesn't really matter if a bunch of objects get gc'd at the end of the function or in little chunks when you think you're done. –  Oct 12 '10 at 21:56
  • In general, yes. In the pathological case, freeing little chunks might help. Habitually using `del` all over the place tends to be a sign you're not programming in Python. – Paul McMillan Oct 12 '10 at 22:07
  • Thanks, Paul and Delnan. Actually I also tried to use `del`, but in this context it does not work. As I said in the update to my question, I'm using an open source framework called ROOT (http://root.cern.ch), with its own Python bindings system (called Reflex), and even if `del` deletes all the Python references to a C++ object, the object itself stays in memory... But even though I have now found a way to explicitly delete them as well, the gc.collect function still seems to take most of the running time... Any suggestions for further checks? Thanks a lot again for your help – rmbianchi Oct 21 '10 at 09:17