9

My python program has a curious performance behavior: The longer it runs, the slower it gets. Early on, it cranks out tens of work units per minute. After an hour of so it is taking tens of minutes per work unit. My suspicion is that this is cause by a congested garbage collector.

The catch is that my script is too memory hungry for cProfile to work on large runs. (see: cProfile taking a lot of memory)

We have written our own performance plugin and we can observe most parts of our system and none of them seem to be the problem. The one rock that is still unturned is the GC.

Is there some other way (besides profile or cProfile) to see how much time is going to the GC?

Community
  • 1
  • 1
Matthew Scouten
  • 15,303
  • 9
  • 33
  • 50

1 Answers1

9

In Python, most garbage is collected using reference counting. One would expect this to be quick and painless, and it seems unlikely that this is what you're after. I assume you're asking about the collector referred to by the gc module, which is only used for circular references.

There are a few things that might be of use: http://docs.python.org/library/gc.html

Although there doesn't appear to be a direct method to time the garbage collector, you can turn it on and off, enable debugging, look at collection count etc. All of this might be helpful in your quest.

For example, on my system gc prints out the elapsed time if you turn on the debug flags:

In [1]: import gc

In [2]: gc.set_debug(gc.DEBUG_STATS)

In [3]: gc.collect()
gc: collecting generation 2...
gc: objects in each generation: 159 2655 7538
gc: done, 10 unreachable, 0 uncollectable, 0.0020s elapsed.

All of this aside, the first thing I would look at is the evolution of the memory usage of your program as it runs. One possibility is that it is simply reaching the limit of available physical RAM and is slowing down due to excessive page faults, rather than due to anything to do with the garbage collector.

NPE
  • 486,780
  • 108
  • 951
  • 1,012
  • Yes, I am aware of that. Those can tell me how much stuff the GC is dealing with, but my immediate problems are with processor cycles, not memory. If it could tell me the *time* that the GC has used or even *how often* it has run, that might be helpful. – Matthew Scouten May 24 '11 at 18:48
  • @Matthew: would it help for your process to call `gc.collect()` itself, and time how long it takes to return? – Thomas K May 24 '11 at 18:50
  • @Thomas K, that would tell me how long a single collection was taking, but not how often the collector is running. Given other data I have, I suspect that the GC is running more and more often (to less and less effect) as the system runs. – Matthew Scouten May 24 '11 at 18:54
  • @Matthew Scouten: Please see my answer. If you turn on `gc.DEBUG_STATS`, that'll print out a message every time the collector is run, along with the elapsed time for each invocation. – NPE May 24 '11 at 18:58
  • @Matthew Scouten: Alternatively, switch the thing off and see if anything changes (again, as alluded to by my answer). – NPE May 24 '11 at 18:59
  • @aix interesting. Is there a way to redirect this info to a log object or file? or access it programmaticly? It would be enormously more useful if it could be isolated in a separate file, perhaps with a time stamp some additional context. – Matthew Scouten May 24 '11 at 19:13