Python C Extension - Memory Leak despite Refcount = 1 on returned PyObjects

Question

I have a Python module that I wrote in C++ using the Python C API. My Python program repeatedly calls the module's pyParse function, which does a bunch of work and returns a PyTuple whose elements are themselves PyTuple objects. Every returned object ends up with a reference count (ob_refcnt) of 1, so it should be deallocated when it goes out of scope in Python. I call the module with something like the following Python code:

import os

import my_module  # this is my C++ module

path = 'C:/data/'
for filename in os.listdir(path):
    # os.path.join builds the full path regardless of a trailing slash
    data = my_module.pyParse(os.path.join(path, filename))

The longer this loop runs, the more memory usage grows. Each iteration produces about 2 KB of tuples, which should be destroyed at the end of that iteration. Yet when I take heap snapshots and compare an early one with one taken many iterations later, the memory allocated through PyTuple_New and for other Python objects keeps growing.

Because every returned object has a reference count of 1, I would expect it to be destroyed once it goes out of scope in Python. Eventually my program ends with a read access violation in a seemingly random part of the code. Is there something I am missing? Does anyone know how to debug this and get a better handle on what's going on? I'm desperate!
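For anyone trying to reproduce the measurement from the Python side, here is a minimal sketch using the standard tracemalloc module to compare two snapshots around a loop. The names fake_parse and traced_growth are hypothetical stand-ins (fake_parse only imitates the tuple-of-tuples shape that pyParse returns), and tracemalloc only sees memory obtained through Python's allocators, so anything the C++ code allocates with new or malloc will not show up here.

```python
import gc
import tracemalloc


def fake_parse(name):
    # Hypothetical stand-in for my_module.pyParse: builds a tuple of tuples.
    return tuple((name, i) for i in range(50))


def traced_growth(parse, names, keep=None):
    """Return (net traced bytes retained after the loop, top 5 growth sites)."""
    gc.collect()
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    data = None
    for name in names:
        data = parse(name)
        if keep is not None:
            keep.append(data)  # simulate a reference that is never released
    del data
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    diff = after.compare_to(before, "lineno")
    return sum(stat.size_diff for stat in diff), diff[:5]


names = ["file%d" % i for i in range(500)]
clean_bytes, _ = traced_growth(fake_parse, names)
leaked = []
leaky_bytes, top_sites = traced_growth(fake_parse, names, keep=leaked)
for stat in top_sites:
    print(stat)  # file and line of the allocations that grew the most
print("clean:", clean_bytes, "leaky:", leaky_bytes)
```

If the retained size keeps climbing with the number of iterations even though nothing on the Python side holds on to the results, that would point toward an extra reference being taken inside the extension.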

Possible to get a complete reproducer, including the C code? — solidpixel, Jun 03 '16 at 21:21
I'll try. It's going to be rough - my code is like 3000 lines, but I guess I should try anyways as part of the debugging process. — JoseOrtiz3, Jun 03 '16 at 21:24
One other thought - if you explicitly call gc.collect() does it help? I just wonder if the GC isn't getting called if you are in a tight loop. — solidpixel, Jun 03 '16 at 21:27
Also beware of memory fragmentation - Python's heap doesn't defragment - so even if objects do get freed, you may find the heap grows regardless. — solidpixel, Jun 03 '16 at 21:30
I think gc.collect() after every iteration helped a lot. Now memory usage stays a lot flatter. Thanks! I had another problem where I had a `read access violation` that didn't appear to be real - it would say `variable was 0x72` or `0xFFFFFFFF`, and I would look and that variable was fine. I put a try-catch block around it, but after I did that the error no longer occurred (no exception raised?!). Still learning... — JoseOrtiz3, Jun 04 '16 at 00:32
Turned out that I was really having a concurrency issue where my threads weren't stopping unless I delayed the main thread by a millisecond, but gc.collect() still improved the memory usage, at least according to Visual Studio's heap profiler. — JoseOrtiz3, Jun 07 '16 at 17:35
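The thread does not establish why gc.collect() helped. One situation where it makes a difference, shown here purely as an illustration and not as a diagnosis of the code above, is when objects are part of a reference cycle: their reference counts never reach zero on their own, so they stay alive until the cyclic collector runs. The Node class and make_cycle helper are hypothetical names for this sketch.

```python
import gc
import weakref


class Node:
    """Hypothetical container used only to build a reference cycle."""


def make_cycle():
    a = Node()
    b = Node()
    a.other = b
    b.other = a  # a and b now keep each other alive
    return weakref.ref(a)  # lets us observe a without holding a strong reference


was_enabled = gc.isenabled()
gc.disable()  # keep the automatic collector out of the way for the demo
try:
    ref = make_cycle()
    alive_before = ref() is not None  # the cycle survives leaving the function
    gc.collect()                      # the cyclic collector finds and frees it
    alive_after = ref() is not None
finally:
    if was_enabled:
        gc.enable()

print("alive before collect:", alive_before)
print("alive after collect:", alive_after)
```

Objects that are not in a cycle are freed as soon as their reference count drops to zero, without any help from gc.collect().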