
I have the following Cython code in a .pyx file, which converts a wchar_t* to a Python string (unicode):

Note: all code below targets Python 2.7.4.

cdef wc_to_pystr(wchar_t *buf):
    if buf == NULL:
        return None
    cdef size_t buflen
    buflen = wcslen(buf)
    cdef PyObject *p = PyUnicode_FromWideChar(buf, buflen)
    return <unicode>p

I called this function in a loop like this:

cdef wchar_t* buf = <wchar_t*>calloc(100, sizeof(wchar_t))
# ... copy some wide string to buf

for n in range(30000):
    u = wc_to_pystr(buf)  # <== behaves as if it leaks memory

free(buf)

I tested this on Windows and observed that the process memory (as seen in Task Manager) keeps increasing, so I suspect there is a memory leak here.

This is surprising because:

  1. As I understand it, the API PyUnicode_FromWideChar() copies the supplied buffer.
  2. Every time the variable 'u' is assigned a different value, the previous value should be freed.
  3. Since the source buffer ('buf') stays unchanged and is released only after the loop ends, I expected memory to stop growing after a certain point.

Any idea where I am going wrong? Is there a better way to convert a wide-character string to a Python unicode object?
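
Task Manager numbers also include memory that the allocator keeps cached, so they cannot by themselves tell a real leak from lazy reclamation. A rough way to check is to measure how many bytes stay allocated after many calls. The sketch below uses tracemalloc, which assumes Python 3.4 or later (it is not available on the 2.7.4 used here); `net_traced_bytes` and `convert_once` are hypothetical names, and `convert_once` is only a stand-in for one `wc_to_pystr(buf)` call.

```python
import tracemalloc


def net_traced_bytes(func, iterations=30000):
    """Return how many traced bytes remain allocated after calling func repeatedly."""
    tracemalloc.start()
    try:
        func()  # warm-up call so one-time setup is not counted
        before, _peak = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            func()
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


def convert_once():
    # Hypothetical stand-in for one wc_to_pystr(buf) call; the result is dropped.
    return bytearray(200)


# A small number suggests no leak; growth proportional to the iteration
# count suggests that each call leaves something behind.
print(net_traced_bytes(convert_once))
```

tracemalloc only sees allocations made through Python's own allocators, which is where unicode objects are allocated, so a leaked reference per call should show up as steady growth.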

bitflood
  • Could you try adding a `del u` in the for loop and check again whether the memory keeps increasing? – gg349 Dec 07 '14 at 17:49
  • @GiulioGhirardo, I tried what you suggested and the memory still keeps increasing. At this point I am not sure whether it's a real memory leak or the Python GC is just a bit lazy about collecting the garbage – bitflood Dec 08 '14 at 02:41

1 Answer


Solved!! Solution:

(Note: The solution refers to a piece of my code that was not in the original question. When posting, I had no idea it would hold the key to solving this. Sorry to those who spent time thinking about it.)

In the Cython .pyx file, I had declared the Python C API function like this:

PyObject* PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)

I checked out the docs at https://github.com/cython/cython/blob/master/Cython/Includes/cpython/__init__.pxd

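Because I had declared the return type as PyObject*, Cython treated the result as a plain C pointer and did no reference counting on it. PyUnicode_FromWideChar() returns a new reference, and the cast <unicode>p takes one more reference for the Python object, so the original reference was never released and every string leaked. The solution was to change the return type in the declaration: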

object PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)

As per the docs, a function declared as returning 'object' is taken to return a new reference that Cython owns and releases automatically, so no extra reference is left behind and the memory is freed correctly in the for loop. The modified 'wc_to_pystr' looks like this:

cdef wc_to_pystr(wchar_t *buf):
    # A NULL buffer maps to None.
    if buf == NULL:
        return None
    cdef size_t buflen
    buflen = wcslen(buf)
    # Declared as returning 'object', so Cython owns the new reference.
    p = PyUnicode_FromWideChar(buf, buflen)
    return p
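
The ownership difference can also be observed from plain Python. The sketch below is only an analogy, not Cython: it uses ctypes to call PyUnicode_FromWideChar twice, once with a raw-pointer return type (similar in spirit to the PyObject* declaration) and once with an object return type (similar to the 'object' declaration). It assumes CPython with ctypes.pythonapi available; the helper names are hypothetical.

```python
import ctypes
import sys

TEXT = "wide string example"  # long enough that CPython builds a fresh object

# Two views of the same C function, differing only in the declared return type.
_raw_proto = ctypes.PYFUNCTYPE(ctypes.c_void_p, ctypes.c_wchar_p, ctypes.c_ssize_t)
_obj_proto = ctypes.PYFUNCTYPE(ctypes.py_object, ctypes.c_wchar_p, ctypes.c_ssize_t)
from_wide_raw = _raw_proto(("PyUnicode_FromWideChar", ctypes.pythonapi))
from_wide_obj = _obj_proto(("PyUnicode_FromWideChar", ctypes.pythonapi))

py_decref = ctypes.PYFUNCTYPE(None, ctypes.py_object)(("Py_DecRef", ctypes.pythonapi))


def compare_refcounts():
    # Object return type: ctypes takes over the new reference for us.
    owned = from_wide_obj(TEXT, len(TEXT))

    # Raw pointer return type: the new reference is ours to release, and
    # turning the pointer into an object adds a second reference on top.
    address = from_wide_raw(TEXT, len(TEXT))
    leaked = ctypes.cast(address, ctypes.py_object).value

    owned_count = sys.getrefcount(owned)
    leaked_count = sys.getrefcount(leaked)

    # Releasing the forgotten reference brings the two back in line.
    py_decref(leaked)
    repaired_count = sys.getrefcount(leaked)
    return owned_count, leaked_count, repaired_count


owned_count, leaked_count, repaired_count = compare_refcounts()
print(leaked_count - owned_count)    # extra references left by the raw-pointer path
print(repaired_count - owned_count)  # difference after the explicit Py_DecRef
```

Absolute values from sys.getrefcount vary between Python versions, which is why the sketch compares differences rather than printing the raw counts.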
bitflood