Garbage collection and custom getter in SWIG

Question

I'm not the author, but there's a public software package I use that seems to be leaking memory (Github issue). I'm trying to figure out how to patch it to make it work correctly.

To narrow the problem down, there's a struct, call it xxx_t. First %extend is used to make a member of the struct available in Python:

%extend xxx_t {
    char *surface;
}

Then there's a custom getter. What exactly it does here isn't important except that it uses new to create a char*.

%{
char* xxx_t_surface_get(xxx *n) {
  char *s = new char [n->length + 1];
  memcpy (s, n->surface, n->length);
  s[n->length] = '\0';
  return s;
}
%}

Currently the code has this line to handle garbage collection:

%newobject surface;

This does not seem to work as expected. %newobject xxx_t::surface; also doesn't work. If I replace it with %newobject xxx_t_surface_get; that doesn't work because the getter function is escaped (inside %{ ... %}).

What is the right way to tell SWIG about the char* so it gets freed?

Flexo · Accepted Answer · 2019-07-28T16:10:44.917

Before getting start it's worth pointing out one thing: Because you return char* it ends up using SWIG's normal string typemaps to produce a Python string.

Having said that let's understand what the code that currently gets generated looks like. We can start our investigation with the following SWIG interface definition to experiment with:

%module test 

%inline %{
  struct foobar {
  };
%}

%extend foobar {
  char *surface;
}

If we run something like this through SWIG we'll see a generated function which wraps your _surface_get code, something like this:

SWIGINTERN PyObject *_wrap_foobar_surface_get(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
  PyObject *resultobj = 0;
  foobar *arg1 = (foobar *) 0 ;
  void *argp1 = 0 ;
  int res1 = 0 ;
  PyObject * obj0 = 0 ;
  char *result = 0 ;

  if (!PyArg_ParseTuple(args,(char *)"O:foobar_surface_get",&obj0)) SWIG_fail;
  res1 = SWIG_ConvertPtr(obj0, &argp1,SWIGTYPE_p_foobar, 0 |  0 );
  if (!SWIG_IsOK(res1)) {
    SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "foobar_surface_get" "', argument " "1"" of type '" "foobar *""'"); 
  }
  arg1 = reinterpret_cast< foobar * >(argp1);
  result = (char *)foobar_surface_get(arg1);
  resultobj = SWIG_FromCharPtr((const char *)result);
  /* result is never used again from here onwards */
  return resultobj;
fail:
  return NULL;
}

The thing to note here however is that the result of calling your getter is lost when this wrapper returns. That is to say that it isn't even tied to the lifespan of the Python string object that gets returned.

So there are several ways we could fix this:

One option would be to ensure that the generated wrapper calls delete[] on the result of calling your getter, after the SWIG_FromCharPtr has happened. This is exactly what %newobject does in this instance. (See below).
Another alternative would be to keep the allocated buffer between calls, probably in some thread local storage and track the size to minimise allocations
Alternatively we could use some kind of RAII based object to own the temporary buffer and make sure it gets removed. (We could do something neat with operator void* if we wanted even).

If we change our interface to add %newobject like so:

%module test 

%inline %{
  struct foobar {
  };
%}

%newobject surface;

%extend foobar {
  char *surface;
}

Then we see that our generated code now looks like this:

  // ....
  result = (char *)foobar_surface_get(arg1);
  resultobj = SWIG_FromCharPtr((const char *)result);
  delete[] result;

We can see this in the real code from github too, so this isn't the bug that you're looking for.

Typically for C++ I'd lean towards the RAII option. And as it happens there's a neat way to do this from both a SWIG perspective and a C++ one: std::string. So we can fix your leak in a simple and clean way just by doing something like this:

%include <std_string.i> /* If you don't already have this... */

%extend xxx_t {
    std::string surface;
}

%{
std::string xxx_t_surface_get(xxx *n) {
  return std::string(n->surface, n->length);
}
%}

(You'll need to change the setter to match too though, unless you made it const so there is no setter)

The thing about this though is that it's still making two sets of allocations for the same output. Firstly the std::string object makes one allocation and then secondly an allocation occurs for the Python string object. That's all for something where the buffer already exists in C++ anyway. So whilst this change would be sufficient and correct to solve the leak you can also go further and write a version that does less duplicitous copying:

%extend xxx_t {
    PyObject *surface;
}

%{
PyObject *xxx_t_surface_get(xxx *n) {
  return SWIG_FromCharPtrAndSize(n->surface, n->length);
}
%}

Something else is going on here. There is already a `delete[]` in the real generated code, because of the `%newobject`. — Flexo, Jul 28 '19 at 16:02
Hm, I'd assumed this was the issue because it obviously added a `new`... I'll have to go back and look for other potential causes. Thanks for the detailed explanation in any case, it makes it clear what's going on! — polm23, Jul 29 '19 at 01:27

Garbage collection and custom getter in SWIG

1 Answers1