
I'm using Python's C API (2.7) from C++ to convert a Python tree structure into a C++ tree. The code works as follows:

  • The Python tree is implemented recursively as a class with a list of children. The leaf nodes are just primitive integers (not class instances).

  • I load a module and invoke a Python method from C++, using code from here, which returns an instance of the tree, python_tree, as a PyObject in C++.

  • I recursively traverse the obtained PyObject. To obtain the list of children, I do this:

    PyObject* attr = PyString_FromString("children");
    PyObject* list = PyObject_GetAttr(python_tree,attr);
    for (int i=0; i<PyList_Size(list); i++) {
        PyObject* child = PyList_GetItem(list,i); 
        ...
    }
    

Pretty straightforward, and it works until I eventually hit a segmentation fault at the call to PyObject_GetAttr (Objects/object.c:1193, but I can't see the API code). It seems to happen on the visit to the last leaf node of the tree.

I'm having a hard time determining the problem. Are there any special considerations for doing recursion with the C API? I'm not sure if I need to be using Py_INCREF/Py_DECREF, or using these functions or something. I don't fully understand how the API works to be honest. Any help is much appreciated!

EDIT: Some minimal code:

void VisitTree(PyObject* py_tree) throw (Python_exception)
{
    PyObject* attr = PyString_FromString("children");
    if (PyObject_HasAttr(py_tree, attr)) // segfault on last visit
    {
        PyObject* list = PyObject_GetAttr(py_tree,attr);
        if (list)
        {
            int size = PyList_Size(list);
            for (int i=0; i<size; i++)
            {
                PyObject* py_child = PyList_GetItem(list,i);
                PyObject *cls = PyString_FromString("ExpressionTree");
                // check if child is class instance or number (terminal)
                if (PyInt_Check(py_child) || PyLong_Check(py_child) || PyString_Check(py_child)) 
                    ;// terminal - do nothing for now
                else if (PyObject_IsInstance(py_child, cls))
                    VisitTree(py_child);
                else
                    throw Python_exception("unrecognized object from python");
            }
        }
    }
}
rmp251
  • And you checked so you have no `NULL` pointers? – Some programmer dude Nov 12 '12 at 06:12
  • Could you show more code? Maybe you are not properly checking for `NULL`s or for leaf nodes. `PyObject_GetAttr` returns a new reference so you don't have to `Py_INCREF` it, but be sure to check if the value returned is `NULL` (which indicates failure). `PyList_GetItem` returns a *borrowed* reference, so you should *not* `Py_DECREF` it. If you have to store its result for future use you must take ownership of the reference with a `Py_INCREF` (but I do not think you have to do this in your case). – Bakuriu Nov 12 '12 at 06:53
  • I've added some code above. I've stepped through in debug mode and made sure there are no null pointers. Can't debug inside the API. I'm pretty stuck. – rmp251 Nov 12 '12 at 19:01

1 Answer

One can identify several problems with your Python/C code:

  • PyObject_IsInstance takes a class, not a string, as its second argument.

  • There is no code dedicated to reference counting. New references, such as those returned by PyObject_GetAttr are never released, and borrowed references obtained with PyList_GetItem are never acquired before use. Mixing C++ exceptions with otherwise pure Python/C aggravates the issue, making it even harder to implement correct reference counting.

  • Important error checks are missing. PyString_FromString can fail when there is insufficient memory; PyList_GetItem can fail if the list shrinks in the meantime; PyObject_GetAttr can fail in some circumstances even after PyObject_HasAttr succeeds.
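The two reference-counting rules above (release every new reference exactly once; take your own reference to a borrowed one before relying on it) can be checked without Python at all. The sketch below is only a toy model under stated assumptions: `ToyObject`, `toy_incref`, `toy_decref`, `toy_get_attr`, `toy_list_get_item` and `visit_correctly` are hypothetical stand-ins that mimic the ownership semantics of `Py_INCREF`, `Py_DECREF`, `PyObject_GetAttr` and `PyList_GetItem`; they are not part of the Python/C API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for PyObject: only the reference count is modelled.
// This is a toy for illustration, NOT the Python/C API.
struct ToyObject {
  int refcnt;
  ToyObject() : refcnt(1) {}  // the creator owns the first reference
};

// Plays the role of Py_INCREF.
void toy_incref(ToyObject *o) { ++o->refcnt; }

// Plays the role of Py_DECREF. The real macro deallocates at zero; here,
// releasing a reference when the count is already zero trips the assertion.
void toy_decref(ToyObject *o) {
  assert(o->refcnt > 0 && "over-released reference");
  --o->refcnt;
}

// Models a "new reference" call such as PyObject_GetAttr:
// the caller receives its own reference and must release it once.
ToyObject *toy_get_attr(ToyObject *attr) {
  toy_incref(attr);
  return attr;
}

// Models a "borrowed reference" call such as PyList_GetItem:
// the container keeps ownership, so no count is incremented.
ToyObject *toy_list_get_item(const std::vector<ToyObject *> &items,
                             std::size_t i) {
  return items[i];
}

// Follows both rules and returns the child's count while it is held.
int visit_correctly(ToyObject *attr, const std::vector<ToyObject *> &items) {
  ToyObject *list = toy_get_attr(attr);            // new reference: we own it
  ToyObject *child = toy_list_get_item(items, 0);  // borrowed: the list owns it
  toy_incref(child);  // acquire our own reference before using the child
  int held = child->refcnt;
  toy_decref(child);  // release exactly what we acquired
  toy_decref(list);   // release the new reference from toy_get_attr
  return held;
}
```

Calling `visit_correctly` on an attribute and a one-element list leaves every count where it started. Dropping the final `toy_decref(list)` would leave the attribute's count one too high (a leak, which is what the original `VisitTree` does); an extra release would drive a count below its starting value, which in real Python means freeing an object that is still in use and can show up as a segfault far from the actual bug.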

Here is a rewritten (but untested) version of the code, featuring the following changes:

  • The utility function GetExpressionTreeClass obtains the ExpressionTree class from the module that defines it. (Fill in the correct module name for my_module.)

  • Guard is a RAII-style guard class that releases the Python object when leaving the scope. This small and simple class makes reference counting exception-safe, and its constructor handles NULL objects itself by throwing Python_exception. boost::python defines layers of functionality in this style, and I recommend taking a look at it.

  • All Python_exception throws are now accompanied by setting the Python exception info. The catcher of Python_exception can therefore use PyErr_Print or PyErr_Fetch to print the exception or otherwise find out what went wrong.

The code:

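class Guard {
  PyObject *obj;
  // Non-copyable: a copy would Py_DECREF the same reference twice.
  Guard(const Guard&);
  Guard& operator=(const Guard&);
public:
  explicit Guard(PyObject *obj_): obj(obj_) {
    if (!obj)
      throw Python_exception("NULL object");  // the failing API call has set the Python error
  }
  ~Guard() {
    Py_DECREF(obj);
  }
};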

PyObject *GetExpressionTreeClass()
{
  PyObject *module = PyImport_ImportModule("my_module");
  Guard module_guard(module);
  return PyObject_GetAttrString(module, "ExpressionTree");
}

void VisitTree(PyObject* py_tree) throw (Python_exception)
{
  PyObject *cls = GetExpressionTreeClass();
  Guard cls_guard(cls);

  PyObject* list = PyObject_GetAttrString(py_tree, "children");
  if (!list && PyErr_ExceptionMatches(PyExc_AttributeError)) {
    PyErr_Clear();  // hasattr does this exact check
    return;
  }
  Guard list_guard(list);

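  Py_ssize_t size = PyList_Size(list);  // -1, with an exception set, if not a list
  if (size < 0)
    throw Python_exception("children is not a list");
  for (Py_ssize_t i = 0; i < size; i++) {
    PyObject* child = PyList_GetItem(list, i);  // borrowed reference
    Py_XINCREF(child);                          // take ownership before use
    Guard child_guard(child);

    // check if child is class instance or number (terminal)
    if (PyInt_Check(child) || PyLong_Check(child) || PyString_Check(child))
      ; // terminal - do nothing for now
    else {
      int is_tree = PyObject_IsInstance(child, cls);  // 1, 0, or -1 on error
      if (is_tree < 0)
        throw Python_exception("isinstance check failed");
      if (is_tree)
        VisitTree(child);
      else {
        PyErr_Format(PyExc_TypeError, "unrecognized %s object", Py_TYPE(child)->tp_name);
        throw Python_exception("unrecognized object from python");
      }
    }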
  }
}
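The point of Guard is that its destructor runs during stack unwinding, so a reference is released even when a throw of Python_exception leaves VisitTree early. A minimal toy model of that behaviour follows, assuming a hypothetical reference-counted `ToyObject` in place of `PyObject`, a `ToyGuard` that mirrors Guard, and a hypothetical `process` function standing in for one loop iteration; none of these are Python/C API.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical reference-counted stand-in for PyObject (NOT the Python/C API).
struct ToyObject {
  int refcnt;
  ToyObject() : refcnt(1) {}
};

// Same shape as Guard: owns one reference and releases it when the scope
// is left, whether normally or by an exception.
class ToyGuard {
  ToyObject *obj;
  ToyGuard(const ToyGuard &);             // non-copyable: a copy would release twice
  ToyGuard &operator=(const ToyGuard &);
public:
  explicit ToyGuard(ToyObject *obj_) : obj(obj_) {
    if (!obj)
      throw std::runtime_error("NULL object");  // destructor will not run
  }
  ~ToyGuard() {
    assert(obj->refcnt > 0);
    --obj->refcnt;                        // plays the role of Py_DECREF
  }
};

// Acquires a reference, then optionally fails part-way through, like
// VisitTree throwing on an unrecognized child.
void process(ToyObject *o, bool fail) {
  ++o->refcnt;                            // like Py_INCREF on a borrowed child
  ToyGuard guard(o);
  if (fail)
    throw std::runtime_error("unrecognized object");
}
```

Even when `process(&o, true)` throws, the count on `o` returns to its starting value, which is what keeps the recursive VisitTree leak-free when it bails out on an unrecognized child. The private, undefined copy constructor and assignment operator are the C++03 way to make a guard non-copyable; in C++11 and later you would write `= delete` instead.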
user4815162342
  • Thanks, I really appreciate the detail. I've applied your changes and now I'm getting a segfault at `PyObject_IsInstance(child, cls)`. Is there a way to debug python's C API code? Is the source available? Thanks again. – rmp251 Nov 12 '12 at 22:20
  • I've noticed that if I run it on a slightly different tree, it doesn't even make it past the function call to obtain the tree from python (doesn't even reach the code we're talking about here). A segfault happens in `PyObject_Call`. Makes me wonder if my python-dev installation is messed up or something. I seem to have both 2.6 and 2.7 installed. – rmp251 Nov 12 '12 at 22:33
  • The source to the Python/C API is the source to Python itself. If you are running on Unix, you can unpack it and compile it with `./configure` and `make`. – user4815162342 Nov 12 '12 at 22:37
  • Maybe you should get rid of the `py_embed` library, and just directly use the Python/C API? It's not *that* hard. – user4815162342 Nov 12 '12 at 22:39
  • Yeah, I just wanted some starter code. I found more success with [this tutorial](http://www.codeproject.com/Articles/11805/Embedding-Python-in-C-C-Part-I). – rmp251 Nov 14 '12 at 01:55