2

My PyObject becomes a Bytes object while getting it's value

So lately, I am doing a project in C, where I implement a few type of trees, to be able to use them in python, since a C btree implementation is way faster than a Python one. (Ikr there are a few available libraries, but since I have more free time during the lockdown, I thought, I could do my own library.)

Everything is working fine, untill I would like to find an element, and print its value, all at the same line. When I do that, my Node object turns into a Bytes object, without any further assignment.

Further Information

OS: Ubuntu 16.04.
Python version: Python 3.5.2
GCC: 5.4.0-6 Ubuntu

Python Code:

import mylib
import random

maxv = 10

def addValues(tree, values):
    for value in values:
        tree.insert(value)

def main():
    my_list = [i for i in range(maxv)]
    #random.shuffle(my_list)
    tree    = mylib.BinaryTree('test')
    addValues(tree, my_list)
    print(tree.find(3).getValue())
    print(tree.sort(False))
    
main()

Expected Output (Works if line before the last in main func is print(tree.find(3))):

3
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Output I got with the above test code:

3
Segmentation fault

The Segmentation fault occurs, because the node contains the value 3 became a Bytes object while printing its value. The below code will print the new Node.

import mylib
import random

maxv = 10

def addValues(tree, values):
    for value in values:
        tree.insert(value)

def main():
    my_list = [i for i in range(maxv)]
    #random.shuffle(my_list)
    tree    = mylib.BinaryTree('test')
    addValues(tree, my_list)
    print(tree.find(0).getValue()) #0 is going to be the root node's value, since random.shuffle is not happening.
    print(tree.root)
    print(tree.sort(False))
    
main()

Output:

0
b'0'
Segmentation fault

I spent days debugging this, and since I am definetly not a master in C programming, (You will see below), I could not find the error. I would like to move on, and implement more features, but I was unable to find the error. There is a chance I missed something so trivial, or a thing I am not aware of. I am not an experienced C programmer sadly. :/

My module contains much more code, I will not post what is not necessary for this problem. If you think you should see more to understand my code, feel free to tell me! Also I hope my code is well-readable!

Can someone explain what is happening exactly?
Thank you!

C code related here:

Recursive find function:

/** find node by value */
/*static*/ PyObject *find(Node *root, int value) {
    if((PyObject *) root == Py_None) {
        return NULL;
    }
    if(root->value == value) {
        return (PyObject *) root;
    }
    if(value < root->value) {
        return find((Node *) root->left,  value);
    }
    if(value > root->value) {
        return find((Node *) root->right, value);
    }
}

Controller, this method is being called from Python, and this will call the above recursive function:

/** Find node by value */
/*static*/ PyObject *BinaryTree_Find(BinaryTree *self, PyObject *args, PyObject *kwds) {
    int value;
    PyObject *result;
    
    if(!PyArg_ParseTuple(args, "i", &value)) {
        return NULL;
    }
    
    result = find((Node *) self->root, value);
    if(result == NULL) {
        result = Py_None;
    }
    
    return (PyObject *) result;
}

Minimal, Reproducible Example

You can use the Python code above to drive this module, and the below code to create the library.

from distutils.core import setup, Extension
setup(
    name = 'mylib', version = '1.0',  \
    ext_modules = [Extension('mylib',
        [
            'custom.c',
        ])])

And run by: sudo python3 build.py install

#include <Python.h>
#include "structmember.h"

/* Header Block */
#define MODULE_NAME "mylib"
#define MODULE_DOC  "Custom Binary Tree Implementations in C for Python"

/** Node */
typedef struct {
    PyObject_HEAD
    int value;
    PyObject *left;
    PyObject *right;
} Node;

/** BinaryTree */
typedef struct {
    PyObject_HEAD
    int elem;
    PyObject *name;
    PyObject *root;
} BinaryTree;

/** NODE */

/** creating node, set inital values */
/*static*/ PyObject *newNode(PyTypeObject *type, PyObject *args, PyObject *kwds) {
    Node *self;
    
    self = (Node *) type->tp_alloc(type, 0);
    if(self == NULL) {
        return NULL;
    }
    
    self->value = 0;
    self->left  = NULL;
    self->right = NULL;
    
    return (PyObject *) self;
}

/** init function of node, set parameter as value */
/*static*/ int Node_Init(Node *self, PyObject *args, PyObject *kwds) {
    int value;
    
    if(!PyArg_ParseTuple(args, "i", &value)) {
        return -1;
    }
    
    self->value = value;
    self->left  = Py_None;
    self->right = Py_None;
    
    return 0;
}

/** deallocation of node */
/*static*/ void Node_Dealloc(Node *self) {
    if(self->left  != Py_None) Py_DECREF(self->left);
    if(self->right != Py_None) Py_DECREF(self->right);
    Py_TYPE(self)->tp_free((PyObject *) self);
}

/** return value of node */
/*static*/ PyObject *Node_GetValue(Node *self, PyObject *args, PyObject *kwds) {
    return Py_BuildValue("i", self->value);
}

/** node variables */
/*static*/ PyMemberDef Node_Members[] = {
    {
        "left",
        T_OBJECT_EX,
        offsetof(Node, left),
        0,
        "Left Child"
    },
    {
        "right",
        T_OBJECT_EX,
        offsetof(Node, right),
        0,
        "Right Child"
    }
};

/** node methods */
/*static*/ PyMethodDef Node_Methods[] = {
    {
        "getValue",
        (PyCFunction) Node_GetValue,
        METH_NOARGS,
        "Get value of node"
    },
    {NULL}
};

/** node type def */
/*static*/ PyTypeObject Node_Type = {
    PyVarObject_HEAD_INIT(NULL, 0)
    .tp_name      = "BinaryTree Node",
    .tp_doc       = "Custom Binary Tree Node Object",
    .tp_basicsize = sizeof(Node),
    .tp_itemsize  = 0,
    .tp_flags     = Py_TPFLAGS_DEFAULT,
    .tp_new       = newNode,
    .tp_init      = (initproc) Node_Init,
    .tp_dealloc   = (destructor) Node_Dealloc,
    .tp_members   = Node_Members,
    .tp_methods   = Node_Methods,
};

/** CONTROLLER */

/** insert new node into the tree */
/*static*/ PyObject *insert(Node *root, int value, PyObject *args) {
    if((PyObject *) root == Py_None) {
        return PyObject_CallObject((PyObject *) &Node_Type, args);
    }
    if(root->value == value) {
        return NULL;
    }
    if(value < root->value) {
        root->left  = insert((Node *) root->left, value, args);
    }
    if(value > root->value) {
        root->right = insert((Node *) root->right, value, args);
    }
    return (PyObject *) root;
}

/** find node by value */
/*static*/ PyObject *find(Node *root, int value) {
    if((PyObject *) root == Py_None) {
        return NULL;
    }
    if(root->value == value) {
        return (PyObject *) root;
    }
    if(value < root->value) {
        return find((Node *) root->left,  value);
    }
    if(value > root->value) {
        return find((Node *) root->right, value);
    }
}

/** sort tree asc / inorder traversal */
/*static*/ void sortAsc(Node *root, PyObject *list) {
    if((PyObject *) root == Py_None) {
        return;
    }
    sortAsc((Node *) root->left,  list);
    PyList_Append(list, Py_BuildValue("i", root->value));
    sortAsc((Node *) root->right, list);
}

/** sort tree desc */
/*static*/ void sortDesc(Node *root, PyObject *list) {
    if((PyObject *) root == Py_None) {
        return;
    }
    sortDesc((Node *) root->right,  list);
    PyList_Append(list, Py_BuildValue("i", root->value));
    sortDesc((Node *) root->left, list);
}

/** BinaryTree */

/** creating binary tree, set inital values */
/*static*/ PyObject *newBinaryTree(PyTypeObject *type, PyObject *args, PyObject *kwds) {
    BinaryTree *self;
    
    self = (BinaryTree *) type->tp_alloc(type, 0);
    if(self == NULL) {
        return NULL;
    }
    
    self->name = PyUnicode_FromString("");
    if(self->name == NULL) {
        Py_DECREF(self);
        return NULL;
    }
    
    self->elem = 0;
    self->root = Py_None;
    
    return (PyObject *) self;
}

/** set parameters as variables */
/*static*/ int BinaryTree_Init(BinaryTree *self, PyObject *args, PyObject *kwds) {
    PyObject *name, *temp;
    
    if(!PyArg_ParseTuple(args, "O", &name)) {
        return -1;
    }
    
    if(name) {
        temp = self->name;
        Py_INCREF(name);
        self->name = name;
        Py_XDECREF(temp);
    }
    
    self->elem = 0;
    
    return 0;
}

/** deallocation of binary tree */
/*static*/ void BinaryTree_Dealloc(BinaryTree *self) {
    Py_XDECREF(self->name);
    Py_XDECREF(self->root);
    Py_TYPE(self)->tp_free((PyObject *) self);
}

/** insert controller, set parameter as value, drive inserting, return true */
/*static*/ PyObject *BinaryTree_Insert(BinaryTree *self, PyObject *args, PyObject *kwds) {
    int value;
    PyObject *result;
    
    if(!PyArg_ParseTuple(args, "i", &value)) {
        return NULL;
    }
    
    result = insert((Node *) self->root, value, args);
    if(result == NULL) {
        Py_XDECREF(result);
        PyErr_SetString(PyExc_TypeError, "Already existing value!");
        return NULL;
    }
    
    self->root = result;
    self->elem++;
    
    Py_RETURN_TRUE;
}

/** Find node by value */
/*static*/ PyObject *BinaryTree_Find(BinaryTree *self, PyObject *args, PyObject *kwds) {
    int value;
    PyObject *result;
    
    if(!PyArg_ParseTuple(args, "i", &value)) {
        return NULL;
    }
    
    result = find((Node *) self->root, value);
    if(result == NULL) {
        result = Py_None;
    }
    
    return (PyObject *) result;
}

/** sort binary tree asc */
/*static*/ PyObject *BinaryTree_Sort(BinaryTree *self, PyObject *args, PyObject *kwds) {
    int rev;
    PyObject *list = PyList_New(0);
    
    if(!PyArg_ParseTuple(args, "p", &rev)) {
        Py_XDECREF(list);
        return NULL;
    }
    
    switch(rev) {
        case 0:
            sortAsc( (Node *) self->root, list);
            break;
        case 1:
            sortDesc((Node *) self->root, list);
            break;
        default:
            sortAsc( (Node *) self->root, list);
    }
    
    return list;
}

/** binary tree variables */
/*static*/ PyMemberDef BinaryTree_Members[] = {
    {
        "name",
        T_OBJECT_EX,
        offsetof(BinaryTree, name),
        0,
        "BinaryTree's unique name"
    },
    {
        "root",
        T_OBJECT_EX,
        offsetof(BinaryTree, root),
        0,
        "BinaryTree's root"
    },
    {NULL}
};

/** binary tree methods */
/*static*/ PyMethodDef BinaryTree_Methods[] = {
    {
        "insert",
        (PyCFunction) BinaryTree_Insert,
        METH_VARARGS,
        "Insert new node."
    },
    {
        "find",
        (PyCFunction) BinaryTree_Find,
        METH_VARARGS,
        "Find node by value."
    },
    {
        "sort",
        (PyCFunction) BinaryTree_Sort,
        METH_VARARGS,
        "Sort tree."
    },
    {NULL}
};

/** binary tree type def */
/*static*/ PyTypeObject BinaryTree_Type = {
    PyVarObject_HEAD_INIT(NULL, 0)
    .tp_name      = "BinaryTree",
    .tp_doc       = "Custom Binary Tree Object",
    .tp_basicsize = sizeof(BinaryTree),
    .tp_itemsize  = 0,
    .tp_flags     = Py_TPFLAGS_DEFAULT,
    .tp_new       = newBinaryTree,
    .tp_init      = (initproc) BinaryTree_Init,
    .tp_dealloc   = (destructor) BinaryTree_Dealloc,
    .tp_members   = BinaryTree_Members,
    .tp_methods   = BinaryTree_Methods,
};

/** MODULE */

/** Module def */
static struct PyModuleDef mylib_module = {
    PyModuleDef_HEAD_INIT,
    MODULE_NAME, /* name of module */
    MODULE_DOC,  /* module documentation, may be NULL */
    -1,          /* size of per-interpreter state of the module */
};

/** Module init */
PyMODINIT_FUNC PyInit_mylib(void) {
    PyObject *mod;
    
    if(PyType_Ready(&BinaryTree_Type) < 0) {
        return NULL;
    }
    
    if(PyType_Ready(&Node_Type) < 0) {
        return NULL;
    }
    
    mod = PyModule_Create(&mylib_module);
    if(mod == NULL) {
        return NULL;
    }
    
    Py_INCREF(&BinaryTree_Type);
    if(PyModule_AddObject(mod, "BinaryTree", (PyObject *) &BinaryTree_Type) < 0) {
        Py_DECREF(&BinaryTree_Type);
        Py_DECREF(mod);
        return NULL;
    }
    
    Py_INCREF(&Node_Type);
    if(PyModule_AddObject(mod, "Node", (PyObject *) &Node_Type) < 0) {
        Py_DECREF(&Node_Type);
        Py_DECREF(mod);
        return NULL;
    }
    
    return mod;
}

EDIT

print(tree.root.getValue())
print(tree.find(tree.root.value).getValue())

Works fine! The error lies in the find function!

Thanks to DavidW, he pointed out it must be a reference error somewhere, so I counted, how many references are there refering to that Node object before, and after printing it's value. The results:

2                  # references before
56                 # value before, this line accessed the Node.
b'56\n'            # value after 
139690028005977    # references after
Segmentation fault # segfault while sorting

So while using Find, the Node's reference count gets decreased, therefore all Node objects are deallocated/deleted. Increasing before returning, and custom setter/getter methods with INCREF/DECREF-s are also ineffective.

DONE

Following DavidW's suggestion, I managed to find the answer! I did not increased the reference count, therefore the Nodes where deallocated right after searching. The objects were not behaving the same way when I did not requested the Node's value, or when I did not print the value out.I suggest the print function decreased the reference count somehow.

Right find controller:

/** Find nood by value */
/*static*/ PyObject *BinaryTree_Find(BinaryTree *self, PyObject *args, PyObject *kwds) {
    int value;
    PyObject *result;
    
    if(!PyArg_ParseTuple(args, "i", &value)) {
        return NULL;
    }
    
    result = find((Node *) self->root, value);
    if(result == NULL) {
        result = Py_None;
    }
    
    Py_INCREF(result); // Increasing the Node object's reference count here
    
    return (PyObject *) result;
}
nagyl
  • 1,644
  • 1
  • 7
  • 18
  • 1
    There's definitely reference counting errors - a lot of the time when you return values they should be `INCREF`ed (e.g. in `BinaryTree_Find`) since Python treats them as a new reference. I know that C API modules are hard to make really small, but this is a long way from minimal, so it'd be a lot of work for someone to go through it thoroughly. – DavidW Jul 20 '20 at 16:03
  • Sorry, if I would make my code smaller, the module won't work anymore. It contains only necessary functions, the ones used by Python to insert, search, and print the tree, also you need to create/destruct the objects. Thanks for the advice, I am going to try that! – nagyl Jul 20 '20 at 16:33
  • 1
    What I'd do next is put `printf`'s in the destructors (and maybe constructors too) of your various classes - see where things are being deallocated early – DavidW Jul 20 '20 at 17:44
  • @DavidW please make a short answer, including both of your comments, so I can accept it as an answer! Thank you so much for helping!! – nagyl Jul 20 '20 at 22:17
  • 1
    Glad you got it sorted. Since you have a working version of your code you're welcome to post it as an answer yourself. There's no real reason why it needs to be me who writes it. – DavidW Jul 20 '20 at 22:28
  • Since I could not find out the problem without your help, I would like to accept your comment as an answer. I did not know about how reference count works. – nagyl Jul 20 '20 at 22:48

1 Answers1

1

As discussed in the comments it was a reference counting error. A C API function must return what Python calls a "new reference". That means either returning something that has been purposefully created inside the function (for example the result of PyList_New) in incrementing the reference count of an existing object.

In BinaryTree_Find specifically you were not returning a new reference. As a result Python would end up freeing an object that still formed part of your BinaryTree. Once that's happened you can get all kinds of strange and confusing behaviour. I suggested adding Py_INCREF(result).

To help diagnose this kind of problem it was be worth adding printf statements to object constructors and destructors (just as a temporary debugging measure) so you can check your assumptions about when they are allocated and freed.

DavidW
  • 29,336
  • 6
  • 55
  • 86