While learning to write Python C extension modules with CPython's C API, I have encountered a curious segfault (disclaimer: I have only a passing fluency in C). A typical example module method, written in C (which can then be imported into Python), might look like this:

static PyObject *method_echo(PyObject *self, PyObject *args) {
    long a;
    if(!PyArg_ParseTuple(args, "l", &a)) {
        return NULL;
    }
    printf("Value of the passed variable is: %li\n", a);
    return PyLong_FromLong(a);
}

This works for me without issue. The problem comes if I choose to declare a as a pointer and pass it to PyArg_ParseTuple, for example, changing the relevant lines to:

    long *a;
    if(!PyArg_ParseTuple(args, "l", a)) {
        return NULL;
    }

(and of course modifying the remaining lines to work with a pointer), this results in a segfault. HOWEVER, if I remove the return NULL line:

    long *a;
    PyArg_ParseTuple(args, "l", a);

This runs without issue. Even though the return NULL statement never gets executed (I have checked that explicitly with a printf in the conditional block), somehow it causes a segfault if I pass a pointer to PyArg_ParseTuple. Any ideas what's going on?

Here are some details of my system, followed by some example code that should be able to reproduce the problem:

macOS 11.6, Python 3.9, C compiler: clang (clang-1300.0.29.30)

C extension module (which will be imported in Python as test1_pptr):

test1_parsepointer.c

#define PY_SSIZE_T_CLEAN
#include <python3.9/Python.h>

static PyObject *method_parse_ptr1(PyObject *self, PyObject *args) {
    long *a;
    if(!PyArg_ParseTuple(args, "l",a)) {
        printf("PROBLEM ENCOUNTERED\n");
    };
    printf("  ptr-v1: Value of var is: %li\n", *a);
    return PyLong_FromLong(*a);
}

static PyObject *method_parse_ptr2(PyObject *self, PyObject *args) {
    long *a;
    if(!PyArg_ParseTuple(args, "l",a)) {
        return NULL;
    };
    printf("  ptr-v2: Value of var is: %li\n", *a);
    return PyLong_FromLong(*a);
}

static PyObject *method_parse_val(PyObject *self, PyObject *args) {
    long a;
    if(!PyArg_ParseTuple(args, "l",&a)) {
        return NULL;
    };
    printf("     val: Value of var is: %li\n", a);
    return PyLong_FromLong(a);
    
}

static PyMethodDef parseptr_methods[] = {
    {"parse_ptr_v1", method_parse_ptr1, METH_VARARGS, "Parse as pointer, no NULL"},
    {"parse_ptr_v2", method_parse_ptr2, METH_VARARGS, "Parse as pointer, with NULL"},
    {"parse_val", method_parse_val, METH_VARARGS, "Parse as val, with NULL"},
    {NULL, NULL, 0, NULL}
};

static struct PyModuleDef parsing_ptrs = {
    PyModuleDef_HEAD_INIT,
    "test1_pptr",
    "Testing PyArg_ParseTuple vars as pointers",
    -1,
    parseptr_methods
};

PyMODINIT_FUNC PyInit_test1_pptr(void) {
    return PyModule_Create(&parsing_ptrs);
}

I compile this with the following command:

clang -shared -undefined dynamic_lookup -o test1_parsepointer.so test1_parsepointer.c

Create a .py file that bootstraps this module upon import:

test1_pptr.py:

def __bootstrap__():
    global __bootstrap__, __loader__, __file__
    import sys, pkg_resources, importlib.util
    __file__ = pkg_resources.resource_filename(__name__, 'test1_parsepointer.so')
    __loader__ = None; del __bootstrap__, __loader__
    spec = importlib.util.spec_from_file_location(__name__,__file__)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
__bootstrap__()

And finally, the methods can be tested with the following Python script:

import test1_pptr as tppr

"""
Three functions in tppr should be:
    parse_ptr_v1(int)
    parse_ptr_v2(int)
    parse_val
"""

def main():
    a = int(3)
    print("about to test parse-by-value...")
    tppr.parse_val(a) # runs fine

    print("about to test parse-by-pointer v1...")
    tppr.parse_ptr_v1(a) # runs fine
    
    print("about to test parse-by-pointer v2...")
    tppr.parse_ptr_v2(a) # segfaults

if __name__ == "__main__":
    main()
Aaron

1 Answer

long *a;

This doesn't point to anything valid because you haven't initialized it (either by allocating memory for a long or taking the address of an existing long).
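As a rough sketch of the two options just mentioned, the following standalone C shows a pointer being made valid either by taking the address of an existing long or by allocating one. The function names store_via_address and store_via_malloc are made up for illustration and are not part of the CPython API.

```c
#include <assert.h>
#include <stdlib.h>

/* Option 1: point at an existing long that lives on the stack. */
long store_via_address(long value) {
    long storage = 0;       /* real storage for one long */
    long *a = &storage;     /* a now points at valid memory */
    *a = value;             /* safe: writes into storage */
    return storage;
}

/* Option 2: allocate storage on the heap and free it afterwards. */
long store_via_malloc(long value) {
    long *a = malloc(sizeof *a);
    assert(a != NULL);      /* sketch only: real code should handle failure */
    *a = value;             /* safe: writes into the allocated long */
    long result = *a;
    free(a);
    return result;
}
```

For a single long that is only needed inside one function, the first option (or simply declaring `long a;` and passing `&a`) is usually the simpler choice, since there is nothing to free.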

if(!PyArg_ParseTuple(args, "l", a))

This is attempting to write into whatever a points to. But a doesn't point to a valid long. Therefore it crashes.

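The fact that it seems to work in some cases is not meaningful. Writing through an uninitialized pointer is undefined behaviour, so the program may crash, appear to work, or silently corrupt unrelated memory. In practice, the value a happens to hold is just whatever was left in that stack slot or register, and that can change with unrelated edits such as adding or removing the return NULL branch. There's no value in attempting to understand a particular outcome.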
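To make the correct calling pattern concrete without needing the Python headers, here is a minimal sketch that uses a hypothetical stand-in, parse_long, in place of PyArg_ParseTuple(args, "l", ...). Like the real function, it reports success or failure through its return value and writes the parsed number through the pointer it is given. The names parse_long and echo_long, and the choice of strtol for parsing, are assumptions for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for PyArg_ParseTuple(args, "l", &a):
 * returns 1 and writes the parsed value through out on success,
 * returns 0 and leaves *out untouched on failure. */
int parse_long(const char *text, long *out) {
    assert(out != NULL);    /* the caller must supply real storage */
    char *end = NULL;
    errno = 0;
    long value = strtol(text, &end, 10);
    if (end == text || *end != '\0' || errno == ERANGE) {
        return 0;
    }
    *out = value;
    return 1;
}

/* Caller pattern: declare a long, pass its address, check the result. */
int echo_long(const char *text, long *result) {
    long a;                       /* storage owned by this function */
    if (!parse_long(text, &a)) {  /* &a is a valid long * */
        return 0;                 /* analogous to return NULL in the extension */
    }
    *result = a;
    return 1;
}
```

The same shape applies in the extension itself: keep `long a;`, pass `&a` to PyArg_ParseTuple, and return NULL when it reports failure, as in the parse_val method from the question.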

DavidW
  • Thanks for the clear explanation! I assumed that `long *a;` would have allocated a `long`-sized spot in memory to which `a` points, but I suppose all it does is allocate memory to hold just the pointer. I agree, it doesn't make sense to wonder why it sometimes works if the code was flawed to begin with. – Aaron Dec 23 '22 at 22:40