I'm working on an interesting puzzle: how do I create closures of C functions that reference Python data at runtime? This is a little long, but there's some technically interesting stuff here if you like programming puzzles, so it's worth a read.
Let's say we have:
def f(data): ... implementation...
then
def make_closure(f, d):  # d is the data for f
    def g(): return f(d)
    return g
g is a closure of f and d. We have captured the d parameter internally to g and now have a function we can call that uses d, without needing to provide d. This is bread and butter in Python-land and in all languages that have functions as first-class data types.
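As a quick sanity check of the idea, here is a minimal runnable sketch. The body of f and the sample list are placeholders of my own, not part of the real problem.

```python
def f(data):
    # Placeholder implementation: any function of the captured data would do.
    return sum(data)

def make_closure(f, d):
    def g():
        return f(d)  # d is captured from the enclosing scope
    return g

g = make_closure(f, [1, 2, 3])
print(g())  # g takes no arguments but still sees the captured list
```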
Now my problem (version 1). I'm using (from Python) a C library that accepts a callback:
void register_func(callback_t g) {... }
The library doesn't allow me to pass any of my own data to that callback, so I need to make a C callback function that is a closure over its data. Happily, CFUNCTYPE does something magical for us:
import ctypes
from ctypes import CFUNCTYPE
def make_closure(f, d):
    @CFUNCTYPE(...)
    def g(): return f(d)
    return g
lib = ctypes.CDLL("thirdpartylib", mode=ctypes.RTLD_GLOBAL)
g = make_closure(f, data) # Python function f, Python data object data
lib.register_func(g) # Call the C function, provide a Python function as the callback.
Wow! It may not be obvious, but something magical happened here. At runtime we've produced a new C-style function pointer (a pointer to program text, i.e. machine code) that's callable from C and that references the dynamically created Python function and its data. (!) So we've dynamically created a closure over Python data that's callable by C. Magic!
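To convince myself this really is a raw function pointer, here is a small self-contained sketch that needs no third-party library. It assumes a callback prototype of int (*)(void), since the real callback_t isn't shown here, and it stands in for the C caller by round-tripping through the integer address.

```python
import ctypes
from ctypes import CFUNCTYPE, c_int

# Assumed prototype for illustration: int (*callback_t)(void)
CALLBACK_T = CFUNCTYPE(c_int)

def f(data):
    return sum(data)  # placeholder implementation

def make_closure(f, d):
    @CALLBACK_T
    def g():
        return f(d)
    return g

g = make_closure(f, [1, 2, 3])  # keep g referenced while C may call it

# The raw address is what a C library would receive as callback_t.
addr = ctypes.cast(g, ctypes.c_void_p).value

# Rebuild a callable from the bare address and invoke it the way C would.
via_pointer = CALLBACK_T(addr)
result = via_pointer()
print(hex(addr), result)
```

Note that g has to stay referenced on the Python side for as long as the C library may call it; if it is garbage collected, the pointer dangles.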
Problem solved? Not so fast. My actual problem (version 2) is a little more tricky.
Consider a third-party C library with this API:
result_t callback(int iterstatus); // Prototype of function callback_t
void register_iterator(char* name, callback_t callback);
The library will execute the registered iterator more or less as follows (implemented in C):
// Initialize the iterator's state (iterstatus==0)
context = callback(0, NULL);
// Process each result of the iterator (iterstatus==1)
while (true){
    result = callback(1, context);
    if (!result) break; // NULL means no more data
    yield_data(result);
}
// Cleanup the iterator's state (iterstatus==2)
// free(context->user_data)
callback(2, context);
(I know -- we don't get the APIs we want, just the ones we have).
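To make the three-phase protocol concrete, here is a pure-Python model of that driver loop. The names drive and make_counter are mine, and treating None (NULL in C) as the end-of-data signal is an assumption based on how the C callbacks in this post return NULL when exhausted.

```python
def drive(callback):
    """Model of the library's loop: init (0), step (1) until done, cleanup (2)."""
    results = []
    context = callback(0, None)
    while True:
        result = callback(1, context)
        if result is None:      # assumed end-of-data signal (NULL in C)
            break
        results.append(result)  # stands in for yield_data(result)
    callback(2, context)
    return results

def make_counter(limit):
    # Hypothetical iterator that yields 0, 1, ..., limit - 1.
    def callback(iterstatus, context):
        if iterstatus == 0:
            return {"next": 0, "end": limit}  # allocate the state
        if iterstatus == 1:
            if context["next"] == context["end"]:
                return None
            context["next"] += 1
            return context["next"] - 1
        context.clear()                       # cleanup
        return None
    return callback

values = drive(make_counter(3))
print(values)
```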
So, I can register the callback by passing a CFUNCTYPE object as the callback pointer: register_iterator(name, g). However, Python is far too slow for the iteration part. Really, I need Python just to create the iteration data (a numpy array) and have a C routine iterate over it. Like this:
/// Data that callback_closured needs in its implementation.
/// Will be assigned a CFUNCTYPE value that points to a Python function.
static py_iterator_allocator_t closure_alloc_func = NULL;
void* callback_closured(int iterstatus, iterstate_t* state){
    switch (iterstatus){
        case 0:
            assert(state==NULL);
            state = closure_alloc_func(); // Ask Python to allocate the state
            return state;
        case 1:
            if(state->next==state->end) return NULL; // no more data
            state->next++;
            return (state->next)-1;
        case 2: // Cleanup (iterstatus==2)
            py_decref(state);
            return NULL;
        default:
            return NULL;
    }
}
void my_register_iterator(char* name, py_iterator_allocator_t alloc_func){
    closure_alloc_func = alloc_func;
    register_iterator(name, callback_closured); // register the C callback, not this function
}
... so that in Python we call:
@CFUNCTYPE(...)
def alloc_iterator():
    ...
    return pyobject
lib.my_register_iterator(name, alloc_iterator)
... which works, but sucks in many ways: for each function that we want to register, we need a new global variable (closure_alloc_func), a new callback that references it from C, and a new my_register_iterator function. I don't really want to edit the C source for each new closure.
So the question:
- Can I dynamically create a callback_closured function pointer for any Python allocator at runtime, from within Python? Something like:
# Python-land
@CFUNCTYPE(...)
def alloc_iterator():
    ...
    return pyobject
# lib.iter_impl has the prototype
#   void* iter_impl(int, iterstate_t*, alloc_funct_t)
iter_impl2 = make_iterator_impl(lib.iter_impl, alloc_iterator)
# iter_impl2 references alloc_iterator, somehow, and has the new C prototype
#   void* (int, iterstate_t* state)
lib.register_iterator(name, iter_impl2)
and calling iter_impl2(int s, void* state) is roughly equivalent to return iter_impl(s, state, alloc_func), where alloc_func is the captured Python allocator: a dynamically created closure, but in C.
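To pin down the semantics I'm after, here is a sketch of make_iterator_impl written as a plain ctypes trampoline. It is only an illustration of the intended behaviour: every call still bounces through Python, so it does not give the speed I want. The three CFUNCTYPE prototypes, the stand-in iter_impl (written in Python because there is no compiled library here), and the 0x1234 handle are all assumptions for the demo.

```python
from ctypes import CFUNCTYPE, c_int, c_void_p

# Assumed prototypes, mirroring the C signatures in this post.
ALLOC_T = CFUNCTYPE(c_void_p)                                # void* alloc(void)
ITER_IMPL_T = CFUNCTYPE(c_void_p, c_int, c_void_p, ALLOC_T)  # 3-argument form
CALLBACK_T = CFUNCTYPE(c_void_p, c_int, c_void_p)            # 2-argument form

def make_iterator_impl(iter_impl, alloc_func):
    # The Python closure keeps iter_impl and alloc_func alive as long as
    # the returned callback object is alive.
    @CALLBACK_T
    def iter_impl2(iterstatus, state):
        result = iter_impl(iterstatus, state, alloc_func)
        return result or 0  # map a NULL result (None) back to 0
    return iter_impl2

calls = []

@ALLOC_T
def alloc_iterator():
    calls.append("alloc")
    return 0x1234  # hypothetical opaque state handle

@ITER_IMPL_T
def iter_impl(iterstatus, state, alloc_func):
    # Stand-in for the C iter_impl: only the allocation step is modelled.
    if iterstatus == 0:
        return alloc_func() or 0
    return 0

iter_impl2 = make_iterator_impl(iter_impl, alloc_iterator)
state = iter_impl2(0, None)  # two-argument call; the allocator is bound inside
print(hex(state), calls)
```

What I'm really asking for is the same binding, but done without Python sitting in the middle of every call.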
The benefit is that allocation can be implemented in Python (flexible behavior), while iteration is implemented in C (fast):
// C-land
void* iter_impl(int iterstatus, iterstate_t* state, alloc_funct_t alloc_func){
    switch (iterstatus){
        case 0:
            assert(state==NULL);
            state = alloc_func(); // Ask Python to allocate the state
            return state;
        case 1:
            if(state->next==state->end) return NULL; // no more data
            state->next++;
            return (state->next)-1;
        case 2: // Cleanup (iterstatus==2)
            py_decref(state);
            return NULL;
        default:
            return NULL;
    }
}
This doesn't seem trivial. But since ctypes is able to generate C-callable function pointers that are closures over Python data, it doesn't seem impossible either. ctypes seems to invent new function pointers into program text memory at runtime. The only ways I know to make a function pointer in C are: i) declare and compile new functions with new names, or ii) load a shared library. So, perhaps there's a way to bend ctypes (or its method) to do this?
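A small experiment shows the "invented" pointers directly: wrapping the same Python code twice, over different captured values, yields two distinct C addresses. As far as I know, ctypes builds these entry points through libffi's closure mechanism rather than by compiling anything. The int (*)(void) prototype is again an assumption for the demo.

```python
import ctypes
from ctypes import CFUNCTYPE, c_int

CALLBACK_T = CFUNCTYPE(c_int)  # assumed prototype: int (*)(void)

def make_closure(value):
    @CALLBACK_T
    def g():
        return value
    return g

g1 = make_closure(10)
g2 = make_closure(20)

# Same Python source for g, but each wrapper gets its own C entry point.
addr1 = ctypes.cast(g1, ctypes.c_void_p).value
addr2 = ctypes.cast(g2, ctypes.c_void_p).value
print(hex(addr1), hex(addr2))
```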
I looked at CFFI. It provides a way to compile and link C code at runtime, but doesn't obviously provide a way to capture a pointer to Python data that can be referenced from that code.