
I'm working on an interesting puzzle: how do you create closures of C functions that reference Python data at runtime? This is a little long, but there's some technically interesting stuff here if you like programming puzzles, so it's worth a read.

Let's say we have:

def f(data): ... implementation...

then

def make_closure(f, d):  # d is the data for f
  def g(): return f(d)
  return g

g is a closure over f and d. We have captured the d parameter inside g and now have a function we can call that uses d, without needing to provide d. This is bread and butter in Python-land and in every language that has functions as first-class data types.

Now my problem (version 1). I'm using (from Python) a C library that accepts a callback:

void register_func(callback_t g) {... }

The library doesn't allow me to pass any of my own data to that callback, so I need to make a C callback function that is a closure over its data. Happily, CFUNCTYPE does something magical for us:

import ctypes
from ctypes import CFUNCTYPE

def make_closure(f, d):
  @CFUNCTYPE(...)
  def g(): return f(d)

  return g

lib = ctypes.CDLL("thirdpartylib", mode=ctypes.RTLD_GLOBAL)
g = make_closure(f, data)   # Python function f, Python data object data
# Keep a reference to g for as long as the C library may call it.
lib.register_func(g) # Call the C function, provide a Python function as the callback.

Wow! It may not be obvious, but something magical happened here. At runtime we've produced a new C-style function pointer (a pointer to program text, i.e. machine code) that is callable from C and that references the dynamically created Python function and its data. (!) So we've dynamically created a closure over Python data that's callable by C. Magic!
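To make the "new function pointer" claim concrete, here is a minimal sketch that needs no third-party library. The prototype `int (*)(void)` and the names `CALLBACK`, `addr` and `g_from_addr` are assumptions for illustration only, since the question elides the real `CFUNCTYPE(...)` arguments. It extracts the raw address behind the callback and calls back through that address, the way a C caller would.

```python
import ctypes

# Assumed prototype: int (*)(void). The real one depends on the library.
CALLBACK = ctypes.CFUNCTYPE(ctypes.c_int)

def f(data):
    return sum(data)

def make_closure(f, d):
    @CALLBACK
    def g():
        return f(d)   # d is captured by the Python closure
    return g

g = make_closure(f, [1, 2, 3])

# The callback object wraps a real, C-callable address created at runtime.
addr = ctypes.cast(g, ctypes.c_void_p).value

# Rebuild a foreign-function object from the bare address and call it.
# g must stay alive while the address is in use.
g_from_addr = CALLBACK(addr)
result = g_from_addr()
```

Two closures made from different data should get different addresses, which is what lets a C library tell them apart even though it only ever sees a plain function pointer.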

Problem solved? Not so fast. My actual problem (version 2) is a little more tricky.

With a third party C library, having the API:

result_t callback(int iterstatus);   // Prototype of function callback_t

void register_iterator(char* name, callback_t callback);

The library will execute the registered iterator more or less as follows (implemented in C):

// Initialize the iterator's state (iterstatus==0)
context = callback(0, NULL);

// Process each result of the iterator (iterstatus==1)
while (true){
  result = callback(1, context);
  if (!result) break;   // NULL means no more data
  yield_data(result);
}

// Cleanup the iterator's state (iterstatus==2)
// free(context->user_data)
callback(2, context);

(I know -- we don't get the APIs we want, just the ones we have).
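The three-phase protocol above can be mimicked in pure Python to make the calling sequence explicit. This is only a simulation under stated assumptions: `drive` and `make_callback` are hypothetical names, the context is a plain dict rather than a C struct, and a `None` result (standing in for NULL) is assumed to end the iteration.

```python
def drive(callback):
    """Mimic the library's driver: initialize, iterate, clean up."""
    results = []
    context = callback(0, None)            # iterstatus == 0: initialize
    while True:
        result = callback(1, context)      # iterstatus == 1: next result
        if result is None:                 # assumed end-of-data signal
            break
        results.append(result)
    callback(2, context)                   # iterstatus == 2: cleanup
    return results, context

def make_callback(items):
    def callback(iterstatus, context):
        if iterstatus == 0:
            return {"next": 0, "closed": False}
        if iterstatus == 1:
            if context["next"] == len(items):
                return None                # no more data
            context["next"] += 1
            return items[context["next"] - 1]
        context["closed"] = True           # cleanup phase
        return None
    return callback

out, final_context = drive(make_callback([10, 20, 30]))
```

Note that the callback itself is stateless: everything it needs after initialization travels through the context, which is why the only missing piece is a way to hand it the allocator.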

So, I can register the callback by passing a CFUNCTYPE object as the callback pointer: register_iterator(name, g). However, Python is grossly slow for the iteration part. Really, I need Python only to create the iteration data (a numpy array) and a C routine to iterate over it. Like this:

/// Data that callback_closured needs in its implementation.
/// Will be assigned a CFUNCTYPE value that points to a Python function.
static py_iterator_allocator_t closure_alloc_func = NULL;

void* callback_closured(int iterstatus, iterstate_t* state){
   switch (iterstatus){
   case 0:
        assert(state==NULL);
        state = closure_alloc_func();   // Ask python to allocate the state
        return state;
    case 1:
        if(state->next==state->end) return NULL;  // no more data
        state->next++;
        return (state->next)-1;
    case 2:                           // Cleanup
        py_decref(state);
        return NULL;
    default:
        return NULL;
    }
}

void my_register_iterator(char* name, py_iterator_allocator_t alloc_func){
   closure_alloc_func = alloc_func;
   register_iterator(name, callback_closured);
}

... so that in Python we call:


@CFUNCTYPE(...) 
def alloc_iterator():  
   ...
   return pyobject

lib.my_register_iterator(name, alloc_iterator)

... which works, but sucks in many ways: for each function that I want to register, I need a new global variable closure_alloc_func, a new callback that reads it, and a new my_register_iterator function in C. I don't really want to edit the C source for each new closure.

So the question:

  • Can I dynamically create a callback_closured function pointer for any Python allocator at runtime, from within Python?
    # Python-land

    @CFUNCTYPE(...)
    def alloc_func():
       ...
       return pyobject

    # lib.iter_impl has the prototype
    # void* iter_impl(int, iterstate_t*, alloc_funct_t )
    iter_impl2 = make_iterator_impl(lib.iter_impl, alloc_func)

    # iter_impl2 references alloc_func, somehow... and has the new C prototype
    # void* (int, iterstate_t* state)
    register_iterator(name, iter_impl2)

Calling iter_impl2(int s, void* state) would then be roughly equivalent to

return iter_impl(s, state, alloc_func)

... a dynamically created closure, but in C.
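For reference, the semantics being asked for can be written as a plain ctypes trampoline. This sketch is not the desired solution: every call still passes through the Python interpreter, so it only shows what "roughly equivalent" means. The prototypes `ALLOC_T` and `CALLBACK_T` are assumptions, and `stand_in_iter_impl` is a hypothetical pure-Python replacement for `lib.iter_impl` so that the example runs without the C library.

```python
import ctypes

# Assumed prototypes; the real ones come from the library headers.
ALLOC_T = ctypes.CFUNCTYPE(ctypes.c_void_p)                     # void* alloc(void)
CALLBACK_T = ctypes.CFUNCTYPE(ctypes.c_void_p,                  # void* cb(int, void*)
                              ctypes.c_int, ctypes.c_void_p)

def make_iterator_impl(iter_impl, alloc_func):
    @CALLBACK_T
    def iter_impl2(status, state):
        # Partial application: supply alloc_func as the third argument.
        return iter_impl(status, state, alloc_func)
    return iter_impl2

# Hypothetical stand-in for lib.iter_impl (only the allocation branch).
def stand_in_iter_impl(status, state, alloc):
    if status == 0:
        return alloc()          # ask Python to allocate the state
    return None                 # NULL for every other status

buf = (ctypes.c_double * 3)(1.0, 2.0, 3.0)

@ALLOC_T
def alloc_func():
    return ctypes.addressof(buf)    # hand back a raw pointer

iter_impl2 = make_iterator_impl(stand_in_iter_impl, alloc_func)
state = iter_impl2(0, None)         # two-argument call, as the library would make
done = iter_impl2(2, state)
```

With the real `lib.iter_impl` the iteration body would run in C, but each item would still cost one hop through the Python trampoline, which is the overhead the question is trying to avoid.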

The benefit is that allocation can be implemented in Python (flexible behavior), while iteration is implemented in C (fast):

// C-land
void* iter_impl(int iterstatus, iterstate_t* state, alloc_funct_t alloc_func){
   switch (iterstatus){
   case 0:
        assert(state==NULL);
        state = alloc_func();   // Ask python to allocate the state
        return state;
    case 1:
        if(state->next==state->end) return NULL;  // no more data
        state->next++;
        return (state->next)-1;
    case 2:                           // Cleanup
        py_decref(state);
        return NULL;
    default:
        return NULL;
    }
}

This doesn't seem trivial. But since ctypes seems able to generate C-callable function pointers that are closures over Python data, it doesn't seem impossible either. ctypes appears to invent new function pointers into program text memory. The only ways I know to make a function pointer in C are: i) declare and compile new functions with new names, or ii) load a shared library. So, perhaps there's a way to bend ctypes (or its method) to do this?

I looked at CFFI. It provides a way to compile and link C code at runtime, but doesn't obviously provide a way to capture a pointer to Python data that can be referenced from that code.

user48956
  • You should probably ask one or two precise questions. If it is "how would I do the same with cffi?" then I can give a go at answering. – Armin Rigo Apr 27 '20 at 06:50
  • I'm mostly just curious about ctypes and how it works. I'd be interested in any solution that creates a C function pointer from a C function pointer and makes a closure over a Python function. I'll update the question. – user48956 Apr 27 '20 at 14:18
  • I don't know if that helps. Fast callbacks are also relevant in e.g. scipy.integrate https://stackoverflow.com/a/60619681/4045774 Accessing a global numpy array from Numba-compiled code is shown here (the pointer to the array is hardcoded during the compilation process) https://stackoverflow.com/a/61550054/4045774 – max9111 May 04 '20 at 18:41
