0

I'm trying to call the SFMT Mersenne Twister implementation (found at http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/) from Python. I'm doing this because I'd like to be able to quickly sample from a discrete pdf with 4 probabilities. I'm writing some simulations and this is by far the slowest part of my code.

I've managed to write some C code which works, and samples the input PDF using a random number in [0,1] created with the SFMT algorithm.

However I don't know how to initialise the SFMT random number generator properly when calling from Python. I only want to initialise it once of course, and then I need to pass the address of the struct used to intialise it (sfmt) into my calls to sfmt_genrand_real1.

So some example code would be:

// Create a struct which stores the Mersenne Twister's state
sfmt_t sfmt;

// Initialise the MT with a random seed from the system time
// NOTE: Only want to do this once
int seed = time(NULL);
sfmt_init_gen_rand(&sfmt, seed);

// Get a random number
double random_number = sfmt_genrand_real1(&sfmt);

The problem is that I don't know how to only seed the SFMT random number generator once when calling this code from Python. If I were just writing everything in C, I'd do the intialisation in the main() function, then pass the &sfmt argument to all subsequent sfmt_genrand_real1() calls.

But because I'm compiling this C code, then calling it from Python, I can't initialise that variable once. For now, I've stuck the initialisation right before the sfmt_genrand_real1() call, because it's the only way I could get the code to even compile and be able to access the sfmt variable in the call to the random number generator. All my attempts to somehow make the sfmt variable "global" backfired.

So my question is: is there a way to initialise the C SFMT random number generator only once, then access the sfmt struct use for that initialisation in all my subsequent calls to c_random_sample from Python?

Many thanks in advance to anyone who can help with this.

Here's my C code in full. (To compile you'd need to have all the SMFT .c and .h files in the same folder, then compile using python setup.py build_ext --inplace)

#include "Python.h"
#include <stdio.h>
#include <time.h>
#include "SFMT.h"


static double*
get_double_array(PyObject* data)
{
    int i, size;
    double* out;
    PyObject* seq;

    seq = PySequence_Fast(data, "expected a sequence");
    if (!seq)
        return NULL;

    size = PySequence_Size(seq);
    if (size < 0)
        return NULL;

    out = (double*) PyMem_Malloc(size * sizeof(double));
    if (!out) {
        Py_DECREF(seq);
        PyErr_NoMemory();
        return NULL;
    }

    for (i=0; i<size; i++) {
        out[i] = PyFloat_AsDouble(PySequence_Fast_GET_ITEM(seq, i));
    }

    Py_DECREF(seq);

    if (PyErr_Occurred()) {
        PyMem_Free(out);
        out = NULL;
    }

    return out;
}


static PyObject*
c_random_sample(PyObject* self, PyObject* args)
{
    int i;
    double* pdf;

    PyObject* pdf_in;

    if (!PyArg_ParseTuple(args, "O:c_random_sample", &pdf_in))
        return NULL;

    pdf = get_double_array(pdf_in);
    if (!pdf)
        return NULL;

    // Create a struct which stores the Mersenne Twister's state
    sfmt_t sfmt;
    int seed = time(NULL);

    // Initialise the MT with a random seed from the system time
    sfmt_init_gen_rand(&sfmt, seed);
    // NOTE: This simply re-initialises the random number generator
    // on every call. We need to only initialise it once...

    double r = sfmt_genrand_real1(&sfmt);
    for (i=0; i<4; i++) {
        r -= pdf[i];
        if (r < 0) {
            break;
        }
     }
     PyMem_Free(pdf);
     return PyInt_FromLong(i);
}


static PyMethodDef functions[] = {
    {"c_random_sample", c_random_sample, METH_VARARGS},
    {NULL, NULL}
};


PyMODINIT_FUNC initc_random_sample(void)
{
    Py_InitModule4(
        "c_random_sample", functions, "Trying to implement random number sampling in C", NULL, PYTHON_API_VERSION
    );
}
Gareth Williams
  • 242
  • 4
  • 10
  • That kind of PRNG is probably still much slower than newer ones (PCG, xoroshiro and co.). Those are even available in python (through external libs). But much more important: maybe you should have shown your code first to reason about this performance. I can imagine some reasons apart from the PRNG on the source of that slowness. – sascha Feb 25 '17 at 17:38
  • Thanks for your reply sascha. I'm writing a simulator which needs to sample random numbers from distributions which are dependent on the current state of the system. So I can't easily cache samples in advance. Essentially Python is too slow for what I'm after, which is why I turned to C. Nevertheless I'll have a look at those PRNGs you mention, thank you. – Gareth Williams Feb 26 '17 at 18:51
  • I just did a quick test of SFMT vs PCG. SFMT seems to be a about twice as quick to create 1 billion uint32_t random numbers. – Gareth Williams Feb 26 '17 at 20:06
  • 1
    Although the speed tests at http://xoroshiro.di.unimi.it/ seem to suggest it can be quick. Maybe I'm doing something wrong. Thanks for pointing those out to me sascha. I have managed to get over my initialisation issue by defining a Python class in C, and initialising the random number generator in my class's constructor. – Gareth Williams Feb 26 '17 at 20:25
  • I see. I expected somewhat a wrong-usage of discrete-sampling (paying for the setup each time if for example O(1) sampling is used). But i see now, that your case is highly dynamic, which changes a lot. Keep in mind, that Python's PRNG-implementation within core + numpy is C-code too. But it's certainly not wrong to trust your benchmarks if done correctly. – sascha Feb 28 '17 at 10:11

2 Answers2

0

sfmt goes at the global scope in C:

static sfmt_t sfmt; /* outside any functions */

The initialization goes in your module init function - this is called once, when the module is first imported into Python:

PyMODINIT_FUNC initc_random_sample(void)
{
    /* Py_InitModule4 to initialize the module as before */ 

    int seed = time(NULL);
    sfmt_init_gen_rand(&sfmt, seed);
}
DavidW
  • 29,336
  • 6
  • 55
  • 86
0

This is not exactly what you are looking for, but if you don't mind the dependencies (mainly Numpy and Cython), you may find useful the ng-numpy-randomstate library which contains many fast random number generators (in particular dSFMT).

If you can use it, it will save you the work of writing the C wrappers.

Also, this module (or some features at least?) might get merged in Numpy in the future (cf. Numpy open issue #6967).

Pierre H.
  • 388
  • 1
  • 11