Suppose, for the purposes of this discussion, that I have a function like this:
#include <Python.h>

PyObject* tuple_from_dict(PyObject* ftype, PyObject* factory, PyObject* values) {
    PyObject* ttype = PyTuple_GetItem(factory, 1);    /* borrowed reference */
    PyObject* fmapping = PyTuple_GetItem(factory, 2); /* borrowed reference */
    PyObject* key;
    PyObject* value;
    Py_ssize_t pos = 0;
    Py_ssize_t arg_len = 0;
    Py_ssize_t field;
    PyObject* result;

    if (PyDict_Size(fmapping) == 0) {
        /* No fields: call the type with an empty argument tuple.
           PyObject_Call already returns a new reference, so no extra
           Py_INCREF is needed, and the temporary tuple must be released. */
        PyObject* empty = PyTuple_New(0);
        if (empty == NULL) {
            return NULL;
        }
        result = PyObject_Call(ttype, empty, NULL);
        Py_DECREF(empty);
        return result;
    }

    /* Find the largest field index to size the argument tuple. */
    while (PyDict_Next(fmapping, &pos, &key, &value)) {
        field = PyLong_AsSsize_t(value);
        if (field > arg_len) {
            arg_len = field;
        }
    }

    PyObject* args = PyTuple_New(arg_len + 1);
    if (args == NULL) {
        return NULL;
    }

    /* Pre-fill every slot with None; PyTuple_SetItem steals a reference. */
    pos = 0;
    while (pos < arg_len + 1) {
        Py_INCREF(Py_None);
        PyTuple_SetItem(args, pos, Py_None);
        pos++;
    }

    /* Place each value at the slot given by the field mapping. */
    pos = 0;
    while (PyDict_Next(values, &pos, &key, &value)) {
        field = PyLong_AsSsize_t(PyDict_GetItem(fmapping, key));
        Py_INCREF(value); /* PyDict_Next yields borrowed refs; SetItem steals one */
        PyTuple_SetItem(args, field, value);
    }

    result = PyObject_Call(ttype, args, NULL);
    Py_DECREF(args);
    return result;
}
It doesn't matter exactly what it does; the important point is that it calls PyObject_Call(...), which I suspect is slow. But the slowness we are talking about would not be noticeable on a per-call basis: overall, the code makes a couple of thousand such calls per 1/100 of a second. So I need an aggregate, or some way of measuring the time with very high precision (so clock_t doesn't seem to offer a good enough level of precision).
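
For the aggregate, what I have in mind is roughly the sketch below, assuming Linux: clock_gettime(CLOCK_MONOTONIC) has nanosecond resolution, so I could wrap just the PyObject_Call and accumulate. The total_ns/call_count counters and report_timing() are names I made up for this sketch, not any existing API:

#include <Python.h>
#include <stdio.h>
#include <time.h>

static long long total_ns = 0;   /* total time spent inside PyObject_Call */
static long long call_count = 0; /* number of timed calls */

static PyObject* timed_call(PyObject* callable, PyObject* args) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    PyObject* result = PyObject_Call(callable, args, NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    total_ns += (t1.tv_sec - t0.tv_sec) * 1000000000LL
              + (t1.tv_nsec - t0.tv_nsec);
    call_count++;
    return result;
}

/* Dump the aggregate once, e.g. at interpreter shutdown. */
static void report_timing(void) {
    fprintf(stderr, "PyObject_Call: %lld calls, %lld ns (%.1f ns/call)\n",
            call_count, total_ns,
            call_count ? (double)total_ns / (double)call_count : 0.0);
}

The two clock_gettime calls add some overhead per call, but it should be small relative to a Python-level call, and it cancels out when comparing aggregates across runs.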
It's OK if the solution only works on Linux. It's also OK if I can somehow slow everything down but get a more precise measurement of the timing in question.
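
On the "slow everything down" side, I suppose Valgrind's Callgrind (valgrind --tool=callgrind) is one option, since it trades a large slowdown for deterministic instruction counts. For raw precision without the slowdown, here is a rough sketch using the x86-64 time-stamp counter via the __rdtsc() intrinsic from <x86intrin.h> (GCC/Clang); the cycles_in_call accumulator is, again, a made-up name:

#include <Python.h>
#include <x86intrin.h>

static unsigned long long cycles_in_call = 0; /* made-up accumulator name */

static PyObject* timed_object_call(PyObject* callable, PyObject* args) {
    unsigned long long t0 = __rdtsc(); /* read the time-stamp counter */
    PyObject* result = PyObject_Call(callable, args, NULL);
    cycles_in_call += __rdtsc() - t0;  /* elapsed cycles for this call */
    return result;
}

Reading the TSC costs only a handful of cycles, so it can bracket even microsecond-scale calls without distorting the measurement much; the count is in cycles rather than seconds, but for comparing before and after an optimization that would be enough.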