I am developing a Python C extension for a computation-intensive algorithm and, to speed it up, I use pthreads to run the C code in parallel. After some processing, each thread updates a dictionary that is shared among all threads.

With one thread it works correctly, but with more than one thread the output is corrupted, like this:

{0: 5, 1: 3, 249: 1, 253: 4, 254: 3, 255: 4, 256: 4, <refcnt -7 at 0x104036e10>: 1, <refcnt -7 at 0x104036e10>: 5, <refcnt -8 at 0x104036e10>: 4, <refcnt -7 at 0x104036e10>: 8, <refcnt -7 at 0x104036e10>: 5, <refcnt -7 at 0x104036e10>: 4, <refcnt -7 at 0x104036e10>: 6, <refcnt -7 at 0x104036e10>: 3, <refcnt -7 at 0x104036e10>: 2, <refcnt -7 at 0x104036e10>: 2, <refcnt -7 at 0x104036e10>: 3, <refcnt -7 at 0x104036e10>: 3, <refcnt -8 at 0x104036e10>: 1, <refcnt -7 at 0x104036e10>: 3, <refcnt -7 at 0x104036e10>: 5, <refcnt -7 at 0x104036e10>: 5, <refcnt 0 at 0x104036e10>: 7, <refcnt 0 at 0x104036e10>: 3, <refcnt 0 at 0x104036e10>: 5, <refcnt 0 at 0x104036e10>: 7, <refcnt 0 at 0x104036e10>: 2, <refcnt 0 at 0x104036e10>: 2, <refcnt 0 at 0x104036e10>: 2, <refcnt 0 at 0x104036e10>: 5, <refcnt 0 at 0x104036e10>: 2, <refcnt 0 at 0x104036e10>: 1, <refcnt 0 at 0x104036e10>: 5, <refcnt 0 at 0x104036e10>: 1, <refcnt 0 at 0x104036e10>: 4, <refcnt 0 at 0x104036e10>: 5, <refcnt 0 at 0x104036e10>: 2, <refcnt 0 at 0x104036e10>: 1, <refcnt 0 at 0x104036e10>: 2, <refcnt 0 at 0x104036e10>: 5, <refcnt 0 at 0x104036e10>: 6, -5: 1, -4: 1, -3: 5, -1: 3} 

Does anyone have any idea why this happens with more than one thread, while it runs fine with only one?

Note: I have used mutexes and tested them, so I believe a race condition is very unlikely.

This is the function that each thread runs:

static PyObject*
threaded_delt_freq(void *args)
{
    PyDictObject * n_dict = NULL;

    struct ThreadData * recvd_data = (struct ThreadData *)args;
    if(recvd_data != NULL)
    {
        n_dict = glob_large_dict;

        PyObject *matched_dic = PyDict_GetItemString(n_dict, "Matched Dict");
        PyObject *strky_time_dict = PyDict_GetItemString(n_dict, "strky TIme Dict");
        PyListObject *py_big_list = PyDict_GetItemString(n_dict, "big_lst");

        int st = recvd_data->start;
        int stp = recvd_data->stop;

        if(py_big_list != NULL)
        {
            PyListObject * matched_dic_keys = PyDict_Keys(matched_dic);
            int client_num_strky = PyList_GET_SIZE(matched_dic_keys);
            int client_strky_iter = 0;
            for(client_strky_iter = st; client_strky_iter < stp; client_strky_iter+=1)
            {
                if(client_strky_iter < client_num_strky)
                {
                    PyObject *curr_client_strky = PyList_GetItem(matched_dic_keys, client_strky_iter);

                    PyListObject *curr_strky_t_list = PyDict_GetItem(strky_time_dict, curr_client_strky);

                    int t_list_size = PyList_GET_SIZE(curr_strky_t_list);
                    int curr_t_list_iter = 0;

                    PyListObject *zipped_list = PyDict_GetItem(matched_dic, curr_client_strky);

                    int zipped_list_iter = 0;
                    int zipped_list_size = PyList_GET_SIZE(zipped_list);
                    for(curr_t_list_iter = 0; curr_t_list_iter < t_list_size; curr_t_list_iter+=1)
                    {
                        PyIntObject * curr_t_obj = PyList_GetItem(curr_strky_t_list, curr_t_list_iter);

                        long curr_t_val = PyInt_AsLong(curr_t_obj);
                        for(zipped_list_iter = 0; zipped_list_iter < zipped_list_size; zipped_list_iter+=1)
                        {
                            PyTupleObject *loc_obj_tuple = PyList_GetItem(zipped_list, zipped_list_iter);

                            PyObject * t_db_loc_obj = PyTuple_GetItem(loc_obj_tuple, 0);

                            char *t_str = PyString_AsString(t_db_loc_obj);
                            char *s_str = PyString_AsString(obj_id_obj);
                            long t_db_loc = (long)atoi(t_str);
                            long obj_id = (long)atoi(s_str);
                            if((obj_id-1) < PyList_GET_SIZE(py_big_list))
                            {
                                pthread_mutex_lock(&mutexes[obj_id-1]);
                                PyIntObject * diff_val_check = PyInt_FromLong((long)(t_db_loc - curr_t_val));

                                PyIntObject * delt_val = PyDict_GetItem(obj_time_dict, diff_val_check);
                                Py_DECREF(diff_val_check);

                                if(delt_val != NULL)
                                {
                                    PyIntObject * delt_new_val = PyInt_FromLong(delt_long_val);

                                    int rslt = PyDict_SetItem(obj_time_dict, delt_tmp_key, delt_new_val);
                                    Py_DECREF(delt_tmp_key);
                                }
                                else
                                {
                                    PyIntObject * delt_tmp_key = PyInt_FromLong((long)(t_db_loc - curr_t_val));

                                    int rslt = PyDict_SetItem(obj_time_dict, delt_tmp_key, PyInt_FromLong((long)1));
                                    Py_DECREF(delt_tmp_key);
                                }

                                pthread_mutex_unlock(&mutexes[obj_id-1]);
                            }
                        }
                    }
                }
            }
        }
    }

    return n_dict;
}

Where:

strky_time_dict: a dictionary with string keys whose values are lists of ints

matched_dic: a dictionary with string keys whose values are zipped lists (lists of tuples)

py_big_list: a list of dictionaries (these dictionaries are the ones being updated)

mutexes: an array of pthread_mutex_t, each initialized at the start of the program with pthread_mutex_init().
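
One possible reason the per-object mutexes described above may not be enough: in CPython 2 an object's reference count is a plain, unsynchronized field, and calls such as PyInt_FromLong, PyDict_SetItem and Py_DECREF change the reference counts of objects that are not tied to any single mutexes[obj_id-1] slot. The sketch below is not the extension's code. It uses a hypothetical struct fake_obj and a hypothetical helper run_with_shared_lock in plain C, only to illustrate that a shared count stays consistent when every thread takes the same lock, which is the role the GIL plays for Python objects.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 8  /* hypothetical upper bound for this sketch */

/* Hypothetical stand-in for a Python object header: a plain, non-atomic count. */
struct fake_obj { long refcnt; };

struct ref_arg {
    struct fake_obj *obj;   /* object shared by every worker */
    pthread_mutex_t *lock;  /* the one lock that guards obj->refcnt */
    int rounds;             /* number of increment/decrement pairs per worker */
};

/* Each worker repeatedly takes and drops a "reference" under the shared lock. */
static void *ref_worker(void *p)
{
    struct ref_arg *a = p;
    for (int i = 0; i < a->rounds; i++) {
        pthread_mutex_lock(a->lock);
        a->obj->refcnt++;            /* analogous to Py_INCREF */
        pthread_mutex_unlock(a->lock);

        pthread_mutex_lock(a->lock);
        a->obj->refcnt--;            /* analogous to Py_DECREF, minus deallocation */
        pthread_mutex_unlock(a->lock);
    }
    return NULL;
}

/* Runs n_threads workers that all share ONE lock and returns the final count.
 * If each worker were handed a different mutex instead, the count would be
 * unprotected even though every access is "inside a mutex". */
static long run_with_shared_lock(int n_threads, int rounds)
{
    pthread_t tids[MAX_THREADS];
    pthread_mutex_t lock;
    struct fake_obj obj = { 1 };     /* starts with one reference */
    struct ref_arg arg;
    int started = 0;

    assert(n_threads > 0 && n_threads <= MAX_THREADS);
    pthread_mutex_init(&lock, NULL);
    arg.obj = &obj;
    arg.lock = &lock;
    arg.rounds = rounds;

    for (int t = 0; t < n_threads; t++) {
        if (pthread_create(&tids[t], NULL, ref_worker, &arg) != 0)
            break;
        started++;
    }
    for (int t = 0; t < started; t++)
        pthread_join(tids[t], NULL);

    pthread_mutex_destroy(&lock);
    return obj.refcnt;               /* back to the initial value when all pairs ran */
}
```

With one shared lock, run_with_shared_lock(4, 10000) returns the initial count of 1. The analogy suggests that a negative refcnt in the printed dictionary could come from reference counts being changed by several threads at once, although the sketch cannot confirm that this is what happens in the extension.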

Muhammad Sadiq Alvi
  • Are you holding the GIL? – thebjorn Apr 24 '16 at 10:04
  • Isn't the GIL released automatically when code runs in a C extension? If not, how do we release the GIL in a C extension? – Muhammad Sadiq Alvi Apr 24 '16 at 10:09
  • Please show your code – Ohad the Lad Apr 24 '16 at 10:28
  • Python is not thread safe. You cannot run multiple threads in your C code and alter Python structures at the same time. – Daniel Apr 24 '16 at 10:55
  • So what solution do you suggest? I want to update the dictionary in parallel; that is needed to make the code run faster, and it is very much needed. I have also updated the post with a piece of the code, please have a look. – Muhammad Sadiq Alvi Apr 24 '16 at 11:00
  • A Python dictionary is probably not the way to go for a communication store in any multi-threaded program, since a resize of the hash table would stop all processing. You should use a thread-safe data type instead, or perhaps a Queue can work for you..? – thebjorn Apr 24 '16 at 11:04
  • It's a dictionary with PyIntObject keys and PyIntObject values, so how can I restructure it into a queue? – Muhammad Sadiq Alvi Apr 24 '16 at 11:06
  • You would have to rewrite your program so that the worker threads push their values onto a Queue (https://docs.python.org/2/library/queue.html) and the main program picks values off the queue (and perhaps stores them in a dict). The problem is that your mutexes don't matter: you need to hold the Python mutex (a.k.a. the GIL) around every operation that involves the Python API (e.g. https://www.safaribooksonline.com/library/view/python-cookbook-3rd/9781449357337/ch15s07.html). – thebjorn Apr 24 '16 at 11:13
  • Thanks a lot for your help :) – Muhammad Sadiq Alvi Apr 24 '16 at 17:48
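
The restructuring suggested in the comments above can be sketched in plain C, under some assumptions. The inputs are assumed to have been copied out of the Python lists into a C array before the threads start. Each worker counts into its own private array and never calls the Python API. One thread merges the results after joining the workers. The names worker_arg, count_worker, parallel_count, N_BINS and MAX_THREADS are hypothetical and do not come from the question. A fixed-size array stands in for the dictionary of counts.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define N_BINS 16      /* hypothetical: number of distinct values being counted */
#define MAX_THREADS 8  /* hypothetical upper bound on worker threads */

/* Hypothetical per-thread work item: a slice of plain C data plus a private result. */
struct worker_arg {
    const int *values;    /* input already copied out of Python objects */
    int start, stop;      /* half-open slice [start, stop) */
    long counts[N_BINS];  /* private histogram, written by one thread only */
};

/* Worker: pure C with no Python API calls, so it needs no GIL and no mutex. */
static void *count_worker(void *p)
{
    struct worker_arg *arg = p;
    for (int i = arg->start; i < arg->stop; i++) {
        int bin = arg->values[i];
        if (bin >= 0 && bin < N_BINS)
            arg->counts[bin]++;
    }
    return NULL;
}

/* Splits values[0..n) across n_threads workers, then merges the private
 * histograms into total[N_BINS] on the calling thread. Returns 0 on success,
 * -1 if a thread could not be created. */
static int parallel_count(const int *values, int n, int n_threads, long *total)
{
    pthread_t tids[MAX_THREADS];
    struct worker_arg args[MAX_THREADS];
    int started = 0;
    int rc = 0;

    assert(n_threads > 0 && n_threads <= MAX_THREADS);
    memset(total, 0, N_BINS * sizeof(long));

    for (int t = 0; t < n_threads; t++) {
        memset(&args[t], 0, sizeof args[t]);
        args[t].values = values;
        args[t].start = (int)((long)n * t / n_threads);
        args[t].stop = (int)((long)n * (t + 1) / n_threads);
        if (pthread_create(&tids[t], NULL, count_worker, &args[t]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }

    /* Merge step: runs on a single thread after each worker has finished. */
    for (int t = 0; t < started; t++) {
        pthread_join(tids[t], NULL);
        for (int b = 0; b < N_BINS; b++)
            total[b] += args[t].counts[b];
    }
    return rc;
}
```

In an extension, the call to a function like parallel_count could be wrapped in Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS, since the workers touch no Python objects. The merged totals would then be written into the dictionary with PyDict_SetItem on the calling thread, which holds the GIL again at that point.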
