
I wrote a Python program that runs steps similar to, but more complicated than, the following:

STEP 1: Given a BATCH of lists of the same length, where each element of a list is the number of states that position may have, I need to enumerate (DFS) all possible states (represented by 0, 1, 2, ...) of each list and collect them in one list. E.g. for the input [[1,2,1], [2,2,2]], the output of this step should be [[0,0,0],[0,1,0],[0,0,0],[0,0,1],[0,1,0],[0,1,1],[1,0,0],[1,0,1],[1,1,0],[1,1,1]]: the first two states come from [1,2,1] and the remaining eight from [2,2,2].
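As a minimal sketch of STEP 1, assuming the enumeration is a plain Cartesian product over `range(n)` for each position, `itertools.product` can do the traversal without a hand-written DFS. The function names `enumerate_states` and `enumerate_batch` are hypothetical, chosen only for illustration:

```python
from itertools import product


def enumerate_states(counts):
    """Return every state of one list; counts[i] is the number of states at position i."""
    # product(range(1), range(2), range(1)) yields (0, 0, 0), (0, 1, 0)
    return [list(state) for state in product(*(range(n) for n in counts))]


def enumerate_batch(batch):
    """Concatenate the states of every list in the batch, in batch order."""
    states = []
    for counts in batch:
        states.extend(enumerate_states(counts))
    return states


print(enumerate_batch([[1, 2, 1], [2, 2, 2]]))
```

For the example input this reproduces the ten states listed above. If the states are only used as lookup keys later, keeping the tuples that `product` yields (instead of converting to lists) would avoid one copy per state.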

STEP 2: Calculate some value for each state output by STEP 1, and return a dict of the form: {"0,0,0": 0.1, "0,1,0":0.2, "0,0,1":0.56, "0,1,0":0.68, "0,1,1":0.3242, "1,0,0":0.8987, "1,0,1":0.214, "1,1,0":0.2, "1,1,1":0.9}

STEP 3: In this step I need to process a BATCH of lists. While processing, I need to look up (read-only) the dict returned by STEP 2 very frequently, and generate one tuple for each list in the batch.

I found my program is really slow in STEP 1 and STEP 3. STEP 2 can only be done in Python for some reason. What's more, the lists in a batch are independent of each other; they only share the same dict in STEP 3. So I want to use multithreading to process these lists in parallel.

Since Python has the GIL, the threading module doesn't help. Then I tried multiprocessing, which was even slower (I guess because of context switching and data transfer). Then I used C++11 to write a .so module containing functions that receive a PyObject and return a PyObject. I used POSIX threads, but it always raised a segmentation fault as soon as I used more than one thread. I carefully read the Python C API documentation and found that the GIL is still needed (refer here), so this doesn't help at all. Then I used Cython, declaring the types of all variables with cdef; it did speed things up, but not by much.

I'm lost on this problem. Can anyone help me? I'd be really grateful.

killa
  • Reads like a classic MapReduce problem, and you're trying to reinvent the MapReduce pipeline rather than writing your mapper and reducer in an existing framework. – bauman.space Apr 19 '17 at 11:47
  • Step 1: I don't really see how your output comes from your input, but `itertools` defines good implementations for this kind of thing. Step 3: The GIL is needed whether you're using Python threads or C++ threads - you won't win here. – DavidW Apr 20 '17 at 07:58
  • One further thought - I think a dict is the wrong structure and you should store your step 2 data in a 3D numpy array. The lookup in step 3 will probably be much quicker. – DavidW Apr 20 '17 at 08:02
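The array suggestion in the last comment could look roughly like the sketch below. It assumes every key has the `"i,j,k"` form shown in STEP 2 and that the array shape (the per-position state counts) is known up front. The helper names `dict_to_array` and `lookup_many` are hypothetical, and the sample dict is a small subset of the values from the question:

```python
import numpy as np


def dict_to_array(values, shape):
    """Copy a {"i,j,k": value} dict into a dense array indexed as arr[i, j, k]."""
    arr = np.full(shape, np.nan)  # NaN marks states that are missing from the dict
    for key, value in values.items():
        arr[tuple(int(part) for part in key.split(","))] = value
    return arr


def lookup_many(arr, states):
    """Look up many states in one vectorised call; each row of states is (i, j, k)."""
    idx = np.asarray(states, dtype=np.intp)
    # One index array per axis triggers NumPy integer-array indexing.
    return arr[tuple(idx.T)]


step2 = {"0,0,0": 0.1, "0,0,1": 0.56, "1,1,1": 0.9}  # illustrative subset
table = dict_to_array(step2, (2, 2, 2))
print(lookup_many(table, [[0, 0, 1], [1, 1, 1]]))
```

The conversion is paid once after STEP 2. STEP 3 then indexes with integers instead of building and hashing a string key per lookup, and whole batches of lookups can be done in a single call. Whether this is faster in practice depends on how STEP 3 accesses the data, so it would need to be measured.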
