Python multiprocessing (using PyTables) misses some results from the queue in the final output

Question

Before I state my question, a constraint: I can't post the code because it is related to my job and they don't allow it. So this is just a survey question to see whether anybody has seen similar issues.

I have a Python multiprocessing setup where the workers do the work and put their results in a queue. A special writer worker then accumulates the results from the queue. These results are simple pandas Series. The accumulator puts the results into a pandas dataframe and writes it to disk with PyTables.

The issue is that, at random, a few results are missing from the dataframe, e.g. out of 268 expected columns I get 267. This has happened in around 10 of 80 runs over the last three months. The cure is simply to rerun the code (which means recalculating everything), and it works 100% the second time. I have verified that there is no error in the calculations, so my guess is that the problem is related to multiprocessing or to writing the data with PyTables.

Any hints are appreciated. Sorry for not being able to post the code.

score -1 · Answer 1 · answered Jul 03 '17 at 11:31

It's really hard to help without code. But if you want to find the weak spots in your code, you have to log what it does.

As I understand it, one run of your workers has to create 268 Series that become the columns of the final dataframe. If these Series all have the same shape, then the issue is probably in the queue/worker part, and you should log every step you can.
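
Since the real code can't be shown, here is only a rough sketch of what such logging could look like. Everything in it is an assumption for illustration: `compute`, `worker`, `reconcile` and `run` are hypothetical names, the accumulator runs in the main process instead of a separate writer process, and plain lists stand in for the pandas Series. The idea is to log every `put` and `get` with a task id, have each worker send a sentinel when it is done, and compare the received ids against the expected ones before writing anything to disk.

```python
import logging
import multiprocessing as mp

# Configured at import time so child processes get the same format
# under both the fork and spawn start methods.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(processName)s %(message)s",
)
log = logging.getLogger("pipeline")

SENTINEL = None  # a worker puts this on the queue when it has finished


def compute(task_id):
    """Hypothetical stand-in for the real calculation."""
    return [task_id * 2]


def worker(task_ids, result_queue):
    """Compute each task, log it, and signal completion with a sentinel."""
    for task_id in task_ids:
        result_queue.put((task_id, compute(task_id)))
        log.info("put task_id=%s", task_id)
    result_queue.put(SENTINEL)


def reconcile(expected_ids, received_ids):
    """Return the sorted ids that were expected but never received."""
    return sorted(set(expected_ids) - set(received_ids))


def run(n_tasks=8, n_workers=2):
    expected = list(range(n_tasks))
    chunks = [expected[i::n_workers] for i in range(n_workers)]
    result_queue = mp.Queue()
    procs = [
        mp.Process(target=worker, args=(chunk, result_queue))
        for chunk in chunks
    ]
    for p in procs:
        p.start()

    received = {}
    finished = 0
    # Drain the queue until every worker has sent its sentinel,
    # and only then join the processes.
    while finished < n_workers:
        item = result_queue.get()
        if item is SENTINEL:
            finished += 1
            continue
        task_id, result = item
        received[task_id] = result
        log.info("got task_id=%s (%d/%d)", task_id, len(received), n_tasks)

    for p in procs:
        p.join()

    missing = reconcile(expected, received)
    if missing:
        log.error("missing task ids before write: %s", missing)
    else:
        log.info("all %d results received", n_tasks)
    return missing


if __name__ == "__main__":
    run()
```

If a result goes missing, the log then shows whether it was never put (a worker problem), put but never received (a queue or shutdown problem), or received but absent from the output file (a writing problem).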
