I am trying to implement multiprocessing with Python. It works when pooling very quick tasks; however, it freezes when pooling longer tasks. See my example below:

from multiprocessing import Pool
import math
import time

def iter_count(addition):
    print "starting ", addition
    for i in range(1,99999999+addition):
        if i==99999999:  
            print "completed ", addition
            break

if __name__ == '__main__':
    print "starting pooling "
    pool = Pool(processes=2)
    time_start = time.time()
    possibleFactors = range(1,3)   

    try: 
        pool.map( iter_count, possibleFactors)
    except:
        print "exception"

    pool.close()
    pool.join()      

    #iter_count(1)
    #iter_count(2)
    time_end = time.time()
    print "total loading time is : ", round(time_end-time_start, 4)," seconds"

In this example, if I use a smaller number in the for loop (something like 9999999), it works. But when running with 99999999 it freezes. I tried running the two tasks (iter_count(1) and iter_count(2)) in sequence, and together they take about 28 seconds, so it's not really a big job. But when I pool them it freezes. I know there are some known bugs in Python around multiprocessing; however, in my case, the same code works for smaller sub-tasks but freezes for bigger ones.

  • What version of Python are you using? Some of those known bugs in `multiprocessing` you referred to were fixed in 2.7, or in later 2.6.x or 2.7.x versions, but if you're using a version from before those fixes, obviously you still have those bugs… And generally, multiprocessing/multithreading bugs are the kind of thing that only happens one time in a million or less, so it wouldn't be all that surprising if N usually works but 10N usually fails… – abarnert Jan 08 '14 at 22:23
  • I am using Python version 2.7.5. – hercules.cosmos Jan 08 '14 at 22:43
  • 1
    I seem to remember having similar issues at some point in the past when my worker threads were doing lots of writing to stdout. Have you tried removing the print statement? – John Greenall Jan 09 '14 at 00:19

1 Answer


You're using some version of Python 2; we can tell because of how print is spelled (it's a statement, not a function).

So range(1,99999999+addition) is creating a gigantic list, with at least 100 million integers. And you're doing that in 2 worker processes simultaneously. I bet your disk is grinding itself to dust while the OS swaps out everything it can ;-)
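
To get a feel for the scale, here's a rough sketch (assuming 64-bit CPython 2; the one-million count is just an illustrative size, and note that sys.getsizeof counts only the list's pointer array, not the roughly 24 bytes each int object costs on top of it):

import sys

n = 1000000  # one million; the question's loop is ~100x bigger
big = range(1, n)    # builds the whole list of int objects up front
lazy = xrange(1, n)  # stores just start/stop/step, yields values on demand

print "list object:  ", sys.getsizeof(big), "bytes"   # ~8 MB of pointers alone
print "xrange object:", sys.getsizeof(lazy), "bytes"  # a few dozen bytes

Scale that by 100 for the question's loop, then double it for the two worker processes, and you're well into swap territory.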

Change range to xrange and see what happens. I bet it will work fine then.
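
For example, the only change needed in the worker from the question is the loop's iterator:

def iter_count(addition):
    print "starting ", addition
    # xrange yields each number lazily instead of materializing a giant list
    for i in xrange(1, 99999999 + addition):
        if i == 99999999:
            print "completed ", addition
            break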

  • When I change range to xrange, yes, it works. However, what I don't understand is how it works when I run those tasks sequentially but freezes when I run them in parallel. And overall, we are not talking about a complicated calculation; both tasks take about 30 seconds. – hercules.cosmos Jan 09 '14 at 15:07
  • 7
    It has nothing to do with the calculations: it has entirely to do with peak memory use. And your program wasn't freezing, it was just running **extremely** slowly because you were out of RAM. When you do them serially, it takes half the RAM. You were just lucky then. Those gigantic lists require gigabytes of RAM. `xrange` gives you an iterator instead of a giant list, and requires a tiny amount of RAM. That's all there is to it. – Tim Peters Jan 09 '14 at 16:25
  • 1
    I love it the way one problem can masquerade as another. – vy32 Apr 23 '16 at 14:02