I'm trying to write a multiprocess program in Python. I have imported the multiprocessing module and I try to start a process like so:

    p = Process(target=self.Parse)
    p.start()
    p.join()

In the class I have an internal thread counter that I increment every time a process is spawned. But when I print the thread count, it has not been incremented. So then I call multiprocessing.active_children(), but this returns an empty list. Does the program really not spawn the threads or processes, or does it just not report them? The code is as follows:

def run(self):
    if self.cont:
        while self.nxtLink or (self.thread>1):
            print(active_children())
            if self.thread<=self.count:
                p = Process(target=self.Parse)
                p.start()
                p.join()
            else:
                self.crawl(nxtLink.popleft())

The Parse function:

def Parse(self):
    self.thread+=1
    self.lock.acquire()
    next = self.nxtLink.popleft()
    self.lock.release()
    results = parser(next[0],next[1])
    #print("In Parse")
    self.broken[next[0]] = results.broken
    for i in results.foundLinks:
        if(self.thread<=self.count+5):
            p = Process(target = self.request, args = (i,next[0]))
            p.start()
            p.join()
        else:
            while (self.thread>self.count+5):
               pass   #Waits for the thread count to drop before spawning a new thread. 
            p = Process(target = self.request, args = (i,next[0]))
            p.start()
            p.join()
    self.lock.acquire()
    self.thread-=1
    self.lock.release()

Finally the request function:

def request(self, requestURL, requestingPageURL):
    # print(requestURL)
    self.lock.acquire()
    self.thread+=1
    self.lock.release()
    try:
        before = list(self.prev)
        self.lock.acquire()
        self.prev.append(requestURL)
        self.lock.release()
        if(requestURL in before):
            #print(before)
            return
        nextRequest = req.urlopen(requestURL)
        self.lock.acquire()
        self.nxtLink.append((requestURL,nextRequest))
        self.lock.release()
    except err.URLError:
        self.lock.acquire()
        try:
            self.broken[requestingPageURL].append(requestURL)
        except KeyError:
            self.broken[requestingPageURL] = [requestURL]
        self.lock.release()
    finally:
        self.lock.acquire()
        self.thread-=1
        self.lock.release()

I am really stuck on why it's not spawning processes. The program as a whole works fine, so I'm a little confused.

BenMorel
rady

1 Answer

join() waits for the process to complete. When you have a sequence like:

p = Process(target=self.Parse)
p.start()
p.join()

The parent program waits for the child to complete, so there are no active children at the point you make the check. Since you wait for each child to finish anyway, you'd be better off just calling the functions directly instead of spawning children. It's common for code like this to put Process objects in a list, do other work, and come back and join them later when the work is done.
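
As a rough sketch of that list-then-join pattern (the `work` function and the delays are hypothetical stand-ins, not the code from the question), the parent starts every child first, checks `active_children()` while they are still running, and joins them only afterwards:

```python
import multiprocessing
import time


def work(seconds):
    """Hypothetical stand-in for a real job such as Parse or request."""
    time.sleep(seconds)


def run_all(delays):
    """Start one process per delay, inspect them, then join them all."""
    procs = [multiprocessing.Process(target=work, args=(d,)) for d in delays]
    for p in procs:
        p.start()                      # start every child before waiting on any
    # The children are still sleeping here, so they show up as active.
    alive = len(multiprocessing.active_children())
    # ... the parent is free to do other work at this point ...
    for p in procs:
        p.join()                       # now wait for each child to finish
    remaining = len(multiprocessing.active_children())
    return alive, remaining, [p.exitcode for p in procs]


if __name__ == "__main__":
    alive, remaining, codes = run_all([0.5, 0.5, 0.5])
    print("alive after start():", alive)
    print("alive after join():", remaining)
    print("exit codes:", codes)
```

The `if __name__ == "__main__":` guard matters on platforms that start children by re-importing the main module.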

You can add some debug code that tracks what's been called to verify that your child code is running:

import time
with open('/tmp/trace.txt', 'a') as fp:
    fp.write(time.asctime() + '\n')

It's a good idea in general to add some logging to the processes you spawn so that you can track things like Python exceptions in your code.
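
One possible way to do that with the standard `logging` module is sketched below. The wrapper `traced`, the job `flaky`, and the helper `run_traced` are hypothetical names made up for illustration. The child logs when it starts and finishes, and records the full traceback if the job raises:

```python
import logging
import multiprocessing
import os
import tempfile


def traced(func, log_path, *args):
    """Run func(*args) in the child and log its start, finish, or failure."""
    logging.basicConfig(
        filename=log_path,
        level=logging.INFO,
        format="%(asctime)s pid=%(process)d %(levelname)s %(message)s",
        force=True,   # Python 3.8+: replace handlers inherited from the parent
    )
    logging.info("starting %s%r", func.__name__, args)
    try:
        func(*args)
    except Exception:
        logging.exception("%s failed", func.__name__)   # includes the traceback
        raise                                           # keep a non-zero exit code
    logging.info("finished %s", func.__name__)


def flaky(n):
    """Hypothetical job that fails on odd input."""
    if n % 2:
        raise ValueError(f"bad input {n}")


def run_traced(func, log_path, *args):
    """Run func in a child process via traced() and return its exit code."""
    p = multiprocessing.Process(target=traced, args=(func, log_path) + args)
    p.start()
    p.join()
    return p.exitcode


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "trace.log")
    print("exit codes:", run_traced(flaky, path, 2), run_traced(flaky, path, 3))
    with open(path) as fp:
        print(fp.read())
```

Checking `exitcode` in the parent is a cheap second signal: a child that died from an uncaught exception exits with a non-zero code.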

tdelaney
  • Hmm, I didn't realize that join() did that. One more question: how would you put the Process objects in a list and then join them? – rady Apr 17 '14 at 16:48
  • @user2985233, there are many ways to do it depending on what your code does. Check out the docs for multiprocessing.Pool and also examples of using multiprocessing.Queue for hints. You could pass a queue to the Process and have it send a 'done' message when done. Then, reading the queue tells you which Process should be joined next. – tdelaney Apr 17 '14 at 17:01
  • So I should keep all the threads in a list and continually poll the list to see which process is done, and then join it? – rady Apr 17 '14 at 18:41
  • It's hard to say because your example is complicated... and likely doesn't really work yet. For instance, you run Process on a class instance but then use locks, thread+1, nxtLink.popleft(), etc., in ways that don't make sense (e.g., it's a different process, so the parent's self.thread isn't the child's self.thread). This looks like it could be handled with multiprocessing.Pool, and you could change your code until it fits the model. – tdelaney Apr 17 '14 at 22:10
  • The above code works fine. Just doesn't spawn extra threads besides this main one and the other thread. – rady Apr 17 '14 at 23:40
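
To illustrate two points from the comments, here is a minimal sketch using hypothetical stand-ins (`Crawler`, `fetch`, `demo_counters`, `demo_pool`) rather than the question's actual code. A plain attribute such as `self.thread` is copied into each child, so increments made there never reach the parent, whereas a `multiprocessing.Value` lives in shared memory. `multiprocessing.Pool` is shown as one way to cap the number of workers without counting them by hand:

```python
import multiprocessing


class Crawler:
    """Hypothetical, stripped-down stand-in for the class in the question."""

    def __init__(self):
        self.thread = 0                              # plain attribute, copied per child
        self.shared = multiprocessing.Value("i", 0)  # integer in shared memory

    def parse(self):
        self.thread += 1                 # changes only this child's copy
        with self.shared.get_lock():
            self.shared.value += 1       # visible to the parent


def fetch(url):
    """Hypothetical worker; a real one would download and parse the page."""
    return url, len(url)


def demo_counters(n=3):
    """Run parse() in n children and report both counters as the parent sees them."""
    crawler = Crawler()
    procs = [multiprocessing.Process(target=crawler.parse) for _ in range(n)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return crawler.thread, crawler.shared.value


def demo_pool(urls, workers=2):
    """Let a Pool limit concurrency and collect the workers' return values."""
    with multiprocessing.Pool(processes=workers) as pool:
        return dict(pool.map(fetch, urls))


if __name__ == "__main__":
    print(demo_counters(3))
    print(demo_pool(["http://a.example", "http://bb.example"]))
```

With three children, `demo_counters(3)` returns `(0, 3)`: the parent's plain counter never moves, while the shared one records every child. The same reasoning applies to `self.nxtLink`, `self.prev` and `self.broken` in the question: changes made to them inside a child stay in that child unless they are sent back, for example as return values through a Pool.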