I have written a Python script that reads from an Amazon SQS queue and creates as many parallel processes as the user requests. It inherits from Django's BaseCommand, and this is the code:

def handle(self, *args, **kwargs):
    self.set_up(*args, **kwargs)
    process_queue = JoinableQueue(self.threads)
    process_pool = Pool(
        self.threads,
        self.worker_process,
        (process_queue,)
    )

    is_queue_empty = False
    while not is_queue_empty:
        message = self.get_next_message()
        if len(message) == 0:
            is_queue_empty = True
        else:
            process_queue.put(message[0])
    process_queue.join()
    raise CommandError('Number retries exceeded retry limit')

def worker_process(self, process_queue):
    while True:
        message = process_queue.get(True)
        message_tuple = (message)
        self.process_message(message_tuple)
        process_queue.task_done()

This works fine, and all the processes are killed once the tasks are done, except for one particular activity, where I use boilerpipe to extract some data:

from boilerpipe.extract import Extractor
extractor = Extractor(extractor='DefaultExtractor', html=soup_html)
extractor.getText()

When I looked into the boilerpipe code, I saw that the constructor of Extractor contains this code:

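import threading

lock = threading.Lock()  # module-level lock shared by all Extractor instances
class Extractor():
    def __init__(self):  # constructor arguments omitted here
        # code
        try:
            # code
            lock.acquire()
            # code
        finally:
            lock.release()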

full code is this

  1. Why are the processes not getting killed? Is there something wrong with the way I am doing multiprocessing?
  2. Or is this thread locking causing the issue? (I am not at all sure; I am just considering everything that could have gone wrong.)
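For comparison, here is a minimal, self-contained sketch of the same queue-and-workers layout in which the workers exit on their own instead of looping in `while True` until the parent goes away. It is a sketch under stated assumptions, not a diagnosis of the boilerpipe case: `SENTINEL`, `process_message`, `worker`, and `run` are hypothetical names, the uppercase conversion stands in for the real work, and the `fork` start method (POSIX only) is chosen just to keep the example self-contained.

```python
import multiprocessing as mp

SENTINEL = None  # hypothetical marker that tells a worker to stop


def process_message(message):
    # Stand-in for the real per-message work.
    return message.upper()


def worker(task_queue, result_queue):
    while True:
        message = task_queue.get()
        if message is SENTINEL:
            task_queue.task_done()
            break  # leave the loop so the process can exit normally
        try:
            result_queue.put(process_message(message))
        finally:
            # Always mark the task done, even if processing raised.
            task_queue.task_done()


def run(messages, n_workers=2):
    ctx = mp.get_context("fork")  # assumption: POSIX platform
    task_queue = ctx.JoinableQueue(n_workers)
    result_queue = ctx.Queue()
    workers = [
        ctx.Process(target=worker, args=(task_queue, result_queue))
        for _ in range(n_workers)
    ]
    for w in workers:
        w.start()
    for message in messages:
        task_queue.put(message)
    for _ in workers:
        task_queue.put(SENTINEL)  # one stop marker per worker
    task_queue.join()  # every message and every sentinel was handled
    results = [result_queue.get() for _ in messages]
    for w in workers:
        w.join()  # workers have left their loops, so this returns
    return sorted(results), [w.exitcode for w in workers]
```

Calling `run(["a", "b", "c"])` processes the three messages and then joins both workers. If a worker still failed to exit in this layout, that would point at something inside the worker, such as a non-daemon thread, rather than at the queue handling.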

Please advise. Thanks in advance.

najeeb
  • Threads? You need a carefully placed `break`; otherwise you get more bugs if a thread contains more than one thread. Locks and the like are not a good idea if you are waiting on something. – dsgdfg Sep 27 '16 at 12:47
  • That's how they have done it; I'm not sure what they were trying to achieve. Basically, Python boilerpipe is just a wrapper around the actual library, which is written in Java. – najeeb Sep 27 '16 at 13:24
