I have written a Python script that reads from an Amazon SQS queue and spawns as many parallel processes as the user wants. It inherits from Django's BaseCommand; this is the code:
from multiprocessing import JoinableQueue, Pool

from django.core.management.base import BaseCommand, CommandError


class Command(BaseCommand):

    def handle(self, *args, **kwargs):
        self.set_up(*args, **kwargs)
        # Bounded joinable queue that feeds the worker processes.
        process_queue = JoinableQueue(self.threads)
        process_pool = Pool(
            self.threads,
            self.worker_process,
            (process_queue,)
        )
        is_queue_empty = False
        while not is_queue_empty:
            message = self.get_next_message()
            if len(message) == 0:
                is_queue_empty = True
            else:
                process_queue.put(message[0])
        process_queue.join()  # block until task_done() was called for every message
        raise CommandError('Number of retries exceeded retry limit')

    def worker_process(self, process_queue):
        while True:
            message = process_queue.get(True)  # block until a message is available
            message_tuple = (message,)  # one-element tuple (note the trailing comma)
            self.process_message(message_tuple)
            process_queue.task_done()
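For reference, here is a self-contained sketch of the same producer/consumer pattern with Django and SQS stubbed out (the message strings here are placeholders, not my real data). Run on its own, it exits cleanly, which is the behaviour I expect from my command:

import multiprocessing
from multiprocessing import JoinableQueue, Pool

def worker(queue):
    # Same shape as worker_process above: the Pool "initializer"
    # never returns and consumes messages forever.
    while True:
        message = queue.get(True)
        print('processed:', message)
        queue.task_done()

if __name__ == '__main__':
    queue = JoinableQueue(4)
    pool = Pool(4, worker, (queue,))
    for i in range(10):
        queue.put('message-%d' % i)
    queue.join()  # returns once every put() has a matching task_done()
    # Pool workers are daemon processes, so they are killed when the
    # main process exits here, even though worker() itself never returns.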
This works fine, and all the processes get killed once the tasks are done, except for one particular activity, where I use boilerpipe to extract some data:
from boilerpipe.extract import Extractor
extractor = Extractor(extractor='DefaultExtractor', html=soup_html)
extractor.getText()
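Since the hang happens only when this extractor runs, one way to check whether the call leaves something alive in the worker process is to list the threads after it returns (a diagnostic sketch; the HTML string is a placeholder, and boilerpipe, as far as I know, calls into a JVM via a bridge):

import threading
from boilerpipe.extract import Extractor

html = '<html><body><p>Some article text.</p></body></html>'  # placeholder input
extractor = Extractor(extractor='DefaultExtractor', html=html)
print(extractor.getText())

# Any non-daemon thread still alive at this point would keep the
# worker process from exiting on its own.
for t in threading.enumerate():
    print(t.name, 'daemon:', t.daemon)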
When I looked into the boilerpipe code, I could see that the constructor of Extractor contains this:
import threading

lock = threading.Lock()

class Extractor():
    def __init__(self, **kwargs):
        # code
        try:
            # code
            lock.acquire()
            # code
        finally:
            lock.release()
The full Extractor code is in the boilerpipe source.
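On the locking point: as far as I understand, a threading.Lock only synchronizes threads inside a single process, and each worker process gets its own copy of the module-level lock, so the workers should not be able to block each other through it. A small sketch of what I mean:

import threading
from multiprocessing import Process

lock = threading.Lock()  # module-level, like the one in boilerpipe

def use_lock(name):
    # Each child process has its own copy of the lock (via fork or
    # re-import), so both acquire immediately instead of waiting.
    with lock:
        print(name, 'acquired its copy of the lock')

if __name__ == '__main__':
    workers = [Process(target=use_lock, args=('worker-%d' % i,)) for i in range(2)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()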
- Why are the processes not getting killed? Is there something wrong with the way I am doing multiprocessing?
- Or is this thread locking creating the issue? (I am not at all sure; I am just thinking through everything that could have gone wrong.)
Please advise. Thanks in advance.