I am processing a list of strings with multiprocessing pool.imap(), passing a chunksize. The list has 1821 items and the pool uses 4 processes. To give each process a roughly equal share of the work, I set the chunk size to 455, and also tried 500. With those values, however, imap skips some of the records, and the skipping is not random either, since the list is ordered. Once I changed the chunk size to 200, imap started sending all of the records to my target function.

Can someone explain why a chunksize above 450 causes this, when per the documentation the work should ideally be divided as 1821 / 4 = 455 or 456 records per process?

Note that my real target function takes the string and runs some steps that take a few seconds each. For testing I reduced the target function to just writing the string to a file, and even then some records were skipped.
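To make the arithmetic concrete, here is a standalone sketch (not my actual code) of how an 1821-item iterable gets chopped at this chunk size; the split is by chunk, not evenly per process, so 455 yields five chunks rather than four:

# Standalone sketch of the chunking arithmetic, not part of my code
items = list(range(1821))
chunksize = 455
chunks = [items[i:i + chunksize] for i in range(0, len(items), chunksize)]
print([len(c) for c in chunks])  # [455, 455, 455, 455, 1] -> five chunks

Here is the relevant code: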
import multiprocessing as mp
import time

# The following are methods on a class, shown in the order they are used.

def process_init(self, l):
    # Pool initializer: runs once per worker, stores the shared lock globally
    global process_lock
    process_lock = l

def _run_multiprocess(self, num_process, input_list, target_func, chunk):
    l = mp.Lock()
    with mp.Pool(processes=num_process, initializer=self.process_init,
                 initargs=(l,)) as p:
        start = time.time()
        # imap returns an iterator; its results are never consumed here
        async_result = p.imap(target_func, input_list, chunksize=chunk)
        p.close()
        p.join()
        print("Completed the multiprocess")
        end = time.time()
        print('total time (s)= ' + str(end - start))

# call site (inside another method of the same class):
chunksize = 500
self._run_multiprocess(4, iterator_source, self._process_data, chunksize)

def _process_data(self, well_name):
    # Test target: just log each received record to a shared file
    with open("MultiProcessMethod_RecordReceived.csv", "a") as openfile:
        process_lock.acquire()
        openfile.write("\t" + well_name.upper() + "\n")
        openfile.flush()
        process_lock.release()
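To detect the skipped records, I compare the number of lines the workers wrote against the number of inputs submitted (a hypothetical check, not part of the code above):

# Hypothetical verification, not in the original code:
# count the lines the workers wrote and compare with the 1821 inputs
with open("MultiProcessMethod_RecordReceived.csv") as f:
    received = sum(1 for _ in f)
print("sent = 1821, received =", received)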