
I am new to Python. I have a function that calculates features for my data and returns a list that should then be processed and written to a file. I am using Pool to do the calculation and a callback function to write to the file. However, the callback function is not being called: I've put some print statements in it, and it is definitely not being called. My code looks like this:

import multiprocessing as mp

def write_arrow_format(results):
    # results is the list returned by generate_features: [result1, result2, filename]
    print("writer called")
    results[1].to_csv("../data/model_data/feature-"+results[2],sep='\t',encoding='utf-8')
    with open('../data/model_data/arow-'+results[2],'w') as f:
        for dic in results[0]:
            feature_list=[]
            print(dic)
            beginLine=True
            for key,value in dic.items():
                if beginLine:
                    # the first entry of each line is written as a bare value
                    feature_list.append(str(value))
                    beginLine=False
                else:
                    feature_list.append(str(key)+":"+str(value))
            feature_line=" ".join(feature_list)
            f.write(feature_line+"\n")


def generate_features(users,impressions,interactions,items,filename):
    #some processing 
    return [result1,result2,filename]





if __name__=="__main__":
    pool=mp.Pool(mp.cpu_count()-1)

    # users, impressions, interactions, items, interval and begin are set up in code not shown here
    for i in range(interval):
        if i==interval-1:
            # last chunk: take everything that is left
            pool.apply_async(generate_features,(users[begin:],impressions,interactions,items,str(i)),callback=write_arrow_format)
        else:
            pool.apply_async(generate_features,(users[begin:begin+interval],impressions,interactions,items,str(i)),callback=write_arrow_format)
            begin=begin+interval
    pool.close()
    pool.join()
Eliethesaiyan
  • Because the file is too long, I pasted only the code that is problematic. The interval variable is given. – Eliethesaiyan Jun 19 '16 at 12:48
  • I don't see any error in your code which would prevent the callback function from getting called. A good debugging technique is to progressively pare down your code until you have a very simple example which demonstrates the problem. One of two very good things will happen: either you will have a *runnable* minimal example which you can post here (greatly increasing your chance of getting a good answer) or in the process of simplifying the code you will discover where the error lies. – unutbu Jun 19 '16 at 15:51
  • @unutbu I also don't know why the callback is not being called. All the other methods run correctly, but definitely not the callback. I tried to debug it, but in vain: I commented out all the code except the print, and it is still not called. – Eliethesaiyan Jun 20 '16 at 00:17
  • Perhaps approach the problem from both ends: Find the simplest code you can which *successfully* uses a multiprocessing callback. Then incrementally build that code up to perform the calculation you want done in your actual script. Somewhere in the middle you will find what's wrong with your current code. – unutbu Jun 20 '16 at 13:23
  • @unutbu I've found out that the pool functions (apply, apply_async) only return the results if everything goes well; otherwise they stay silent, without giving any traceback of what happened in the spawned processes: http://bugs.python.org/issue13831 – Eliethesaiyan Jun 25 '16 at 03:49

1 Answer


It's not obvious from your post what is contained in the list returned by generate_features. However, if any of result1, result2, or filename is not serializable, the multiprocessing library will not call the callback function, and it will fail to do so silently. This is because the multiprocessing library pickles objects in order to pass them between the child processes and the parent process. If anything you return isn't "pickleable" (i.e. not serializable), sending the result back fails, and since the callback only runs when a task succeeds, it never gets called.
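
One way to check this before involving a pool at all is to try pickling each returned item directly. The following is only a minimal sketch: find_unpicklable is a hypothetical helper name, and the sample list is a made-up stand-in for [result1, result2, filename], with a threading.Lock playing the role of an object that cannot be pickled.

```python
import pickle
import threading


def find_unpicklable(items):
    """Return the indices of the items that pickle cannot serialize."""
    bad = []
    for index, item in enumerate(items):
        try:
            pickle.dumps(item)
        except Exception:
            # pickle may raise PicklingError, TypeError or AttributeError,
            # depending on the object, so catch broadly here.
            bad.append(index)
    return bad


# Hypothetical stand-in for the list returned by generate_features.
sample = [{"a": 1}, threading.Lock(), "0"]
print(find_unpicklable(sample))  # prints [1]
```

Running the same check on the real return value of generate_features in the parent process should show which element, if any, is the problem.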

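I've encountered this bug myself; it turned out that an instance of a logger object was causing the trouble. Here is some sample code that reproduces my issue. (Logger objects became picklable in Python 3.7, so on newer versions the "bad" function below may succeed as well.)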

import multiprocessing as mp
import logging 

def bad_test_func(ii):
    print('Calling bad function with arg %i'%ii)
    name = "file_%i.log"%ii
    logging.basicConfig(filename=name,level=logging.DEBUG)
    if ii < 4:
        log = logging.getLogger()
    else:
        log = "Test log %i"%ii
    return log

def good_test_func(ii):
    print('Calling good function with arg %i'%ii)
    instance = ('hello', 'world', ii)
    return instance

def pool_test(func):
    def callback(item):
        print('This is the callback')
        print('I have been given the following item: ')
        print(item)
    num_processes = 3
    pool = mp.Pool(processes = num_processes)
    results = []
    for i in range(5):
        res = pool.apply_async(func, (i,), callback=callback)
        results.append(res)
    pool.close()
    pool.join()

def main():

    print('#'*30)
    print('Calling pool test with bad function')
    print('#'*30)

    pool_test(bad_test_func)

    print('#'*30)
    print('Calling pool test with good function')
    print('#'*30)
    pool_test(good_test_func)

if __name__ == '__main__':
    main()
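
To make such failures visible instead of silent, apply_async on Python 3 also accepts an error_callback argument, which is called with the exception when a task fails. The sketch below is an illustration rather than a fix for the code in the question: risky_task and run_tasks are hypothetical names, and it deliberately uses ThreadPool from multiprocessing.pool (same apply_async interface, but threads instead of processes) so that it runs the same way on every platform. With mp.Pool the arguments are passed in the same way.

```python
from multiprocessing.pool import ThreadPool


def risky_task(n):
    # Hypothetical task: fails for one particular input.
    if n == 2:
        raise ValueError("task %d failed" % n)
    return n * 10


def run_tasks(values):
    successes, failures = [], []
    pool = ThreadPool(2)
    for v in values:
        pool.apply_async(
            risky_task,
            (v,),
            callback=successes.append,       # called with the result on success
            error_callback=failures.append,  # called with the exception on failure
        )
    pool.close()
    pool.join()
    return sorted(successes), [str(e) for e in failures]


if __name__ == "__main__":
    print(run_tasks(range(4)))
```

Another option is to keep the AsyncResult objects returned by apply_async and call get() on each of them after the pool has finished, since get() re-raises the exception from a failed task.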

Hopefully this is helpful and points you in the right direction.

Mr. Frobenius
  • OMG, thank you so much! It also seems that anything that fails inside a callback function fails silently, without propagating the exception back into the logs. – doubleOK Dec 27 '17 at 10:04