I have a list operation where I need to loop through a list, check each item, and build a dictionary. What is the best way to do this with multiprocessing? The list is rather long, with over 1M items.

result_dict = dict()
for id in list_a:
    if id not in check_list:  # check_list is a blacklist used for verification
        result_dict[id] = func_buildup(id)  # function that builds the value
Hai Vu
Howell Pan
  • Split the list into sections. Make one thread per section – OneCricketeer Jan 05 '18 at 01:18
  • Can you guarantee that each `id` is unique? Have you seen this answer to a similar question? https://stackoverflow.com/a/38560235/6779307 – Patrick Haugh Jan 05 '18 at 01:19
  • Can you first find all the ids and then update the dictionary? That would be easier. Having every worker update the same dictionary is not impossible, but you have to create a shared data structure that each worker can access, and each worker has to acquire a lock before touching the dictionary and release it afterwards. If you can get away with not sharing the dictionary, check out joblib to write an embarrassingly parallel function to find the ids. Otherwise, use the multiprocessing module and create a shared dictionary for the workers to access. – rmilletich Jan 05 '18 at 01:31
  • As an aside, you should make sure `check_list` is a `set` object – juanpa.arrivillaga Jan 05 '18 at 05:48
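Putting the comments together, here is a minimal sketch of one way this could look; it is not a tested answer for the real workload. It assumes `func_buildup` is a picklable top-level function (the body below is only a placeholder, since the real one is not shown in the question), and the helper names `chunks`, `init_worker`, `build_chunk`, and `build_dict_parallel` are made up for illustration. The list is split into sections, each worker keeps its own `set` copy of `check_list`, and the workers return partial dictionaries that the parent merges, so no shared dictionary or lock is needed.

```python
from itertools import islice
from multiprocessing import Pool

# Per-process copy of the blacklist; replaced by init_worker in each worker.
_blocked = frozenset()


def func_buildup(item_id):
    # Placeholder only: stands in for the asker's real function.
    return item_id * 2


def chunks(items, size):
    # Yield successive lists of at most `size` items.
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def init_worker(blocked):
    # Runs once per worker: store the blacklist as a set for fast lookups.
    global _blocked
    _blocked = set(blocked)


def build_chunk(chunk):
    # Build a partial dictionary for one chunk, skipping blacklisted ids.
    return {i: func_buildup(i) for i in chunk if i not in _blocked}


def build_dict_parallel(list_a, check_list, processes=None, chunk_size=10_000):
    result_dict = {}
    with Pool(processes, initializer=init_worker, initargs=(check_list,)) as pool:
        # Only the parent process updates result_dict, so no locking is needed.
        for partial in pool.imap_unordered(build_chunk, chunks(list_a, chunk_size)):
            result_dict.update(partial)
    return result_dict


if __name__ == "__main__":
    list_a = list(range(100_000))   # sample data, not the asker's list
    check_list = [3, 5, 7]          # sample blacklist
    result = build_dict_parallel(list_a, check_list)
    print(len(result))
```

If ids repeat in `list_a`, a later partial result simply overwrites the earlier entry for that id. Whether this beats the plain loop depends on how expensive `func_buildup` is: for a cheap function, the cost of sending chunks to the workers and results back can outweigh the gain, and turning `check_list` into a `set` may be the bigger win.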

0 Answers