I am working on a traveling salesman problem. All agents traverse the same graph, each finding its own path independently, so I am trying to parallelize the agents' path finding. In each iteration, every agent starts from the start node and finds a path; I then collect all the paths and pick the best one for that iteration.

I am using pathos.multiprocessing.

The Agent class has a path-finding method:

class Agent:
   def find_a_path(self, graph):
     # here is the logic to find a path by traversing the graph
     return found_path

I created a helper function to wrap the method:

def do_agent_find_a_path(agent, graph):
   return agent.find_a_path(graph)

Then I create a pool and call amap, passing the helper function, a list of agent instances, and the same graph repeated once per agent:

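from pathos.multiprocessing import ProcessPool

pool = ProcessPool(nodes=10)
# amap is asynchronous: it returns a result handle rather than the paths themselves
res = pool.amap(do_agent_find_a_path, agents, [graph] * len(agents))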

However, the processes are created one after another and the whole thing runs very slowly. I'd appreciate some guidance on a correct, decent way to use pathos in this situation.

Thank you!

UPDATE:

I am using pathos 0.2.3 on Ubuntu:

Name: pathos
Version: 0.2.3
Summary: parallel graph management and execution in heterogeneous computing
Home-page: https://pypi.org/project/pathos
Author: Mike McKerns

I get the following error with the ThreadPool sample code:

>>> import pathos
>>> pathos.pools.ThreadPool().iumap(lambda x:x*x, [1,2,3,4])
Traceback (most recent call last):
  File "/opt/anaconda/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2910, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-5-f8f5e7774646>", line 1, in <module>
    pathos.pools.ThreadPool().iumap(lambda x:x*x, [1,2,3,4])
AttributeError: 'ThreadPool' object has no attribute 'iumap'

liang li
  • If you are getting the above AttributeError, it makes me think you have an installation issue. Essentially, if `pathos` cannot find `multiprocess`, it will fall back to `multiprocessing`. You can check by (1) trying to `import _multiprocess`, and (2) by looking at the `__module__` attribute of a ThreadPool object. – Mike McKerns May 30 '19 at 00:11
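The first check suggested in the comment above can be done without importing anything. This is only a rough sketch using the standard library's `importlib.util.find_spec`; the helper name `find_backends` is hypothetical and not part of pathos.

```python
import importlib.util


def find_backends(names=("multiprocess", "_multiprocess", "multiprocessing")):
    """Report which backend modules Python can locate, without importing them."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in find_backends().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

If `multiprocess` or `_multiprocess` is reported as missing, that would be consistent with the fallback to `multiprocessing` described in the comment. The second check (the `__module__` attribute of a ThreadPool object) needs pathos itself and is not shown here.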

1 Answer

I'm the pathos author. I'm not sure how long your method takes to run, but from your comments I'm going to assume not very long. If the method is "fast", I'd suggest using a ThreadPool instead. Also, if you don't need to preserve the order of the results, the fastest map is typically uimap (unordered, iterative map).

>>> class Agent:
...   def basepath(self, dirname):
...     import os
...     return os.path.basename(dirname)
...   def slowpath(self, dirname):
...     import time
...     time.sleep(.2)
...     return self.basepath(dirname)
... 
>>> a = Agent()
>>> import pathos.pools as pp
>>> dirs = ['/tmp/foo', '/var/path/bar', '/root/bin/bash', '/tmp/foo/bar']
>>> import time
>>> p = pp.ProcessPool()
>>> go = time.time(); tuple(p.uimap(a.basepath, dirs)); print(time.time()-go)
('foo', 'bar', 'bash', 'bar')
0.006751060485839844
>>> p.close(); p.join(); p.clear()
>>> t = pp.ThreadPool(4)
>>> go = time.time(); tuple(t.uimap(a.basepath, dirs)); print(time.time()-go)
('foo', 'bar', 'bash', 'bar')
0.0005156993865966797
>>> t.close(); t.join(); t.clear()

And, just to compare against something that takes a bit longer:

>>> t = pp.ThreadPool(4)
>>> go = time.time(); tuple(t.uimap(a.slowpath, dirs)); print(time.time()-go)
('bar', 'bash', 'bar', 'foo')
0.2055649757385254
>>> t.close(); t.join(); t.clear()
>>> p = pp.ProcessPool()
>>> go = time.time(); tuple(p.uimap(a.slowpath, dirs)); print(time.time()-go)
('foo', 'bar', 'bash', 'bar')
0.2084510326385498
>>> p.close(); p.join(); p.clear()
>>> 
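If the order of the results does matter, one possible approach is to tag each task with its index, run an unordered map, and sort afterwards. The sketch below uses the standard library's `multiprocessing.pool.ThreadPool.imap_unordered` as a stand-in for `uimap`, so it is an analogue rather than pathos code; `tagged_basename` and `ordered_via_unordered` are hypothetical names.

```python
import os
from multiprocessing.pool import ThreadPool


def tagged_basename(indexed_path):
    """Return (index, basename) so the original position travels with the result."""
    index, path = indexed_path
    return index, os.path.basename(path)


def ordered_via_unordered(paths, workers=4):
    """Run an unordered map, then restore input order by sorting on the index."""
    with ThreadPool(workers) as pool:
        tagged = list(pool.imap_unordered(tagged_basename, enumerate(paths)))
    return [name for _, name in sorted(tagged)]


if __name__ == "__main__":
    dirs = ['/tmp/foo', '/var/path/bar', '/root/bin/bash', '/tmp/foo/bar']
    print(ordered_via_unordered(dirs))
```

Whether this beats an ordered map depends on how uneven the task durations are; for uniformly fast tasks the difference is likely to be small.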
Mike McKerns
  • Hi Mike, thanks for your generous answer! You are right, the method runs very quickly. Because the order of results matters, I use imap. After some experiments, though, I found that it is the 1000-node NetworkX graph object that delays the creation of processes: it blocks the creation of each subsequent process for around 30 seconds. If I reduce the graph size, the delay clearly decreases. I dumped the graph with dill to see how big it is; the dump file is 31.5MB. So the problem now is how to remove this bottleneck of copying a fat argument. Your suggestion would be highly appreciated. Thank you again for the active support! – liang li May 26 '19 at 14:48
  • Hi Mike, when I use ThreadPool, I get the error `AttributeError: 'ThreadPool' object has no attribute 'uimap'`. Is it due to the pathos version? Mine is 0.2.3. When I change 'uimap' to 'imap' for ThreadPool, I get the error `TypeError: '>' not supported between instances of 'list' and 'int'`. – liang li May 27 '19 at 15:09
  • Maybe you are importing it incorrectly? I'm using `0.2.4dev0`, but `0.2.3` should be fine. This should work: `pathos.pools.ThreadPool().iumap(lambda x:x*x, [1,2,3,4])`. If you are still seeing errors, are you on Windows? If so, you might need a C/C++ compiler to build `multiprocess` correctly. – Mike McKerns May 27 '19 at 21:55
  • If you are dumping with `dill` directly, then there are several variants that might give you a smaller pickle. See `dill.settings`. Unfortunately the settings do not have an effect (yet) on `pathos`'s use of `dill`. – Mike McKerns May 28 '19 at 00:17
  • Hi Mike, thanks for the generous reply! I am using 0.2.3 on Ubuntu, and I still get the error with your ThreadPool sample code. I've updated the question with the info and the error. – liang li May 29 '19 at 19:43
  • Hi Mike, thanks for the generous reply! I did some more experiments, and they still show that it is the fat NetworkX object (passed as an argument) that blocks process creation. Is there shared memory in pathos, or some other technique, that could overcome this? Thanks! – liang li May 29 '19 at 22:09
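The thread does not settle the fat-argument question. One general technique, sketched here with the standard library's `multiprocessing` rather than pathos, is to hand the graph to each worker once through a pool initializer, so that each task only carries a small argument. The sketch assumes the POSIX `fork` start method (the asker is on Ubuntu); the toy dict-of-dicts graph and the names `_init_worker`, `find_a_path`, and `run_agents` are hypothetical stand-ins, and whether pathos offers an equivalent hook is not confirmed here.

```python
import multiprocessing as mp

_GRAPH = None  # per-worker global, set once by _init_worker


def _init_worker(graph):
    """Runs once in each worker process and stores the shared graph."""
    global _GRAPH
    _GRAPH = graph


def find_a_path(start):
    """Stand-in for Agent.find_a_path: greedy walk to the nearest unvisited node."""
    path, seen = [start], {start}
    while len(path) < len(_GRAPH):
        here = path[-1]
        candidates = (n for n in _GRAPH[here] if n not in seen)
        nxt = min(candidates, key=lambda n: _GRAPH[here][n])
        path.append(nxt)
        seen.add(nxt)
    return path


def run_agents(graph, starts, workers=2):
    """Send the graph once per worker; each task then ships only a start node."""
    ctx = mp.get_context("fork")  # assumption: POSIX system with fork available
    with ctx.Pool(workers, initializer=_init_worker, initargs=(graph,)) as pool:
        return pool.map(find_a_path, starts)


if __name__ == "__main__":
    toy_graph = {
        "a": {"b": 1, "c": 4, "d": 3},
        "b": {"a": 1, "c": 2, "d": 5},
        "c": {"a": 4, "b": 2, "d": 1},
        "d": {"a": 3, "b": 5, "c": 1},
    }
    print(run_agents(toy_graph, ["a", "c"]))
```

With this layout the graph is transferred once per worker instead of once per agent per iteration, which is where the repeated 31.5MB pickle in the original `[graph] * len(agents)` call would come from. Reusing the same pool across iterations would avoid paying even the per-worker cost again.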