
I am trying to set up an IPython parallel cluster over a LAN on Ubuntu 12.04. The details:

Controller Config

# Configuration file for ipcontroller.

c = get_config()
c.IPControllerApp.reuse_files = True
c.IPControllerApp.engine_ssh_server = u'bar@bar1'
c.HubFactory.ip = '*'
c.HubFactory.db_class = 'NoDB'

Cluster Config

# Configuration file for ipcluster.

c = get_config()
c.IPClusterEngines.engine_launcher_class = 'SSH'
c.SSHEngineSetLauncher.engine_args = ['--profile-dir=~/.config/ipython/profile_foo']
c.SSHEngineSetLauncher.engines = {'foo@foo1' : 1, 'foo@foo2' : 1, 'foo@foo3' : 1, 'foo@foo4' : 1}

Engine config

# Configuration file for ipengine.

c = get_config()
c.EngineFactory.timeout = 10

So, then running

ipcluster start --profile=foo --debug

yields the following:

2013-09-03 19:43:45.772 [IPClusterStart] Process 'ssh' started: 5198
2013-09-03 19:43:45.773 [IPClusterStart] Process 'engine set' started: [None, None, None, None]
2013-09-03 19:43:47.086 [IPClusterStart] 2013-09-03 19:44:02.726 [IPEngineApp] Completed registration with id 0
2013-09-03 19:43:47.795 [IPClusterStart] 2013-09-03 19:43:53.737 [IPEngineApp] Completed registration with id 1
2013-09-03 19:43:48.561 [IPClusterStart] 2013-09-03 19:43:59.793 [IPEngineApp] Completed registration with id 2
2013-09-03 19:43:49.667 [IPClusterStart] 2013-09-03 19:44:03.859 [IPEngineApp] Completed registration with id 3
2013-09-03 19:44:15.773 [IPClusterStart] Engines appear to have started successfully

This looks good to me, but when I connect with a Client, I see fewer engines than expected. This happens even with only 1 or 2 engines running on a single remote machine.

In [22]: rc=Client(profile='foo')

In [23]: rc.ids
Out[23]: [1, 2]

I set the timeout high in case that was the issue, but it persists.

If I run ipcontroller and ipengines separately, the process succeeds, but I would really prefer being able to start and stop a cluster with ipcluster.
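To tell engines that are merely slow to register apart from engines that never appear, one option is to poll `rc.ids` for a while before giving up. The helper below is only a sketch: `wait_for_engines` and its parameters are hypothetical names, not part of IPython, and it assumes `rc.ids` reflects newly registered engines each time it is read.

```python
import time


def wait_for_engines(get_ids, expected, timeout=30.0, poll=0.5):
    """Poll get_ids() until `expected` engine ids are visible or `timeout` seconds pass.

    Hypothetical helper, not an IPython API. Returns the last list of ids seen.
    """
    deadline = time.time() + timeout
    while True:
        ids = list(get_ids())
        # Stop as soon as enough engines are visible, or when time runs out.
        if len(ids) >= expected or time.time() >= deadline:
            return ids
        time.sleep(poll)


# Intended use with the Client from the question (needs a running cluster):
#   rc = Client(profile='foo')
#   print(wait_for_engines(lambda: rc.ids, expected=4))
```

If the list is still short after the timeout, the missing engines probably never reached the controller, which points at the connection setup rather than at startup timing.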

phil0stine
  • Since you have instructed it to build ssh tunnels for every connection, you may need to increase the delay between engine startup with `SSHEngineSetLauncher.delay=10`. – minrk Sep 04 '13 at 15:42
  • Ok thanks I will try that. Would you recommend another strategy (MPI?) to improve speeds? – phil0stine Sep 04 '13 at 22:58
  • Presumably you don't need SSH tunnels if all of your engines can be started in the same MPI universe. Is this true? – minrk Sep 05 '13 at 01:16
  • I am not well versed enough (yet) in MPI to know the answer, though I believe so. I am using a small cluster on a LAN, I chose SSH to quickly begin testing my algorithm, though as I read now I see SSH is relatively slow. As long as I can use the Direct Interface, I expect MPI will work. Suggestions welcome, thanks again. – phil0stine Sep 05 '13 at 01:54
  • Hi, I've been having a (seemingly) similar problem. Did you manage to solve yours? Did switching to MPI do the trick? In my case `ipcluster start --profile=my-ssh` returns successfully, saying all the engines apparently started, but I can only see the local engines from Python. – Alex S Nov 05 '13 at 13:03
  • Never mind, apparently I had misconfigured the IP listener, got it to work now! Hope you did too. – Alex S Nov 05 '13 at 13:39
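For reference, the delay suggested in the first comment would go in the ipcluster config from the question. This is a sketch only: the value 10 is taken from that comment, and the assumptions that it is in seconds and that the SSH launcher honors it in the installed IPython version are untested here.

```python
# Configuration file for ipcluster (profile_foo/ipcluster_config.py).

c = get_config()
c.IPClusterEngines.engine_launcher_class = 'SSH'
c.SSHEngineSetLauncher.engine_args = ['--profile-dir=~/.config/ipython/profile_foo']
c.SSHEngineSetLauncher.engines = {'foo@foo1' : 1, 'foo@foo2' : 1, 'foo@foo3' : 1, 'foo@foo4' : 1}
# Pause between engine launches so each SSH tunnel can be set up first
# (value from the comment above; assumed to be in seconds).
c.SSHEngineSetLauncher.delay = 10
```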

0 Answers