controller on the same LAN as the engines
The simple case is when the controller is on the same network as the engines, e.g. on a login node or another work node, such that the engines can connect to it. In this case, you will want the following config:
in ipcontroller_config.py, tell the controller to listen on all IPs (see caveat for exceptions to this):
c.HubFactory.ip = '*' # see caveat for cases where '*' may not work
in ipcluster_config.py, tell ipcluster to use SGE to launch engines:
c.IPClusterEngines.engine_launcher_class = 'SGE'
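If your site needs extra qsub options (queue name, environment export, job arrays, etc.), the SGE launcher also accepts a custom batch template. A sketch, assuming the queue name and template contents below, which are illustrative and should be adjusted for your site:

```python
# in ipcluster_config.py -- queue name 'short.q' is a placeholder
c.SGEEngineSetLauncher.queue = 'short.q'
# {n} and {profile_dir} are filled in by ipcluster when submitting
c.SGEEngineSetLauncher.batch_template = """#$ -V
#$ -cwd
#$ -N ipengine
#$ -t 1-{n}
ipengine --profile-dir={profile_dir}
"""
```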
That's about all you should need. Then you can start up with:
ipcluster start
or run the controller manually with
ipcontroller
and bring up engines after the fact, with
ipcluster engines -n 32
controller outside the cluster, with ssh
More complicated is starting the controller outside the network (e.g. on your laptop) while starting the engines on the cluster. One reason for this is that the SGE launcher needs qsub to be a local command, which it probably isn't on your laptop. For this, you need two sets of config: one telling ipcluster to ssh to the cluster and start engines, and one on the cluster telling it to use SGE.
For this bit, I'm going to assume that the controller machine is ssh-able from the engines.
controller
On the controller, you will want to set the engine SSH server in ipcontroller_config.py:
c.IPControllerApp.engine_ssh_server = 'mylocalmachineserver'
And tell local calls to ipcluster to actually call ipcluster on the cluster via ssh, in ipcluster_config.py:
c.IPClusterEngines.engine_launcher_class = 'SSHProxy'
c.SSHProxyEngineSetLauncher.hostname = 'cluster-login-host'
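Putting the laptop-side pieces together, the two config files might look like this. A sketch: the hostname and username are placeholders, and the user trait is assumed to be inherited from the generic SSH launcher:

```python
# ipcontroller_config.py (on your laptop)
c.IPControllerApp.engine_ssh_server = 'mylocalmachineserver'

# ipcluster_config.py (on your laptop)
c.IPClusterEngines.engine_launcher_class = 'SSHProxy'
c.SSHProxyEngineSetLauncher.hostname = 'cluster-login-host'
# if your cluster username differs from your local one (assumed trait):
c.SSHProxyEngineSetLauncher.user = 'myclusteruser'
```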
cluster
On the cluster, you will have to create a profile with ipcluster_config.py:
c.IPClusterEngines.engine_launcher_class = 'SGE'
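Concretely, that one line lives in a profile of its own on the cluster. A sketch, assuming you name the profile sge:

```python
# created with: ipython profile create sge --parallel
# then edit ~/.ipython/profile_sge/ipcluster_config.py:
c.IPClusterEngines.engine_launcher_class = 'SGE'
```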
And that should be it.
Starting the cluster
Now, here is what happens when you start the cluster with ipcluster start on mylocalmachineserver:
- starts a local ipcontroller, listening on localhost, writing the ssh host into the engine connection file
- sends the connection files to cluster-login-host
- sshes to cluster-login-host and runs ipcluster engines
- on cluster-login-host, ipcluster picks up the local config and spawns engines with SGE
- engines on the cluster see the engine ssh server, and tunnel localhost-to-localhost connections to mylocalmachineserver
- hopefully everything works!
Caveats
On clusters, it's common to have loads of network interfaces, and sometimes only one of them will actually work for engines to connect. If this is the case, it's often easier to specify a particular IP than '*', which forces IPython to do some guessing when it tries to make connections. For instance, if you know that eth1 is the interface on which your nodes can see each other, then using the IP for eth1 may be best. netifaces is a useful library for getting this sort of information:
import netifaces
import netifaces
# look up the IPv4 address bound to eth1
eth1 = netifaces.ifaddresses('eth1')
c.HubFactory.ip = eth1[netifaces.AF_INET][0]['addr']
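If installing netifaces isn't an option, a stdlib-only sketch can do something similar: "connecting" a UDP socket makes the OS pick the outgoing interface for that route without sending any packets. The probe address below is illustrative; use an address on the network your engines live on:

```python
import socket

def interface_ip(probe_host='10.0.0.1', probe_port=9):
    """Return the local IP of the interface that routes to probe_host."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # no traffic is sent; the OS just selects a source address
        s.connect((probe_host, probe_port))
        return s.getsockname()[0]
    except OSError:
        # no route available; fall back to loopback
        return '127.0.0.1'
    finally:
        s.close()

# c.HubFactory.ip = interface_ip('10.0.0.1')  # probe address is a placeholder
```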
Answers to sub-questions below:
c.EngineFactory.ip = '*'
This config is rarely, if ever, necessary, and should never be *
. This is used to tell ipengine how to connect to the controller when the connection file doesn't provide the right information. Typically, the best solution is to get the connection file right in the first place (ipcontroller config), rather than set a value in engine config.
a new engine [started with ipengine] is created on the node where I am, not through the queue system.
IPClusterEngines config only affects engines started with ipcluster. If you want to launch a single engine through SGE with this config, you would do:
ipcluster engines -n 1
I guess I will need to specify a keyfile with the password to connect to my localmachine as well.
If you need to specify ssh config, you can do it in your ~/.ssh/config. IPython uses the command-line ssh to set up tunnels, so any ssh aliases, etc. will work.
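For example, a ~/.ssh/config entry like the following would be picked up by the tunnels automatically (hostname, user, and key path are placeholders):

```text
Host cluster-login-host
    HostName login.cluster.example.edu
    User myclusteruser
    IdentityFile ~/.ssh/id_cluster
```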
If your controller machine is on the same network as the engines, you probably don't need to use SSH at all. Typically, one sets c.HubFactory.ip = '*' or one uses an ssh tunnel. The only time to use both is when the Hub is not on the same network as the engines at all, and the engines have to ssh to a machine on the same network as the controller, from which the ssh server connects to the controller on a LAN IP.