I am trying to set up a small IPython cluster over SSH in a private network (no security required). This all worked neatly once upon a time with IPython 0.10.0 [sic!]. There are 4 nodes, alice, bob, carol and dan, each with 4 CPU cores. The controller runs on carol, and all PCs run Ubuntu 14.10 with IPython 2.3.0 installed. ~/.ipython/profile_default is shared via NFS among all PCs. For internal reasons I cannot use MPI.
However, when the cluster starts up, I only see 4 engines instead of the expected 16. I already increased SSHEngineSetLauncher.delay, but this did not help.
To hunt this down, I reduced the setup to carol (the controller host) alone and tried to start four engines on it via SSH, but only one of them is actually running.
My ipcluster_config.py looks like this:
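```python
c = get_config()

# Start the engines over SSH.
c.IPClusterStart.engine_launcher_class = 'SSHEngineSetLauncher'
# Delay (in seconds) between starting each engine.
c.SSHEngineSetLauncher.delay = 10
# Engines per host; reduced to carol only while debugging.
c.SSHEngineSetLauncher.engines = {'carol': 4}
# Full setup: {'carol': 4, 'dan': 4, 'alice': 4, 'bob': 4}
```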
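With all four hosts enabled, this dict should give 4 × 4 = 16 engines. A minimal sketch of that count; `expected_engine_count` is a hypothetical helper (not part of IPython), and I am assuming each value is either an engine count or a `(count, args)` tuple:

```python
# Sketch: total number of engines requested by an SSHEngineSetLauncher.engines dict.
# Assumption: each value is an int, or a (count, extra_args) tuple/list with the count first.

def expected_engine_count(engines):
    """Return the total number of engines requested across all hosts."""
    total = 0
    for spec in engines.values():
        # Tuple/list form: take the count from the first element.
        n = spec[0] if isinstance(spec, (tuple, list)) else spec
        total += int(n)
    return total


full_cluster = {'carol': 4, 'dan': 4, 'alice': 4, 'bob': 4}
carol_only = {'carol': 4}

print(expected_engine_count(full_cluster))  # 16
print(expected_engine_count(carol_only))    # 4
```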
engine.json:
```json
{
"next_id": 4,
"engines": {
"0": "80d135a7-b8f6-435c-930a-0cde15a6feb2",
"1": "b69916c3-87c2-4e09-9284-aefe665ba616",
"2": "f3df3951-5e0b-4694-aa67-7ae66a181551",
"3": "4311705d-03d4-4e48-a7a9-7be47467c439"}}
```
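So the controller has recorded exactly four engines (ids 0 to 3), with next_id = 4. A quick sketch to read that state and list the ids; `summarize_engine_state` is a hypothetical helper, and the JSON is pasted inline instead of being read from the profile directory:

```python
import json

# Contents of engine.json as posted above (copied verbatim).
ENGINE_JSON = '''
{
"next_id": 4,
"engines": {
"0": "80d135a7-b8f6-435c-930a-0cde15a6feb2",
"1": "b69916c3-87c2-4e09-9284-aefe665ba616",
"2": "f3df3951-5e0b-4694-aa67-7ae66a181551",
"3": "4311705d-03d4-4e48-a7a9-7be47467c439"}}
'''


def summarize_engine_state(text):
    """Return (next_id, sorted integer engine ids) from the engine state JSON."""
    state = json.loads(text)
    ids = sorted(int(key) for key in state["engines"])
    return state["next_id"], ids


next_id, ids = summarize_engine_state(ENGINE_JSON)
print(next_id, ids)  # 4 [0, 1, 2, 3]
```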
For reference, here are the log files.

ipcontroller.log:
```
2015-05-21 07:28:24.442 [IPControllerApp] Hub listening on tcp://127.0.0.1:57360 for registration.
2015-05-21 07:28:24.443 [IPControllerApp] Hub using DB backend: 'NoDB'
2015-05-21 07:28:24.695 [IPControllerApp] hub::created hub
2015-05-21 07:28:24.695 [IPControllerApp] writing connection info to /home/lst3si/.ipython/profile_default/security/ipcontroller-client.json
2015-05-21 07:28:24.695 [IPControllerApp] writing connection info to /home/lst3si/.ipython/profile_default/security/ipcontroller-engine.json
2015-05-21 07:28:24.696 [IPControllerApp] task::using Python leastload Task scheduler
2015-05-21 07:28:24.696 [IPControllerApp] Heartmonitor started
2015-05-21 07:28:24.700 [IPControllerApp] Creating pid file: /home/lst3si/.ipython/profile_default/pid/ipcontroller.pid
2015-05-21 07:28:24.707 [IPControllerApp] client::client '\x00\x91y`\x0c' requested u'connection_request'
2015-05-21 07:28:24.707 [IPControllerApp] client::client ['\x00\x91y`\x0c'] connected
2015-05-21 07:28:26.071 [IPControllerApp] client::client '80d135a7-b8f6-435c-930a-0cde15a6feb2' requested u'registration_request'
2015-05-21 07:28:26.103 [IPControllerApp] WARNING | iopub::IOPub message lacks parent: {'parent_header': {}, 'msg_type': u'status', 'msg_id': u'230d5aa1-c395-4b82-a964-a3062e5550a9', 'content': {u'execution_state': u'starting'}, 'header': {u'date': datetime.datetime(2015, 5, 21, 7, 28, 26, 102954), u'username': u'lst3si', u'session': u'80d135a7-b8f6-435c-930a-0cde15a6feb2', u'msg_id': u'230d5aa1-c395-4b82-a964-a3062e5550a9', u'msg_type': u'status'}, 'buffers': [], 'metadata': {}}
2015-05-21 07:28:30.699 [IPControllerApp] registration::finished registering engine 0:80d135a7-b8f6-435c-930a-0cde15a6feb2
2015-05-21 07:28:30.699 [IPControllerApp] engine::Engine Connected: 0
2015-05-21 07:28:36.071 [IPControllerApp] client::client 'b69916c3-87c2-4e09-9284-aefe665ba616' requested u'registration_request'
2015-05-21 07:28:36.102 [IPControllerApp] WARNING | iopub::IOPub message lacks parent: {'parent_header': {}, 'msg_type': u'status', 'msg_id': u'f74a1f38-f3fb-422f-b4ad-0d1724745c64', 'content': {u'execution_state': u'starting'}, 'header': {u'date': datetime.datetime(2015, 5, 21, 7, 28, 36, 102052), u'username': u'lst3si', u'session': u'b69916c3-87c2-4e09-9284-aefe665ba616', u'msg_id': u'f74a1f38-f3fb-422f-b4ad-0d1724745c64', u'msg_type': u'status'}, 'buffers': [], 'metadata': {}}
2015-05-21 07:28:36.285 [IPControllerApp] client::client '\x00\x91y`\r' requested u'connection_request'
2015-05-21 07:28:36.285 [IPControllerApp] client::client ['\x00\x91y`\r'] connected
2015-05-21 07:28:39.699 [IPControllerApp] registration::finished registering engine 1:b69916c3-87c2-4e09-9284-aefe665ba616
2015-05-21 07:28:39.699 [IPControllerApp] engine::Engine Connected: 1
2015-05-21 07:28:46.143 [IPControllerApp] client::client 'f3df3951-5e0b-4694-aa67-7ae66a181551' requested u'registration_request'
2015-05-21 07:28:46.175 [IPControllerApp] WARNING | iopub::IOPub message lacks parent: {'parent_header': {}, 'msg_type': u'status', 'msg_id': u'a3aa09af-6958-4362-a1f4-5df01da8941b', 'content': {u'execution_state': u'starting'}, 'header': {u'date': datetime.datetime(2015, 5, 21, 7, 28, 46, 174675), u'username': u'lst3si', u'session': u'f3df3951-5e0b-4694-aa67-7ae66a181551', u'msg_id': u'a3aa09af-6958-4362-a1f4-5df01da8941b', u'msg_type': u'status'}, 'buffers': [], 'metadata': {}}
2015-05-21 07:28:51.699 [IPControllerApp] registration::finished registering engine 2:f3df3951-5e0b-4694-aa67-7ae66a181551
2015-05-21 07:28:51.700 [IPControllerApp] engine::Engine Connected: 2
2015-05-21 07:28:56.113 [IPControllerApp] client::client '4311705d-03d4-4e48-a7a9-7be47467c439' requested u'registration_request'
2015-05-21 07:28:56.145 [IPControllerApp] WARNING | iopub::IOPub message lacks parent: {'parent_header': {}, 'msg_type': u'status', 'msg_id': u'671288cf-32ea-4a41-8e17-9be4ba1216dd', 'content': {u'execution_state': u'starting'}, 'header': {u'date': datetime.datetime(2015, 5, 21, 7, 28, 56, 144586), u'username': u'lst3si', u'session': u'4311705d-03d4-4e48-a7a9-7be47467c439', u'msg_id': u'671288cf-32ea-4a41-8e17-9be4ba1216dd', u'msg_type': u'status'}, 'buffers': [], 'metadata': {}}
2015-05-21 07:29:00.698 [IPControllerApp] registration::finished registering engine 3:4311705d-03d4-4e48-a7a9-7be47467c439
2015-05-21 07:29:00.700 [IPControllerApp] engine::Engine Connected: 3
```
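In this log the hub listens for registration on tcp://127.0.0.1:57360, and a registration request arrives roughly every 10 s, which lines up with `SSHEngineSetLauncher.delay = 10`. To re-check these numbers on later runs, a small parser like this could help; it is only a sketch that assumes the exact line format above, and `parse_controller_log` is a hypothetical helper:

```python
import re
from datetime import datetime

# Relevant lines copied from ipcontroller.log above (IOPub warnings omitted).
LOG = """\
2015-05-21 07:28:24.442 [IPControllerApp] Hub listening on tcp://127.0.0.1:57360 for registration.
2015-05-21 07:28:26.071 [IPControllerApp] client::client '80d135a7-b8f6-435c-930a-0cde15a6feb2' requested u'registration_request'
2015-05-21 07:28:30.699 [IPControllerApp] registration::finished registering engine 0:80d135a7-b8f6-435c-930a-0cde15a6feb2
2015-05-21 07:28:36.071 [IPControllerApp] client::client 'b69916c3-87c2-4e09-9284-aefe665ba616' requested u'registration_request'
2015-05-21 07:28:39.699 [IPControllerApp] registration::finished registering engine 1:b69916c3-87c2-4e09-9284-aefe665ba616
2015-05-21 07:28:46.143 [IPControllerApp] client::client 'f3df3951-5e0b-4694-aa67-7ae66a181551' requested u'registration_request'
2015-05-21 07:28:51.699 [IPControllerApp] registration::finished registering engine 2:f3df3951-5e0b-4694-aa67-7ae66a181551
2015-05-21 07:28:56.113 [IPControllerApp] client::client '4311705d-03d4-4e48-a7a9-7be47467c439' requested u'registration_request'
2015-05-21 07:29:00.698 [IPControllerApp] registration::finished registering engine 3:4311705d-03d4-4e48-a7a9-7be47467c439
"""

LINE_RE = re.compile(r"^(\S+ \S+) \[IPControllerApp\] (.*)$")
LISTEN_RE = re.compile(r"Hub listening on (\S+) for registration")
REQUEST_RE = re.compile(r"client::client '([0-9a-f-]{36})' requested u'registration_request'")
DONE_RE = re.compile(r"finished registering engine (\d+):([0-9a-f-]{36})")


def parse_controller_log(text):
    """Return (registration URL, request timestamps, {engine id: uuid})."""
    listen_url = None
    requests = []
    registered = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines without the usual timestamp/app prefix
        stamp = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
        msg = m.group(2)
        listen = LISTEN_RE.search(msg)
        request = REQUEST_RE.search(msg)
        done = DONE_RE.search(msg)
        if listen:
            listen_url = listen.group(1)
        if request:
            requests.append(stamp)
        if done:
            registered[int(done.group(1))] = done.group(2)
    return listen_url, requests, registered


listen_url, requests, registered = parse_controller_log(LOG)
# Seconds between successive registration requests.
gaps = [(b - a).total_seconds() for a, b in zip(requests, requests[1:])]

print(listen_url)                # tcp://127.0.0.1:57360
print(sorted(registered))        # [0, 1, 2, 3]
print([round(g) for g in gaps])  # [10, 10, 10]
```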
ipengine.log (all four engine logs look the same except for the line "Completed registration with id x", where x runs from 0 to 3):
```
2015-05-21 07:28:26.065 [IPEngineApp] Loading url_file u'.ipython/profile_default/security/ipcontroller-engine.json'
2015-05-21 07:28:26.070 [IPEngineApp] Registering with controller at tcp://127.0.0.1:57360
2015-05-21 07:28:26.101 [IPEngineApp] Starting to monitor the heartbeat signal from the hub every 3010 ms.
2015-05-21 07:28:26.102 [IPEngineApp] Using existing profile dir: u'.ipython/profile_default'
2015-05-21 07:28:26.103 [IPEngineApp] Completed registration with id 0
```
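Since the four engine logs differ only in the id, pulling the controller URL and engine id out of each one is a quick consistency check. A sketch, assuming the log format above; `engine_summary` is a hypothetical helper, and the other three logs are simulated from the posted one rather than copied:

```python
import re

# ipengine.log for engine 0, copied from above.
ENGINE_LOG = """\
2015-05-21 07:28:26.065 [IPEngineApp] Loading url_file u'.ipython/profile_default/security/ipcontroller-engine.json'
2015-05-21 07:28:26.070 [IPEngineApp] Registering with controller at tcp://127.0.0.1:57360
2015-05-21 07:28:26.101 [IPEngineApp] Starting to monitor the heartbeat signal from the hub every 3010 ms.
2015-05-21 07:28:26.102 [IPEngineApp] Using existing profile dir: u'.ipython/profile_default'
2015-05-21 07:28:26.103 [IPEngineApp] Completed registration with id 0
"""

CONTROLLER_RE = re.compile(r"Registering with controller at (\S+)")
ID_RE = re.compile(r"Completed registration with id (\d+)")


def engine_summary(text):
    """Return (controller URL, engine id) from one ipengine log; None for a missing field."""
    url = CONTROLLER_RE.search(text)
    eid = ID_RE.search(text)
    return (url.group(1) if url else None,
            int(eid.group(1)) if eid else None)


# Hypothetical stand-ins for the four engine logs: they differ only in the
# final id, so the other three are simulated by swapping that number.
logs = [ENGINE_LOG.replace("with id 0", "with id %d" % i) for i in range(4)]
summaries = [engine_summary(t) for t in logs]

controllers = {url for url, _ in summaries}
ids = sorted(eid for _, eid in summaries)
print(controllers)  # {'tcp://127.0.0.1:57360'}
print(ids)          # [0, 1, 2, 3]
```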