I've created a cluster of EC2 instances using cfncluster, and now I need to run the dispynode.py command on all of the nodes.
I do that by first creating a file called "workers.txt" containing the nodes' private IP addresses, and then running the following bash command:
for host in $(cat workers.txt); do
    ssh $host "dispynode.py --ext_ip_addr $host &"
done
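
For debugging, I can also run a variant of the same loop that writes each node's output to a log file on that node instead of to my terminal (the log path is arbitrary; this is just a sketch for diagnostics, not part of my actual setup):

for host in $(cat workers.txt); do
    # Same command, but keep dispynode's output in a log file on the remote host
    ssh $host "dispynode.py --ext_ip_addr $host > /tmp/dispynode.log 2>&1 &"
done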
This appears to work, since I get the expected dispynode output for each IP address. For example, each node produces output similar to this:
NOTE: Using dispy port 61591 (was 51348 in earlier versions)
2019-08-22 06:07:12 dispynode - dispynode version: 4.11.0, PID: 16074
2019-08-22 06:07:12 dispynode - Files will be saved under "/tmp/dispy/node"
2019-08-22 06:07:12 pycos - version 4.8.11 with epoll I/O notifier
2019-08-22 06:07:12 dispynode - "ip-172-31-8-242" serving 8 cpus
Enter "quit" or "exit" to terminate dispynode,
"stop" to stop service, "start" to restart service,
"release" to check and close computation,
"cpus" to change CPUs used, anything else to get status:
Enter "quit" or "exit" to terminate dispynode,
"stop" to stop service, "start" to restart service,
"release" to check and close computation,
"cpus" to change CPUs used, anything else to get status:
NOTE: Using dispy port 61591 (was 51348 in earlier versions)
The problem is that when I SSH into a node and check whether the process is running, it isn't:
ssh 172.31.8.242
kill -0 16074
-bash: kill: (16074) - No such process
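
To check all of the nodes at once instead of one at a time, I can run something along these lines (pgrep is just a convenience here; it shows the same thing as the kill -0 check above):

for host in $(cat workers.txt); do
    echo "== $host =="
    # List any running dispynode processes on the remote host, or report that there are none
    ssh $host "pgrep -af dispynode || echo 'not running'"
done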
The dispy client also doesn't work: it can't discover any of the nodes.
Question: Why isn't my parallel SSH command starting dispynode on the nodes, and/or why doesn't the process keep running if it was started?