I've created a cluster of EC2 instances using cfncluster and now I need to run the dispynode.py command on all the nodes.

I do that by first creating a list of the nodes' private IP addresses in a file called "workers.txt", then running the following bash loop:

for host in $(cat workers.txt); do
    ssh "$host" "dispynode.py --ext_ip_addr $host &"
done

This appears to work, since I get the expected dispynode output for each IP address. For example, each IP address produces output similar to this:

  NOTE: Using dispy port 61591 (was 51348 in earlier versions)


2019-08-22 06:07:12 dispynode - dispynode version: 4.11.0, PID: 16074

2019-08-22 06:07:12 dispynode - Files will be saved under "/tmp/dispy/node"

2019-08-22 06:07:12 pycos - version 4.8.11 with epoll I/O notifier

2019-08-22 06:07:12 dispynode - "ip-172-31-8-242" serving 8 cpus

Enter "quit" or "exit" to terminate dispynode,
  "stop" to stop service, "start" to restart service,
  "release" to check and close computation,
  "cpus" to change CPUs used, anything else to get status: 
Enter "quit" or "exit" to terminate dispynode,
  "stop" to stop service, "start" to restart service,
  "release" to check and close computation,
  "cpus" to change CPUs used, anything else to get status: 
  NOTE: Using dispy port 61591 (was 51348 in earlier versions)

The problem is that when I SSH into a node and check whether the process is running, it isn't:

ssh 172.31.8.242
kill -0 16074
-bash: kill: (16074) - No such process

The dispy client also fails: it can't discover the nodes.

Question: Why isn't my ssh loop starting the program on the nodes, or, if it is starting it, why doesn't the process keep running?

1 Answer

I haven't used dispy myself, but the "Enter 'quit' or 'exit' to terminate dispynode..." message suggests that dispynode is running interactively and reading from standard input. In that case, when you close the SSH session, dispynode will read an end-of-file condition on its standard input, and it might exit when that happens.

According to the dispy documentation, dispynode has a --daemon option which prevents it from running interactively:

--daemon option causes dispynode to not read from standard input, so dispynode can be run as background process, or started from (system startup) scripts. If this option is not given, dispynode prints menu of commands, and commands can be entered to get status and control dispynode.

So, try using the --daemon option:

for host in $(cat workers.txt); do
    ssh "$host" "dispynode.py --ext_ip_addr $host --daemon &"
done

The "&" may be unnecessary here, because dispynode might put itself in the background.
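
If dispynode still exits once the SSH session ends, another common approach is to detach it with nohup and redirect all three standard streams, so that ssh has nothing left to wait on. The following is a minimal, untested sketch along those lines. The helper names remote_cmd and start_nodes and the log path /tmp/dispynode.log are illustrative assumptions, not part of dispy.

```shell
#!/usr/bin/env bash

# Build the command line to run on one node (hypothetical helper).
# nohup makes the process ignore the hangup signal sent when the session closes;
# redirecting stdin, stdout and stderr leaves no stream tied to the SSH session.
remote_cmd() {
    local host=$1
    printf 'nohup dispynode.py --ext_ip_addr %s --daemon </dev/null >/tmp/dispynode.log 2>&1 &' "$host"
}

# Start dispynode on every host listed in the given file, one host per line
# (hypothetical helper).
start_nodes() {
    local file=$1 host
    while IFS= read -r host; do
        # Skip blank lines.
        [ -n "$host" ] || continue
        # -n keeps ssh from reading the loop's stdin (the host list).
        ssh -n "$host" "$(remote_cmd "$host")"
    done < "$file"
}
```

Calling `start_nodes workers.txt` would then start one node per line of the file. Whether --daemon is still needed alongside nohup depends on how dispynode reacts to end-of-file on its standard input, so it may be worth trying both with and without it.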

Kenster
  • This was a great idea and I think it should work in principle, but after trying it and googling around a bit, it seems there's a bug with the daemon option, and I haven't been able to get it to work. Others have suggested using something like nohup or tmux. I'm still in the process of figuring it out. I'd give you +1 for the idea, but I have fewer than 15 reputation points and Stack Overflow won't let me upvote. Nonetheless, thank you for your input. – ApprenticeOfMathematics Aug 22 '19 at 20:49