HPX minimal two node example set-up?

Question

The HPX getting started tutorial assumes you are using PBS or slurm. These may be quite common in the HPC community but as a developer I'm more used to the scenario of here are a couple of machines you can install stuff on.

It's not immediately obvious whether a scheduler like slurm is required to leverage multiple physical machines or just convenient for managing a cluster.

I know you can simulate multiple localities using the -l flag when you run an HPX application (see for example this question). What I want is to run the same application on 2 nodes and have them communicate with each other.

What is the minimum needed to tell HPX:
Here is one other machine with this IP address to which you can send tasks?

Alternatively what is the minimum slurm configuration to reach this stage?

Installing slurm was easy; finding a simple 2 node example less so, though this link to a podcast may help.

I'm also assuming HPX's parcel port will just work over TCP without installing anything extra (e.g. MPI). Is this correct?

Update: I think I'm getting closer but I'm still missing something. Firstly, I'm using the hello_world example. Could it be that it is too simple for the 2 node test? I am hoping for similar output to running 2 localities on the same node:

APP=$HPX/bin/hello_world
$APP --hpx:node 0 --hpx:threads 4 -l2 &
$APP --hpx:node 1 --hpx:threads 4

sample output:

hello world from OS-thread 2 on locality 0
hello world from OS-thread 0 on locality 0
hello world from OS-thread 1 on locality 1
hello world from OS-thread 3 on locality 1
hello world from OS-thread 2 on locality 1
hello world from OS-thread 1 on locality 0
hello world from OS-thread 0 on locality 1
hello world from OS-thread 3 on locality 0

but when I try to remote it both processes hang:

$APP --hpx:localities=2 --hpx:agas=$NODE0:7910 --hpx:hpx=$NODE0:7910 --hpx:threads 4 &
ssh $NODE1 $APP --hpx:localities=2 --hpx:agas=$NODE0:7910 --hpx.hpx=$NODE1:7910 --hpx:threads 4

I have opened port 7910 on both machines. The path to $APP is the same on both nodes. I'm not sure how to test whether the second process is talking to the agas server.

If I use "--hpx:debug-agas-log=agas.log" and "--hpx:debug-hpx-log=hpx.log" I get:

>cat hpx.log 
(T00000000/----------------.----/----------------) P--------/----------------.---- 14:18.29.042 [0000000000000001]    [ERR] created exception: HPX(success)
(T00000000/----------------.----/----------------) P--------/----------------.---- 14:18.29.042 [0000000000000002]    [ERR] created exception: HPX(success)

on both machines. I'm not sure how to interpret this.

I've tried a few other options such as --hpx:run-agas-server (I think that is possibly implied by using --hpx:agas=).

I also tried

ssh $NODE1 $APP --hpx:nodes="$NODE0 $NODE1" &
$APP --hpx:nodes="$NODE0 $NODE1"

as suggested by the other (now deleted?) answer with no luck.

update 2

I thought it might be a firewall issue, but even with the firewall disabled nothing seems to happen. I've tried running a trace on the system calls but there is nothing obvious:

echo "start server on agas master: node0=$NODE0"
strace -o node0.strace $APP \
 --hpx:localities=2 --hpx:agas=$NODE0:7910 --hpx:hpx=$NODE0:7910 --hpx:threads 4 &
cat agas.log hpx.log
echo "start worker on slave: node1=$NODE1"
ssh $NODE1 \
strace -o node1.strace $APP \
--hpx:worker --hpx:agas=$NODE0:7910 --hpx.hpx=$NODE1:7910 
echo "done"
exit 0

tail of node0.strace:

15:13:31 bind(7, {sa_family=AF_INET, sin_port=htons(7910), sin_addr=inet_addr("172.29.0.160")}, 16) = 0 
15:13:31 listen(7, 128)                 = 0 
15:13:31 ioctl(7, FIONBIO, [1])         = 0 
15:13:31 accept(7, 0, NULL)             = -1 EAGAIN (Resource temporarily unavailable) 
...
15:13:32 mprotect(0x7f12b2bff000, 4096, PROT_NONE) = 0 
15:13:32 clone(child_stack=0x7f12b33feef0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f12b33ff9d0, tls=0x7f12b33ff700, child_tidptr=0x7f12b33ff9d0) = 22394 
15:13:32 futex(0x7ffe2c5df60c, FUTEX_WAIT_PRIVATE, 1, NULL) = 0 
15:13:32 futex(0x7ffe2c5df5e0, FUTEX_WAKE_PRIVATE, 1) = 0 
15:13:32 futex(0x7ffe2c5df4b4, FUTEX_WAIT_PRIVATE, 1, NULL

tail of node1.strace:

16829 15:13:32 bind(7, {sa_family=AF_INET, sin_port=htons(7910), sin_addr=inet_addr("127.0.0.1")}, 16) = 0 
16829 15:13:32 listen(7, 128)           = 0 
16829 15:13:32 ioctl(7, FIONBIO, [1])   = 0 
16829 15:13:32 accept(7, 0, NULL)       = -1 EAGAIN (Resource temporarily unavailable) 
16829 15:13:32 uname({sys="Linux", node="kmlwg-tddamstest3.grpitsrv.com", ...}) = 0 
16829 15:13:32 eventfd2(0, O_NONBLOCK|O_CLOEXEC) = 8 
16829 15:13:32 epoll_create1(EPOLL_CLOEXEC) = 9 
16829 15:13:32 timerfd_create(CLOCK_MONOTONIC, 0x80000 /* TFD_??? */) = 10 
16829 15:13:32 epoll_ctl(9, EPOLL_CTL_ADD, 8, {EPOLLIN|EPOLLERR|EPOLLET, {u32=124005464, u64=140359655238744}}) = 0 
16829 15:13:32 write(8, "\1\0\0\0\0\0\0\0", 8) = 8 
16829 15:13:32 epoll_ctl(9, EPOLL_CTL_ADD, 10, {EPOLLIN|EPOLLERR, {u32=124005476, u64=140359655238756}}) = 0 
16829 15:13:32 futex(0x7fa8006f2d24, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7fa8006f2d20, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1 
16830 15:13:32  )    = 0 
16829 15:13:32 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP 
16830 15:13:32 futex(0x7fa8076432f0, FUTEX_WAKE_PRIVATE, 1) = 0 
16829 15:13:32  )   = 11 
16830 15:13:32 epoll_wait(9,  
16829 15:13:32 epoll_ctl(9, EPOLL_CTL_ADD, 11, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP|EPOLLET, {u32=124362176, u64=140359655595456}} 
16830 15:13:32  {{EPOLLIN, {u32=124005464, u64=140359655238744}}}, 128, -1) = 1 
16829 15:13:32  ) = 0 
16830 15:13:32 epoll_wait(9,  
16829 15:13:32 connect(11, {sa_family=AF_INET, sin_port=htons(7910), sin_addr=inet_addr("172.29.0.160")}, 16 
16830 15:13:32  {{EPOLLHUP, {u32=124362176, u64=140359655595456}}}, 128, -1) = 1 
16830 15:13:32 epoll_wait(9,

If I do an strace -f on the master its child process loops doing something like this:

22050 15:12:46 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 12 
22050 15:12:46 epoll_ctl(5, EPOLL_CTL_ADD, 12, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP|EPOLLET, {u32=2395115776, u64=140516545171712}}) = 0 
22041 15:12:46  {{EPOLLHUP, {u32=2395115776, u64=140516545171712}}}, 128, -1) = 1 
22050 15:12:46 connect(12, {sa_family=AF_INET, sin_port=htons(7910), sin_addr=inet_addr("127.0.0.1")}, 16 
22041 15:12:46 epoll_wait(5,  
22050 15:12:46  )  = -1 ECONNREFUSED (Connection refused) 
22041 15:12:46  {{EPOLLHUP, {u32=2395115776, u64=140516545171712}}}, 128, -1) = 1 
22050 15:12:46 futex(0x7fcc9cc20504, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1703, {1455808366, 471644000}, ffffffff 
22041 15:12:46 epoll_wait(5,  
22050 15:12:46  )    = -1 ETIMEDOUT (Connection timed out) 
22050 15:12:46 futex(0x7fcc9cc204d8, FUTEX_WAKE_PRIVATE, 1) = 0 
22050 15:12:46 close(12)                = 0 
22050 15:12:46 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 12 
22050 15:12:46 epoll_ctl(5, EPOLL_CTL_ADD, 12, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP|EPOLLET, {u32=2395115776, u64=140516545171712}}) = 0 
22050 15:12:46 connect(12, {sa_family=AF_INET, sin_port=htons(7910), sin_addr=inet_addr("127.0.0.1")}, 16 
22041 15:12:46  {{EPOLLHUP, {u32=2395115776, u64=140516545171712}}}, 128, -1) = 1 
22050 15:12:46  )  = -1 ECONNREFUSED (Connection refused) 
22041 15:12:46 epoll_wait(5,  
22050 15:12:46 futex(0x7fcc9cc20504, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1705, {1455808366, 572608000}, ffffffff 
22041 15:12:46  {{EPOLLHUP, {u32=2395115776, u64=140516545171712}}}, 128, -1) = 1

Update 3

The astute among you may have noticed that in update 2 I accidentally wrote --hpx.hpx instead of --hpx:hpx. Guess what! Changing that fixed it. So technically the first answer was correct and I'm just dumb. I would have expected an error from the command line options parser but I guess when you're making a massively parallel runtime you can't have everything :).
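
Since the parser accepts the mistyped option silently, a pre-launch check can catch it. The following is a minimal sketch, not part of HPX: check_hpx_opts is a hypothetical helper that treats any argument starting with --hpx but not with --hpx: as suspicious. That rule is an assumption for illustration and may be too strict if other spellings are intentionally supported.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an HPX tool): warn about arguments that begin
# with "--hpx" but are not of the form "--hpx:...", e.g. "--hpx.hpx=...".
check_hpx_opts() {
    local arg status=0
    for arg in "$@"; do
        case "$arg" in
            --hpx:*) ;;   # looks like a regular HPX option
            --hpx*)       # assumed to be a typo; report it and fail
                echo "suspicious option: $arg" >&2
                status=1
                ;;
        esac
    done
    return "$status"
}
```

It could be used in a wrapper script as `check_hpx_opts "$@" && "$APP" "$@"`, so the application only starts when no suspicious option was found.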

Thanks for the help everyone.

Thanks for letting us know that `--hpx.hpx` is not producing an error message (or at least a warning). I'm not sure yet how to solve it (this was a deliberate design decision - for arcane reasons), but I have created a ticket as a reminder for us to look into this (https://github.com/STEllAR-GROUP/hpx/issues/1995). — hkaiser, Feb 20 '16 at 23:40
Good to see you're on the ball. I was going to add the ticket myself. — Bruce Adams, Feb 21 '16 at 21:19

hkaiser · Accepted Answer · 2016-02-17T20:01:52.183

Option 1: When using TCP/IP for the networking layer (usually the default):

In order for an HPX application to be able to find all connected nodes, the following information has to be provided outside of batch environments:

locality 0:
./yourapp --hpx:localities=2 --hpx:agas=node0:7910 --hpx:hpx=node0:7910 

locality 1:
./yourapp --hpx:agas=node0:7910 --hpx:hpx=node1:7910 --hpx:worker

Where node0 and node1 are the hostnames of those nodes and 7910 is an (arbitrary) TCP/IP port to use.

In other words,

on node0 you specify the port where HPX will listen for incoming messages on this node (--hpx:hpx=node0:7910) and the port where the main instance of the Active Global Address Space (AGAS) engine will listen, which the other nodes use to establish the initial connection (--hpx:agas=node0:7910). You also specify that overall 2 localities will connect (--hpx:localities=2).
on node1 (and all other nodes you want to connect) you specify the port where HPX will listen for incoming messages on this node (--hpx:hpx=node1:7910) and the port where the main AGAS engine can be reached on locality 0 (--hpx:agas=node0:7910). You also specify that this locality is a worker (not the 'console'), which is done by the --hpx:worker command line option.
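
The option layout described above can be generated rather than typed by hand. This is a minimal sketch that only prints the command lines and does not launch anything; hpx_tcp_cmds is a hypothetical helper name, and it assumes every node uses the same port, as in the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) one command line per locality.
# Usage: hpx_tcp_cmds APP PORT NODE0 [WORKER_NODE...]
hpx_tcp_cmds() {
    local app="$1" port="$2" node0="$3" node total
    shift 3
    total=$(( $# + 1 ))   # worker nodes plus locality 0
    # Locality 0 hosts the main AGAS instance and is the console.
    echo "$app --hpx:localities=$total --hpx:agas=$node0:$port --hpx:hpx=$node0:$port"
    # Every other locality points at that AGAS instance and runs as a worker.
    for node in "$@"; do
        echo "$app --hpx:agas=$node0:$port --hpx:hpx=$node:$port --hpx:worker"
    done
}
```

Calling `hpx_tcp_cmds ./yourapp 7910 node0 node1` prints the two command lines shown above, which can then be run on the respective machines.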

Note that all of those options have one-letter shortcuts (--hpx:localities == -l, --hpx:hpx == -x, --hpx:agas == -a, and --hpx:worker == -w)

You can also run more than one locality on the same physical compute node (or your laptop). In this case it's a bit less tedious to specify things, for instance:

./yourapp -l2 -0 &
./yourapp -1

If you want to use the extended command line options in this case, make sure the ports used for -x are unique across all localities which run on the same node.
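
The uniqueness requirement can be checked before launching. This is a minimal sketch, assuming the endpoints are passed as host:port strings exactly as they would be given to -x; find_duplicate_endpoints is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every host:port endpoint that is listed more
# than once. No output means all endpoints are unique.
find_duplicate_endpoints() {
    printf '%s\n' "$@" | sort | uniq -d
}
```

For example, `find_duplicate_endpoints localhost:7910 localhost:7911 localhost:7910` reports the clash on localhost:7910. The comparison is purely textual, so the same machine written once as a hostname and once as an IP address would not be detected.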

Option 2: When using MPI (requires special build time configuration):

Just use mpirun to execute your application. It will pick up the settings either from your batch environment or from the command line options. For instance:

mpirun -N1 -np2 ./yourapp

This will run two instances of your application on the current compute node.

Thanks. I think that gets me a lot closer but I'm still stuck. I've updated the question to reflect where I am now as the comment space is too small. — Bruce Adams, Feb 15 '16 at 14:47
The only thing I forgot to mention in my answer was that all localities which are not locality 0 have to be told that they are worker localities. You do that by adding the `-w` flag on the command line. I updated my answer accordingly. — hkaiser, Feb 17 '16 at 20:01
That doesn't seem to have made any difference to me. I guess worker must be implied if the agas server is on another machine? I've added some system traces in case that is illuminating. Is there a good way of pinging the agas server to check it is running? — Bruce Adams, Feb 18 '16 at 15:26
The problem here seems to be the port specifications. If you completely omit those (get rid of the :7910 for the node specification), it works for me. It looks like there is a bug in the initialization sequence. If no ports are given, we listen on the port 7910 + locality_id for incoming connections. For some reason, this does not get overridden when explicitly specifying the port. If you pass --hpx:hpx=$NODE1:7911 for the second locality (and keep everything else unchanged), it seems to work. — Thomas Heller, Feb 19 '16 at 06:54
These are the commands I was using: $APP --hpx:localities=2 --hpx:agas=$NODE0:7910 --hpx:hpx=$NODE0:7910 & ssh $NODE1 "$APP --hpx:localities=2 --hpx:agas=$NODE0:7910 --hpx:hpx=$NODE1:7911 --hpx:worker" — Thomas Heller, Feb 19 '16 at 06:56
@hkaiser: no, the difference is in the port number used for locality 1. As John pointed out, it seems to work when using IP addresses instead of host names — Thomas Heller, Feb 19 '16 at 15:54
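
On the question raised in the comments about checking whether the AGAS server is running: a generic TCP probe cannot confirm that AGAS itself is healthy, but it can show whether anything accepts connections on the AGAS port from the worker node. This is a minimal sketch, assuming bash built with /dev/tcp support and the coreutils timeout command; tcp_probe is a hypothetical helper, not an HPX tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP connection to HOST:PORT can be
# opened within 2 seconds, fail otherwise.
# Usage: tcp_probe HOST PORT
tcp_probe() {
    # The inner bash opens the socket via the /dev/tcp pseudo-path and
    # exits immediately, which closes it again.
    timeout 2 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}
```

Run from the worker machine, for example as `tcp_probe "$NODE0" 7910 && echo reachable || echo unreachable`, while locality 0 is already started.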

biddisco · Answer 2 · 2016-02-19T07:37:12.373

I am unable to comment on an existing answer, so I shall repeat some information from the answer of @hkaiser: on the console/master node, or what we would normally think of as rank0, you should use a command of the form

`bin/hello_world -l2 --hpx:agas=xx.xx.xx.AA:7910 --hpx:hpx=xx.xx.xx.AA:7910 `

and on the worker node you should use

`bin/hello_world --hpx:agas=xx.xx.xx.AA:7910 --hpx:hpx=xx.xx.xx.BB:7910 --hpx:worker`

But it is important that the IP address that you use is the one returned by the external network of the nodes and not an internal network (in the case of multiple NIC/IP addresses). To be sure I get the right address, I usually run the command

ip route get 8.8.8.8 | awk 'NR==1 {print $NF}'

on each node and use the output from that when testing.
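
Depending on the iproute2 version, the first output line may carry extra trailing fields after the source address (for example a uid entry), in which case the last field is not the address. This is a minimal sketch of a variant that looks for the value following the src keyword instead; extract_src_ip is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ip route get` output on stdin and print the
# address that follows the first "src" keyword.
extract_src_ip() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "src") { print $(i + 1); exit }
    }'
}
```

On a node it would be used as `ip route get 8.8.8.8 | extract_src_ip`, and the printed address passed to --hpx:hpx (and to --hpx:agas on locality 0).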

Note that this IP address specification is only necessary when you are launching jobs by hand, not when using mpirun or srun to launch them, as those commands will spawn the jobs on the nodes allocated by the batch system and the communication will be correctly handled by the HPX internals. When using a batch system but launching jobs by hand anyway (from within an interactive shell, for example), you will find that adding the option --hpx:ignore-batch-env to your command line will help stop HPX from picking up unwanted parameters.

I tried with git commit 0c3174572ef5d2c from the HPX repo this morning and my result looks as follows

Master Node

bin/hello_world --hpx:agas=148.187.68.38:7910 --hpx:hpx=148.187.68.38:7910 -l2 --hpx:threads=4
hello world from OS-thread 3 on locality 1
hello world from OS-thread 6 on locality 1
hello world from OS-thread 2 on locality 1
hello world from OS-thread 7 on locality 1
hello world from OS-thread 5 on locality 1
hello world from OS-thread 0 on locality 1
hello world from OS-thread 4 on locality 1
hello world from OS-thread 1 on locality 1
hello world from OS-thread 0 on locality 0
hello world from OS-thread 2 on locality 0
hello world from OS-thread 1 on locality 0
hello world from OS-thread 3 on locality 0

Worker Node

bin/hello_world --hpx:agas=148.187.68.38:7910 --hpx:hpx=148.187.68.36:7910 --hpx:worker --hpx:threads=8

Note that it is ok to use different numbers of threads on different nodes as I have done here (but usually the nodes are homogeneous so you use the same number of threads).

Parcelport

If you have compiled with support for MPI (for example) and you want to be sure that the TCP parcelport is used, then add

-Ihpx.parcel.tcp.enable=1 -Ihpx.parcel.mpi.enable=0

to your command line (on all nodes) to make HPX select the TCP parcelport.
