
While learning MPI using MPICH (1.4.1p1) on Windows, I found some sample code here. Originally, when I ran the server, I had to copy the generated port_name and start the client with it so that the client could connect to the server. I modified the server to call MPI_Publish_name() instead. After launching the server with the name aaaa, I launch the client, which fails in MPI_Lookup_name() with:

Invalid service name (see MPI_Publish_name), error stack:
MPID_NS_Lookup(87): Lookup failed for service name aaaa

Here are the snipped bits of code:

server.c

MPI_Comm client; 
MPI_Status status; 
char port_name[MPI_MAX_PORT_NAME];
char serv_name[256];
double buf[MAX_DATA]; 
int size, again; 
int res = 0;

MPI_Init( &argc, &argv ); 
MPI_Comm_size(MPI_COMM_WORLD, &size); 
MPI_Open_port(MPI_INFO_NULL, port_name);
sprintf(serv_name, "aaaa");
MPI_Publish_name(serv_name, MPI_INFO_NULL, port_name);

while (1) 
{ 
    MPI_Comm_accept( port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &client );
    /*...snip...*/
}

client.c

MPI_Comm server; 
double buf[MAX_DATA]; 
char port_name[MPI_MAX_PORT_NAME]; 
memset(port_name,'\0',MPI_MAX_PORT_NAME);
char serv_name[256];
memset(serv_name,'\0',256);

strcpy(serv_name, argv[1]);
MPI_Lookup_name(serv_name, MPI_INFO_NULL, port_name);
MPI_Comm_connect( port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &server ); 
MPI_Send( buf, 0, MPI_DOUBLE, 0, tag, server ); 
MPI_Comm_disconnect( &server ); 
MPI_Finalize(); 
return 0; 

I cannot find any information about altering the visibility of published names, if that is even the problem. MPICH does not seem to implement anything via the MPI_Info argument for these calls. I would try Open MPI, but I am having trouble just building it. Any suggestions?

Wesley Bland
Morpork
  • I have the same problem. I think the communication only works if you start a program on several computers using mpirun. I'll post a new question, maybe we're lucky. – zonksoft Feb 21 '13 at 16:09
  • Could you describe what you're trying to accomplish? If you're just learning MPI, I'll note that this is a *very* obscure feature set, that I have literally *never* seen or heard of being used in an application. In other words, it's probably not what you should be spending time and attention on. – Phil Miller Aug 12 '13 at 16:39
  • @Novelocrat I wanted to get clients to reliably find the server without the user having to read off the name published by the server on startup. It's been a while and I have forgotten many things. (To any future readers) I was doing an initial exploration of MPI to get a feel for what it can do, but I did not have much luck and moved on to other things (I ended up using boost asio for my work distribution needs). – Morpork Aug 13 '13 at 03:12
  • The point is that when launching an MPI job, the normal pattern is that `mpirun` starts up all of your processes in one go, and they're then part of `MPI_COMM_WORLD`. They can send and receive messages amongst themselves with no further setup on the application's part. Unless you're doing something strange, simply getting a parallel program with some work distribution up and running should be trivial. – Phil Miller Aug 13 '13 at 15:45
  • @Novelocrat Correct me if I'm wrong, but does work distribution not mean to a network of other computers? Surely `mpirun` cannot start processes on other computers? My understanding was that for the other computers to join the `WORLD` they need to know the port, which is `PUBLISHED` by the server. However when I called `MPI_Lookup_name` on the clients, they were still unable to locate where the server was. (Maybe I should have just copied the port number and gave it to the clients, and avoided `publish_name/lookup_name` altogether?) – Morpork Aug 14 '13 at 00:48
  • You've gotten it wrong, in a way that makes your work much harder. The entire purpose of `mpirun` is to start a parallel program on a network of computers. It sets environment variables for each of them so that they know what other computers are part of the job, and how many of them there are. At the point that they call `MPI_Init`, they all have a complete `MPI_COMM_WORLD`. As I said, I've written a *lot* of MPI programs, on machines ranging from 1 node to thousands, and have *never* used any of the functionality your question and comments touch on, nor seen it in anyone else's code either. – Phil Miller Aug 14 '13 at 18:32
  • http://stackoverflow.com/questions/10912793/how-are-mpi-processes-started/10913844 Happens to be an excellent pointer. – Phil Miller Aug 14 '13 at 18:38
  • @Novelocrat Ahh, I think I can understand what you mean now. If you write an answer pointing out that it is not necessary to use those functions and should instead rely on the two approaches in the question you linked, I would be happy to accept it as an answer! – Morpork Aug 15 '13 at 04:27

3 Answers


I uploaded a working client/server example in C, built with OpenMPI 1.6.5 on Ubuntu, that uses the ompi-server name server here:

OpenMPI nameserver client server example in C

daemondave

(digging up old stuff)
For MPICH, the code by @daemondave should actually work as well. It does, however, still require to get a nameserver running. For MPICH, instead of using ompi-server, this can be done using hydra_nameserver. The host then has to be specified for all the mpirun/mpiexec calls using -nameserver HOSTNAME.

I created a working example over at github, which also provides a shell script to build+run the example.

P.S.: the ompi-server variant seems to be somewhat outdated (and includes a few bugs).
For an updated, but still somewhat undocumented, alternative, see this comment.
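As a sketch, the MPICH workflow described above would look roughly like this (hostnames, executable names, and the service name are placeholders; these commands assume an MPICH installation whose Hydra process manager provides hydra_nameserver and the -nameserver option):

```shell
# Start the name server once, on a host every job can reach
# (hydra_nameserver ships with MPICH's Hydra process manager).
hydra_nameserver &

# Launch the server job, telling Hydra where the name server runs,
# so MPI_Publish_name() registers the port there.
mpiexec -nameserver myhost -n 1 ./server &

# Launch the client job against the same name server; the client
# resolves the published name (e.g. "aaaa") via MPI_Lookup_name().
mpiexec -nameserver myhost -n 1 ./client aaaa
```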

Steffen Seckler

This approach of publishing names, looking them up, and connecting to them is outlandish relative to normal MPI usage.

The standard pattern is to use mpirun to specify a set of nodes on which to launch a given number of processes. The operation of common mpirun implementations is explained in another question.

Once the processes are all launched as part of a single parallel job, the MPI library reads whatever information the launcher provided during MPI_Init to set up MPI_COMM_WORLD, a communicator over the group of all processes in the job.

Using that communicator, the parallel application can distribute work, exchange information, and so forth. It would do this using the common MPI_Send and MPI_Recv routines, in all their variants, the collective operations, and so forth.
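For contrast, a minimal sketch of that standard pattern might look like the following (the payload value is hypothetical; launch with e.g. `mpiexec -n 2 ./a.out`):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    /* The launcher's information is consumed here to build MPI_COMM_WORLD. */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        double work = 42.0;  /* hypothetical payload */
        int dest;
        /* Distribute work to every other rank in the job. */
        for (dest = 1; dest < size; dest++)
            MPI_Send(&work, 1, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
    } else {
        double work;
        MPI_Recv(&work, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank %d received %f\n", rank, work);
    }

    MPI_Finalize();
    return 0;
}
```

No port opening, name publishing, or connecting is needed; every process is already in MPI_COMM_WORLD when MPI_Init returns.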

Phil Miller
  • The fact that you find this feature set uncommon (because you never used it) does not mean that nobody else uses it. The question asks about a concrete problem with name publishing for setting up MPI client/server communication. And this is, in fact, a very common feature, especially for code coupling systems, which are extensively used, for example, in scientific computing to couple two (or many) independent solvers (different executables). For instance, if fluid/structure interaction is to be simulated, the fluid solver and the structure solver are coupled. Sorry, but `-1` for your noise. – Alexander Shukaev Feb 10 '15 at 04:01
  • How about this scenario: I'm on a cluster with at least two different processor types, and I want to run a client/server or MPMD program using both types. However, the cluster admins only allow me to submit jobs on one type or the other. With the name publishing mechanism I can start the programs separately and then connect them. – Victor Eijkhout Mar 07 '19 at 19:40
  • @victor: that would be a good use of the feature described in the original question. The op was clearly a new user looking to learn about basic common case operation. – Phil Miller Apr 21 '19 at 21:40