I tried the following "hello world" code, first on my own system (8 cores) and then on a server (160 cores):
#include <mpi.h>
#include <stdio.h>   /* printf */
#include <unistd.h>  /* usleep */

int main(int argc, char *argv[]) {
    int numprocs, rank, namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    double t;

    t = MPI_Wtime();  // start the clock before MPI_Init to include startup cost
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(processor_name, &namelen);
    //printf("Process %d on %s out of %d\n", rank, processor_name, numprocs);
    printf("%f---%d---%s\n", MPI_Wtime() - t, rank, processor_name);
    usleep(500000);  // sleep half a second so that each process needs a significant amount of time to finish
    MPI_Finalize();
    return 0;
}
I ran the program with 160 processes on both machines using mpirun -np 160 ./hello
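For completeness, this is roughly how I build and launch it (hello.c is just my local file name, and --report-bindings is only there because it prints where Open MPI places each rank; it was not part of the timed runs):

mpicc hello.c -o hello
mpirun -np 160 --report-bindings ./hello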
I expected the server run to be more efficient, since it has a dedicated core available for each process from the start, but the result was the opposite:
8 cores: 2.25 sec
160 cores: 5.65 sec
Please correct me if I am confused about how cores are assigned to processes. Also, please explain how the mapping is done by default. I know there are several ways to set it manually, either by using a rankfile or by using options related to socket/core affinity, but I want to know how processes are treated in Open MPI and how they are given resources by default.
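To be concrete about the manual controls I am referring to, I mean things like the following (the host name, file name, and slot numbers are only placeholders for illustration, not my actual setup):

mpirun -np 160 --map-by core --bind-to core ./hello
mpirun -np 160 --rankfile my_rankfile ./hello

where my_rankfile would contain lines along the lines of

rank 0=node01 slot=0:0
rank 1=node01 slot=0:1

My question is specifically about what placement and binding Open MPI uses when none of these options are supplied.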