I'm new to HPC and curious about a point regarding the performance of MPI over InfiniBand. For reference, I am using Open MPI across two machines connected through IB.
I've coded a very simple benchmark to see how fast I can transfer data over IB using MPI calls. Below you can see the code.
The issue is that when I run this, I get a throughput of ~1.4 GB/s. However, when I use standard IB benchmarks like ib_write_bw, I get nearly 6 GB/s. What might account for this sizable discrepancy? Am I being naive about MPI_Gather, or is this just a result of Open MPI overheads that I can't overcome? (For a more direct comparison against ib_write_bw, I've also sketched a point-to-point version after the main code below.)
In addition to the code, I've attached a plot showing the results of my simple benchmark.
Thanks in advance!
Code:
#include <iostream>
#include <mpi.h>
#include <stdint.h>
#include <ctime>

using namespace std;

// Root side: gather `size` bytes from each of the `n` ranks and time the call.
void server(unsigned int size, unsigned int n) {
    uint8_t* recv = new uint8_t[size * n];
    uint8_t* send = new uint8_t[size];

    std::clock_t s = std::clock();
    MPI_Gather(send, size, MPI_CHAR, recv, size, MPI_CHAR, 0, MPI_COMM_WORLD);
    std::clock_t e = std::clock();

    cout << size << " " << (e - s) / double(CLOCKS_PER_SEC) << endl;

    delete [] recv;
    delete [] send;
}

// Non-root side: contribute `size` bytes to the gather (the receive arguments
// are ignored on non-root ranks).
void client(unsigned int size, unsigned int n) {
    uint8_t* send = new uint8_t[size];
    MPI_Gather(send, size, MPI_CHAR, NULL, 0, MPI_CHAR, 0, MPI_COMM_WORLD);
    delete [] send;
}

int main(int argc, char **argv) {
    int size, rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    cout << "Rank " << rank << " of " << size << endl;

    // Sweep message sizes from ~2 MB up to ~2.1 GB in ~2 MB steps.
    unsigned int min = 1, max = (1u << 31), n = 1000;
    for (unsigned int i = 1; i < n; i++) {
        unsigned int s = i * ((max - min) / n);
        if (rank == 0) server(s, size); else client(s, size);
    }

    MPI_Finalize();
}
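
Since ib_write_bw measures a point-to-point transfer rather than a collective, here is the rough point-to-point sketch I mentioned above for a more direct comparison (run with one rank on each machine; the 64 MiB message size and 100 iterations are arbitrary values I picked, and it uses MPI_Wtime for wall-clock timing):

#include <iostream>
#include <mpi.h>
#include <stdint.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int size = 64 * 1024 * 1024;   // 64 MiB per message (arbitrary choice)
    const int iters = 100;               // number of repetitions (arbitrary choice)
    uint8_t* buf = new uint8_t[size];

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();             // wall-clock time, unlike std::clock()
    for (int i = 0; i < iters; i++) {
        if (rank == 0)
            MPI_Send(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    MPI_Barrier(MPI_COMM_WORLD);         // make sure the last transfer has completed
    double t1 = MPI_Wtime();

    if (rank == 0)
        std::cout << (double(size) * iters / (t1 - t0)) / 1e9 << " GB/s" << std::endl;

    delete [] buf;
    MPI_Finalize();
}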