
I am working with a C++ MPI code which, when compiled with Open MPI, takes 1 min 12 s, but only 16 s with Intel MPI (I have tested it on other inputs too; the difference is similar, and both builds give the correct answer). I want to understand why there is such a big difference in run time, and what can be done to decrease the run time with Open MPI (GCC).

I am using CentOS 6 on an Intel Haswell processor, and the following flags for compiling:

Open MPI (GCC): mpiCC -Wall -O3

I have also tried -march=native -funroll-loops; they do not make a great difference. I have also tried the -lm option. I cannot compile for 32-bit.

Intel MPI: mpiicpc -Wall -O3 -xhost

The -xhost flag saves 3 seconds of run time.
– Abhi
  • I think your question is way too broad to answer directly. Which types of MPI operations are you using? Send/Receive? Synchronizations? Collective operations (Broadcast/Reduction? Gather/Scatter? Alltoall?) Are you using immediate or blocking operations? Can you maybe give us some more insight into what you are computing? – Tobias Ribizel Dec 26 '17 at 11:33
  • I am using MPI_Send, MPI_Recv, MPI_Isend, MPI_Irecv. I am not using any Collective operations. The code performs molecular docking of small molecules to proteins. [link](https://en.wikipedia.org/wiki/Docking_(molecular)) – Abhi Dec 26 '17 at 11:49
  • So am I correct in assuming the majority of the runtime is spent by communication? – Tobias Ribizel Dec 26 '17 at 11:55
  • No. Most of the time is spent in calculating the coordinates of the small molecule being docked and solving a modified form of the force-field equation. You can learn more about the equation at the following link: [link](https://www.neutron-sciences.org/articles/sfn/pdf/2011/01/sfn201112009.pdf). Please refer to equation 2.1 of the document. – Abhi Dec 26 '17 at 12:03
  • Still, the Intel compiler is performing 5 times better than Open MPI for the same code. I don't think Open MPI is that bad at optimizing. – Abhi Dec 26 '17 at 12:08
  • While I cannot really give you a complete answer, here are some pointers to what might be the source. As always, try to measure every assumption you make: use MPI_Wtime to check which parts of your program take how much time and how this differs between ICC and GCC (a minimal timing sketch follows the comments). If the difference is mostly in the MPI communication operations, then your suspicions were indeed correct. Maybe try to use collectives (alltoall) instead of individual sends (see https://stackoverflow.com/questions/47977766/performance-comparison-issue-between-openmpi-and-intel-mpi) – Tobias Ribizel Dec 26 '17 at 12:13
  • If the actual computations take most of the time, then this might be more of an optimization issue in GCC vs ICC, while if the communication operations take most of the time, the Intel MPI library might just be more optimized for your communication pattern. Unfortunately, my knowledge of computational chemistry is limited, so I'm relying mostly on educated guesses concerning how much work and data exchange is necessary. Maybe we should move this discussion to a chat as soon as SO recommends it ;) – Tobias Ribizel Dec 26 '17 at 12:20
  • I am new here, so I don't have enough reputation points to move this to chat. We can chat at md.scfbio@gmail.com if that suits you. Please send me a mail if you are up for it. – Abhi Dec 26 '17 at 12:25
  • You are comparing Open MPI + gcc vs Intel MPI + icc, so basically you cannot tell whether Open MPI or gcc (or both) should be blamed for the bad performance. Try using Intel MPI with `mpicc` so you can get performance numbers with Intel MPI + gcc, and then make a fair comparison between Intel MPI and Open MPI. FWIW, if you run an old version of Open MPI and use `MPI_THREAD_MULTIPLE`, you will use IPoIB instead of native InfiniBand; that issue is fixed in recent Open MPI. – Gilles Gouaillardet Dec 26 '17 at 13:41
  • I am using Open MPI 1.6. I think MPI_THREAD_MULTIPLE support is not there, as ompi_info says: Thread support: posix (MPI_THREAD_MULTIPLE: no, progress: no). I will try Intel MPI + gcc (to be frank, I don't know how to do it, but I will try). I will also try to get a new machine and install the latest Open MPI to see if there is any change with MPI_THREAD_MULTIPLE or a newer Open MPI. – Abhi Dec 26 '17 at 14:07
  • In Open MPI 1.6, `MPI_THREAD_MULTIPLE` is only available when configured with `--enable-mpi-thread-multiple`. With Intel MPI, `mpiicc`, `mpiicpc` and `mpiifort` use the Intel compilers, and `mpicc`, `mpicxx` and `mpifort` use the GNU compilers, so all you need is to update your makefile to use the wrappers for GNU and rebuild your code (an example invocation follows the comments). – Gilles Gouaillardet Dec 27 '17 at 03:30
  • I am working on a server, so I won't get root privileges to get this done. I will get a machine and set it up with the required codes. I will test this and get back. It might take a few days to get a separate test machine for this purpose, but thanks anyway. Now I know how I could test this. – Abhi Dec 27 '17 at 06:39
  • I'd rather suggest you try Intel MPI with the GNU compilers first, and you do not need root privileges for that. As a side note, Open MPI does not require root privileges either; it can be installed in a user directory. – Gilles Gouaillardet Dec 27 '17 at 07:25
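
A minimal sketch of the MPI_Wtime instrumentation suggested by Tobias Ribizel above, assuming a simple step loop with a placeholder neighbour exchange (compute_step, the buffer sizes and the exchange pattern are stand-ins, not the actual docking code):

```cpp
// Sketch only: split wall-clock time into "compute" and "MPI" buckets so the
// Open MPI + GCC and Intel MPI + ICC builds can be compared bucket by bucket.
#include <mpi.h>
#include <cstdio>
#include <vector>

// Placeholder for the coordinate/force-field computation in the real code.
static void compute_step(std::vector<double>& v) {
    for (double& x : v) x = x * 1.000001 + 0.5;
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    std::vector<double> data(1 << 20, 1.0);
    std::vector<double> recvbuf(1024);
    double t_compute = 0.0, t_comm = 0.0;

    for (int step = 0; step < 100; ++step) {
        double t0 = MPI_Wtime();
        compute_step(data);
        double t1 = MPI_Wtime();

        // Placeholder pairwise exchange; replace with the real Send/Recv pattern.
        int partner = rank ^ 1;
        if (partner < size) {
            MPI_Sendrecv(data.data(), 1024, MPI_DOUBLE, partner, 0,
                         recvbuf.data(), 1024, MPI_DOUBLE, partner, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
        double t2 = MPI_Wtime();

        t_compute += t1 - t0;
        t_comm    += t2 - t1;
    }

    // Report the slowest rank for each bucket.
    double max_compute = 0.0, max_comm = 0.0;
    MPI_Reduce(&t_compute, &max_compute, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    MPI_Reduce(&t_comm, &max_comm, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    if (rank == 0)
        std::printf("compute: %.3f s   communication: %.3f s\n", max_compute, max_comm);

    MPI_Finalize();
    return 0;
}
```

If the compute bucket dominates in both builds, the gap is a GCC vs ICC code-generation issue; if the communication bucket dominates, it is an Open MPI vs Intel MPI issue.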

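To get the missing data point from Gilles Gouaillardet's suggestion (Intel MPI + GNU compilers), the same source only needs to be rebuilt with Intel MPI's GNU wrapper, using the same flags as the Open MPI build. For example, with the wrapper name given in the comments (it may differ between Intel MPI versions):

Intel MPI (GCC): mpicxx -Wall -O3

Comparing this binary's run time with the existing Open MPI + GCC and Intel MPI + ICC numbers separates the effect of the compiler from the effect of the MPI library.
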
0 Answers