
I don't understand why Intel MPI uses DAPL if native ibverbs are faster than DAPL; Open MPI uses native ibverbs. However, in this benchmark Intel MPI achieves better performance:

http://www.hpcadvisorycouncil.com/pdf/AMBER_Analysis_and_Profiling_Intel_E5_2680.pdf

Brayme Guaman
  • Brayme, why is DAPL slower than native ibverbs? For what hardware, and what was the source of that claim? DAPL may be the default only for some versions of Intel MPI and some hardware (and some [other interfaces may be supported](https://software.intel.com/en-us/get-started-with-mpi-for-linux): psm, hfi, libfabric, scif, ...). Are there more recent benchmarks? What is your task? – osgx May 18 '17 at 22:37
  • I read here http://www.advancedclustering.com/act_kb/mpi-over-infiniband/ that Intel MPI uses DAPL and is slower than Open MPI, but in this AMBER analysis benchmark Intel MPI is faster than Open MPI. I need to understand how MPI works over InfiniBand, especially for these two libraries, and how it relates to OFED; my thesis is about this, but I cannot understand it. – Brayme Guaman May 18 '17 at 22:45
  • The "here" of http://www.advancedclustering.com/act_kb/mpi-over-infiniband/ is outdated. There is no date stated, but it is about older libraries, and it may be wrong for some situations. Yes, if there is `dapl`, Intel MPI will use it. But we need some microbenchmarks (not the complex AMBER) to compare practical latency of messages with different sizes on the same hardware with IntelMPI with DAPL; with OFA (OFED verbs); with OFI; and OpenMPI with different options supported by it. If you need to understand something, try to read real docs/srcs; do tests and only ask specific questions here. – osgx May 18 '17 at 22:50

1 Answer


Intel MPI uses several interfaces to interact with the hardware, and DAPL is not the default in all cases. Open MPI also selects an interface for the current hardware, and it is not always ibverbs: there is a shared-memory API for intra-node communication and TCP for Ethernet-only hosts.

List for Intel MPI (Linux):

https://software.intel.com/en-us/get-started-with-mpi-for-linux

Getting Started with Intel® MPI Library for Linux* OS. Last updated on August 24, 2015

Support for any combination of the following interconnection fabrics:

  • Shared memory
  • Network fabrics with tag matching capabilities through Tag Matching Interface (TMI), such as Intel® True Scale Fabric, Infiniband*, Myrinet* and other interconnects
  • Native InfiniBand* interface through OFED* verbs provided by Open Fabrics Alliance* (OFA*)
  • OpenFabrics Interface* (OFI*)
  • RDMA-capable network fabrics through DAPL*, such as InfiniBand* and Myrinet*
  • Sockets, for example, TCP/IP over Ethernet*, Gigabit Ethernet*, and other interconnects

The interface to the fabric can be selected with the I_MPI_FABRICS environment variable (a small check-and-run sketch follows after the quoted documentation): https://software.intel.com/en-us/node/535584

Selecting Fabrics. Last updated on February 22, 2017

Intel® MPI Library enables you to select a communication fabric at runtime without having to recompile your application. By default, it automatically selects the most appropriate fabric based on your software and hardware configuration. This means that in most cases you do not have to bother about manually selecting a fabric.

However, in certain situations specifying a particular communication fabric can boost performance of your application. You can specify fabrics for communications within the node and between the nodes (intra-node and inter-node communications, respectively). The following fabrics are available:

Fabric - Network hardware and software used

  • shm - Shared memory (for intra-node communication only).
  • dapl - Direct Access Programming Library* (DAPL)-capable network fabrics, such as InfiniBand* and iWarp* (through DAPL).
  • tcp - TCP/IP-capable network fabrics, such as Ethernet and InfiniBand* (through IPoIB*).
  • tmi - Tag Matching Interface (TMI)-capable network fabrics, such as Intel® True Scale Fabric, Intel® Omni-Path Architecture and Myrinet* (through TMI).
  • ofa - OpenFabrics Alliance* (OFA)-capable network fabrics, such as InfiniBand* (through OFED* verbs).
  • ofi - OpenFabrics Interfaces* (OFI)-capable network fabrics, such as Intel® True Scale Fabric, Intel® Omni-Path Architecture, InfiniBand* and Ethernet (through OFI API).

For inter-node communication, it uses the first available fabric from the default fabric list. This list is defined automatically for each hardware and software configuration (see I_MPI_FABRICS_LIST for details).

For most configurations, this list is as follows:

dapl,ofa,tcp,tmi,ofi
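
To make the comparison concrete, a tiny check program such as the sketch below can confirm which library a binary is linked against and which fabric was requested. It only uses standard MPI calls plus `getenv`; the `shm:dapl` / `shm:ofa` run lines in the comments follow the intra-node:inter-node syntax from the Intel documentation linked above, and the `I_MPI_DEBUG` output format varies between Intel MPI versions, so treat them as an illustration rather than a verified recipe.

```c
/* which_mpi.c - print the MPI library version string and the requested
 * Intel MPI fabric (if I_MPI_FABRICS was set in the environment).
 *
 * Example run lines under Intel MPI (adjust to your installation):
 *   I_MPI_FABRICS=shm:dapl mpirun -n 2 ./which_mpi   # force DAPL between nodes
 *   I_MPI_FABRICS=shm:ofa  mpirun -n 2 ./which_mpi   # force OFED verbs between nodes
 *   I_MPI_DEBUG=2          mpirun -n 2 ./which_mpi   # let the library report its choice
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        char version[MPI_MAX_LIBRARY_VERSION_STRING];
        int len = 0;
        MPI_Get_library_version(version, &len);        /* "Intel(R) MPI Library ..." or "Open MPI ..." */
        const char *fabrics = getenv("I_MPI_FABRICS"); /* only meaningful under Intel MPI */
        printf("MPI library  : %s\n", version);
        printf("I_MPI_FABRICS: %s\n", fabrics ? fabrics : "(not set, automatic selection)");
    }
    MPI_Finalize();
    return 0;
}
```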

osgx
  • I do not understand exactly what you mean by "interface". I'm new to this. – Brayme Guaman May 18 '17 at 23:01
  • Probably the reasons are historic; check https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technology/topic/279244 Gergana S.'s post from 2012: "*Certainly, we're investing time and effort in directly supporting OFED verbs via the `ofa` fabric because we feel it's worth it; it gives us the ability to optimize directly for the OFED software stack and has some nice fringe bandwidth benefits via the multi-rail support. ..if you don't have OFED installed, your other option ..is dapl. ... The one thing we can recommend is: if you do have OFED installed, take advantage of it through ofa.*" – osgx May 18 '17 at 23:03
  • By "interface" I mean a software interface to the hardware (an API). DAPL, verbs (OFED, ofa) and ofi (libfabric) are just different APIs to the same hardware. – osgx May 18 '17 at 23:04
  • They have a different default list for their Omni-Path Fabric: https://software.intel.com/en-us/node/528821 "`ofi,tmi,dapl,ofa,tcp` - *This is the default value for nodes that have Intel® Omni-Path Fabric or Intel® True Scale Fabric available and do not have any other type of interconnect cards.*" – osgx May 18 '17 at 23:06
  • I am doing a benchmark between Intel MPI and Open MPI on InfiniBand. Can you recommend papers or books to help me better understand this and how it is all related? – Brayme Guaman May 18 '17 at 23:16
  • Brayme, no, Stack Overflow is not for book recommendations (https://stackoverflow.com/help/on-topic - 4 "*Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.*"). Try any book or thesis about InfiniBand that mentions MPI, read the docs of the MPI libraries, and check more recent books; INRIA has some. Then do experiments and read the sources. – osgx May 18 '17 at 23:21
  • Sorry to nitpick, but it is actually Open MPI, with a space. Also, on InfiniBand networks, Intel MPI does default to DAPL, which bugs the hell out of me as we used to have many problems with DAPL on our old Linux cluster. – Hristo Iliev May 19 '17 at 07:36
  • But in the ![OFED Stack](https://www.openfabrics.org/images/ofs/blockdiag_2010.gif) diagram I see uDAPL in the User APIs layer, so I don't know whether Intel MPI always uses uDAPL or whether I can choose between uDAPL and OFED* verbs. Another question: does the OFED stack provide only uDAPL to Intel MPI, or is there a path other than uDAPL to Intel MPI? What is the benefit of using uDAPL if it is an extra step? – Brayme Guaman May 22 '17 at 02:15
  • Brayme, what is your network hardware (a small libibverbs query sketch at the end of this thread shows how to list it)? Check the linked Intel docs for how to change the API used by the MPI implementation. The picture at https://www.openfabrics.org/images/ofs/blockdiag_2010.gif is old and not precise. With most high-performance APIs (interfaces), the library will try to talk directly to the hardware (kernel bypass) to send and receive the actual data (messages), in most cases to get lower latency. – osgx May 22 '17 at 02:25
  • My network hardware is InfiniBand QDR, and that picture is the latest one from the official web page https://www.openfabrics.org/index.php/overview.html – Brayme Guaman May 22 '17 at 04:13
  • Infiniband QDR from which vendor? What is the chip and where did you get the OFED stack? – osgx May 22 '17 at 14:08
  • It is a Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE], and I got the OFED stack from the official page of OpenFabrics https://www.openfabrics.org/index.php/overview.html – Brayme Guaman May 22 '17 at 17:15
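
To see which HCA and firmware the verbs layer actually exposes on such a node (the same devices the `ofa` fabric or Open MPI's verbs path would use), a small libibverbs query like the sketch below can help. It assumes the libibverbs development headers from the OFED stack are installed (compile with something like `gcc ibv_list.c -libverbs`); the file name and output format are only illustrative.

```c
/* ibv_list.c - enumerate RDMA devices via libibverbs and print basic
 * attributes; a ConnectX/MT26428 HCA should appear here with its firmware. */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void) {
    int num = 0;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) {
        fprintf(stderr, "no RDMA devices found (is the OFED stack loaded?)\n");
        return 1;
    }
    for (int i = 0; i < num; i++) {
        struct ibv_context *ctx = ibv_open_device(devs[i]);
        if (!ctx)
            continue;
        struct ibv_device_attr attr;
        if (ibv_query_device(ctx, &attr) == 0)
            printf("%-12s fw %-12s vendor 0x%04x part %u ports %u\n",
                   ibv_get_device_name(devs[i]), attr.fw_ver,
                   (unsigned)attr.vendor_id, (unsigned)attr.vendor_part_id,
                   (unsigned)attr.phys_port_cnt);
        ibv_close_device(ctx);
    }
    ibv_free_device_list(devs);
    return 0;
}
```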