
I'm designing an application involving multi-node communication using InfiniBand (ibv_*). What is the standard way to keep connections between nodes? The easiest approach I can think of is O(N^2) connections, one for every pair of nodes, but that seems silly and not scalable.

w00d
  • Will you need RDMA operations between the nodes, or do you plan only to use send and receive operations? – haggai_e Aug 25 '14 at 06:25
  • Also, can you tell in advance what will be the communication pattern between the nodes? Perhaps you could dynamically create RC connections and avoid using O(N^2) connections. – haggai_e Aug 25 '14 at 06:27
  • Yes, I need RDMA operations. – w00d Aug 25 '14 at 16:33

1 Answer


The question is kinda simple and short, but the real answer is VERY long...

First of all, be sure that you really need to use ibv_... stuff.

Are you using InfiniBand or RoCE?

Next, analyze the expected communication pattern of your application.

You're talking about scalability, which probably means that you have a massively parallel application in mind. Do you really need to invent your own communication layer? Can't you use existing solutions? There's a whole CS field that deals with this kind of problem - HPC (High Performance Computing). Perhaps MPI, UPC, or some other library will solve your problem?

If you still need to write your own ibv_... application with lots and lots of machines, then you need to consider:

  • do you need RC or UD connections?
  • if you have the newest Mellanox HCA (Connect-IB) then there's also an option of DC
  • what are the scalability requirements?
  • how sensitive is the application to latency/BW?
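
To put rough numbers on the RC vs. UD question, here is a small C sketch. It assumes one process per node and one QP per RC connection, which is a simplification, and the function names are made up for illustration (they are not part of libibverbs). An RC QP talks to exactly one peer, while a single UD QP can address any peer:

```c
/* QPs one node must hold to reach every other node over RC:
 * one connected QP per remote peer (assumes one process per node). */
unsigned long long rc_qps_per_node(unsigned long long n)
{
    return n > 0 ? n - 1 : 0;
}

/* Distinct RC connections (unordered node pairs) in a full mesh. */
unsigned long long rc_connections_total(unsigned long long n)
{
    return n > 0 ? n * (n - 1) / 2 : 0;
}

/* With UD, one QP per node is enough to address all peers. */
unsigned long long ud_qps_per_node(unsigned long long n)
{
    return n > 0 ? 1 : 0;
}
```

For example, with 4096 nodes each node would hold 4095 RC QPs in a full mesh, versus a single UD QP. Running several processes per node multiplies the RC figure further.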

To summarize:

  • if you need to have a massively parallel IB verbs application, and you need RC, you'd better open RC connections on-demand
  • if you have to have all the RC connections open in advance, then there's no other way - the O(n^2) connections case is inevitable
  • if it fits your needs, consider using UD
  • check that existing solutions are not what you need
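
The "open RC connections on-demand" idea can be sketched as a lazy per-peer cache. This is only an illustration of the bookkeeping, not real verbs code: `struct conn`, `conn_cache`, `get_conn`, `MAX_PEERS` and `fake_connect` are all hypothetical names, and the actual QP creation and out-of-band exchange of connection parameters would live inside the connect callback.

```c
#include <stddef.h>

#define MAX_PEERS 1024 /* hypothetical upper bound on cluster size */

/* Stand-in for a connected RC endpoint. A real program would keep the
 * QP handle and the peer's exchanged addressing information here. */
struct conn {
    int connected;
    int peer;
};

struct conn_cache {
    struct conn slots[MAX_PEERS];
    size_t open_count; /* connections actually established so far */
};

/* Hypothetical hook that performs the real connection setup.
 * Returns 0 on success, nonzero on failure. */
typedef int (*connect_fn)(int peer, struct conn *out);

/* Return the connection to `peer`, establishing it on first use only.
 * Returns NULL for an out-of-range peer or a failed connect. */
struct conn *get_conn(struct conn_cache *cache, int peer, connect_fn do_connect)
{
    if (peer < 0 || peer >= MAX_PEERS)
        return NULL;

    struct conn *c = &cache->slots[peer];
    if (!c->connected) {
        if (do_connect(peer, c) != 0)
            return NULL;
        c->connected = 1;
        c->peer = peer;
        cache->open_count++;
    }
    return c;
}

/* Trivial connector used only to exercise the cache; does no I/O. */
int fake_connect(int peer, struct conn *out)
{
    (void)peer;
    (void)out;
    return 0;
}
```

With this shape, a node that only ever talks to a handful of peers pays for a handful of connections, and the O(n^2) total is reached only if the communication pattern really is all-to-all. A zero-initialized `struct conn_cache` is a valid empty cache.
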
kliteyn