I am new to high-performance computing and am learning about the Allreduce operation for GPUs. I came across an efficient collective operation called ring-Allreduce, which requires the physical topology of the GPU cards to be a tree topology. However, when I checked the topology of my own server, I got the following.

[Image: GPU topology of the server]

It seems that the GPU cards are connected through several local PCIe buses and PCIe host bridges. Is this a hierarchical bus topology?

Sean

1 Answer

A two-socket system has several PCIe root ports on each socket. A PCIe bridge is attached to each root port, and the GPUs are attached to the bridges.

Connections labeled PIX are between GPUs attached to the same bridge.

Connections labeled NODE are between GPUs attached to two different bridges (on two different root ports).

Connections labeled SYS are between GPUs attached to root ports on different sockets.
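
As a rough illustration of the three rules above, here is a minimal Python sketch. The `GpuLocation` class, the `link_label` function, and the example layout are all hypothetical names made up for this illustration; they are not read from any real tool or from your server. The sketch assumes one bridge per root port, as described above, so two GPUs on the same root port share a bridge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuLocation:
    """Hypothetical description of where a GPU sits in the PCIe tree."""
    socket: int     # index of the CPU socket
    root_port: int  # index of the root port within that socket


def link_label(a: GpuLocation, b: GpuLocation) -> str:
    """Return the label for the path between two GPUs, following the rules above."""
    if a.socket != b.socket:
        return "SYS"   # root ports on different sockets
    if a.root_port != b.root_port:
        return "NODE"  # same socket, different bridges / root ports
    return "PIX"       # same bridge (assuming one bridge per root port)


# Made-up layout for illustration only.
gpus = {
    "GPU0": GpuLocation(socket=0, root_port=0),
    "GPU1": GpuLocation(socket=0, root_port=0),
    "GPU2": GpuLocation(socket=0, root_port=1),
    "GPU3": GpuLocation(socket=1, root_port=0),
}

for name in ("GPU1", "GPU2", "GPU3"):
    print("GPU0 <->", name, ":", link_label(gpus["GPU0"], gpus[name]))
```

With this made-up layout, GPU0 and GPU1 share a bridge, GPU2 hangs off a different root port on the same socket, and GPU3 is on the other socket.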

prl
  • First, thank you for your explanation. I think I now understand the physical layout of the GPU cards, but I am still confused about the topology. Is it a bus topology? – Sean Feb 13 '20 at 07:40
  • I’m not sure what you mean by that. PCIe links are all point-to-point links in a tree structure, not a shared bus. That’s why bridges are needed on every root port. Does that help answer your question? – prl Feb 13 '20 at 10:33
  • I am not sure whether you are familiar with distributed deep learning training. In that setting, we need to collect the gradient information from each GPU card, compute the average, and then send it back. There is a well-known algorithm called ring all-reduce which can achieve the optimal bandwidth based on the tree topology. I just do not know whether the default topology of GPU cards is a tree topology. – Sean Feb 13 '20 at 11:16
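
For reference, the gradient averaging described in the last comment can be sketched as a ring all-reduce in plain Python. This is only a single-process simulation under stated assumptions, not how any particular library implements it: the function name `ring_allreduce_mean` is hypothetical, each "GPU" is just a Python list, and the vector length is assumed to be divisible by the number of ranks. The ring here is a logical ordering of the ranks (rank r sends to rank r+1); the sketch says nothing about how the cards are physically wired.

```python
def ring_allreduce_mean(grads):
    """Simulate ring all-reduce and return the averaged vector held by each rank.

    grads: one equal-length list of numbers per rank.
    Assumption (for simplicity): the length is divisible by the number of ranks.
    """
    n = len(grads)
    length = len(grads[0])
    assert length % n == 0, "sketch assumes length divisible by number of ranks"
    size = length // n
    bufs = [list(g) for g in grads]  # private copy per simulated rank

    def chunk(c):
        return slice(c * size, (c + 1) * size)

    # Phase 1, reduce-scatter: after n-1 steps rank r holds the full sum
    # of chunk (r + 1) % n.
    for step in range(n - 1):
        # Snapshot all outgoing messages first so the sends are "simultaneous".
        sends = [(r, (r - step) % n, bufs[r][chunk((r - step) % n)])
                 for r in range(n)]
        for r, c, data in sends:
            dst = (r + 1) % n
            bufs[dst][chunk(c)] = [x + y for x, y in zip(bufs[dst][chunk(c)], data)]

    # Phase 2, all-gather: circulate the finished chunks around the ring.
    for step in range(n - 1):
        sends = [(r, (r + 1 - step) % n, bufs[r][chunk((r + 1 - step) % n)])
                 for r in range(n)]
        for r, c, data in sends:
            bufs[(r + 1) % n][chunk(c)] = data

    # Divide by the number of ranks to turn the sum into an average.
    return [[x / n for x in b] for b in bufs]


example = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
    [13.0, 14.0, 15.0, 16.0],
]
print(ring_allreduce_mean(example))
```

Each rank sends and receives one chunk per step for 2(n-1) steps in total, so the data sent per rank stays close to twice the vector size regardless of the number of ranks.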