I am a novice in high-performance computing and I am learning about the Allreduce
operation on GPU cards. I came across an efficient collective operation called ring-Allreduce,
which requires the physical topology of the GPU cards to be a tree topology. However, when I checked the topology of my own server, I got the following.
It seems that the GPU cards are connected through several local PCIe buses and PCIe host bridges. Is this a hierarchical bus topology?
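
To make clear what I mean by ring-Allreduce, here is a minimal Python sketch of my current understanding of it. It only simulates the communication pattern on plain lists (no real GPUs involved) and assumes the workers are arranged in a logical ring, where rank `r` sends only to rank `(r + 1) % n`. The function name `ring_allreduce` and the chunking scheme are my own choices for illustration, not the API of any library.

```python
def ring_allreduce(buffers):
    """Simulate a sum ring-allreduce over n workers (illustrative only).

    buffers: list of n equal-length lists, one per worker.
    Assumes the buffer length is divisible by n.
    Returns a new list of n lists, each holding the element-wise sum.
    """
    n = len(buffers)
    size = len(buffers[0])
    assert size % n == 0, "sketch assumes length divisible by worker count"
    chunk = size // n
    data = [list(b) for b in buffers]  # copy so the inputs are not modified

    def seg(c):
        # Slice covering chunk index c.
        return slice(c * chunk, (c + 1) * chunk)

    # Phase 1: reduce-scatter. In step s, rank r sends chunk (r - s) % n
    # to rank r + 1, which adds it to its own copy of that chunk.
    for step in range(n - 1):
        # Snapshot the payloads first so all sends in a step are simultaneous.
        sends = [((r - step) % n, data[r][seg((r - step) % n)]) for r in range(n)]
        for r in range(n):
            c, payload = sends[(r - 1) % n]
            data[r][seg(c)] = [a + b for a, b in zip(data[r][seg(c)], payload)]

    # Now rank r holds the fully reduced chunk (r + 1) % n.
    # Phase 2: allgather. In step s, rank r forwards chunk (r + 1 - s) % n
    # to rank r + 1, which overwrites its copy with the reduced values.
    for step in range(n - 1):
        sends = [((r + 1 - step) % n, data[r][seg((r + 1 - step) % n)]) for r in range(n)]
        for r in range(n):
            c, payload = sends[(r - 1) % n]
            data[r][seg(c)] = list(payload)

    return data


if __name__ == "__main__":
    result = ring_allreduce([[1, 2, 3], [10, 20, 30], [100, 200, 300]])
    print(result)
```

In this sketch each worker only ever talks to its two ring neighbours, which is why I am trying to understand how such a pattern maps onto the PCIe layout of my server.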