I don't know exactly what kind of system you are testing on (it looks approximately like output from a DGX-1).
With respect to this question:
Why PHB got rank 0?
If you run the original topologyQuery sample code, you'll see (at least on DGX-1-like systems) that it does not print a performance rank for every GPU pair. From what I can see, it prints no performance rank for the pairs where PHB is indicated. If you study the original code, the reason is clear: P2P is not supported on those pair combinations. Your code, however, seems to print a zero in these cases. So I would say that is a defect in your code as compared to the original topologyQuery code, and it is what leads to this question and your misunderstanding. PHB did not get assigned rank 0 by the original code; your modified code does that. So that's for you to answer.
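To make the point about the missing guard concrete, here is a minimal host-side sketch. The function name describe_pair and its output format are hypothetical. I am assuming the two integer inputs are the values you get from cudaDeviceGetP2PAttribute for cudaDevP2PAttrAccessSupported and cudaDevP2PAttrPerformanceRank. It is not the sample's actual code, just the shape of the check your version appears to be missing:

```cpp
#include <string>

// Build the report for one GPU pair, or an empty string when there is nothing
// valid to report.
//   access_supported: assumed to be the cudaDevP2PAttrAccessSupported value
//   perf_rank:        assumed to be the cudaDevP2PAttrPerformanceRank value
// Both would come from cudaDeviceGetP2PAttribute(&value, attr, src, dst).
std::string describe_pair(int src, int dst, int access_supported, int perf_rank) {
    if (!access_supported) {
        // No P2P between these GPUs (the PHB pairs here): the rank is not
        // meaningful, so do not print it.
        return "";
    }
    return "GPU" + std::to_string(src) + " <-> GPU" + std::to_string(dst) +
           ":\n  * Perf Rank: " + std::to_string(perf_rank) + "\n";
}
```

With this guard, describe_pair(0, 5, 0, 0) yields an empty string for the PHB pair GPU0/GPU5, while describe_pair(0, 3, 1, 0) reports rank 0 for the NV2 pair GPU0/GPU3.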
Why NV1 got rank 1?
With respect to the remainder, an NV2 connection implies a dual-link NVLink connection between those 2 GPUs (50GB/s per direction). This would be the most performant kind of link (in that particular system), so it is assigned a performance rank of 0.
An NV1 connection implies a single-link NVLink connection (25GB/s per direction). This would be less performant than NV2, so it is assigned a performance rank of 1. Increasing rank numbers indicate decreasing link performance.
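If it helps, the ordering can be modeled with a few lines of plain C++. This is only an illustrative model, not how the driver computes the rank: I am assuming a link's rank equals the number of distinct link speeds that are strictly faster than it, and rank_links is a hypothetical name.

```cpp
#include <functional>
#include <iterator>
#include <set>
#include <vector>

// Illustrative model only: rank = number of distinct link speeds that are
// strictly faster than this link. The fastest link therefore gets rank 0.
std::vector<int> rank_links(const std::vector<int>& gb_per_s) {
    // Distinct speeds, sorted fastest first.
    std::set<int, std::greater<int>> distinct(gb_per_s.begin(), gb_per_s.end());
    std::vector<int> ranks;
    ranks.reserve(gb_per_s.size());
    for (int bw : gb_per_s) {
        // Position of this speed in the fastest-first ordering.
        ranks.push_back(static_cast<int>(
            std::distance(distinct.begin(), distinct.find(bw))));
    }
    return ranks;
}
```

For the P2P-capable links of GPU0 (NV1, NV1, NV2, NV2, i.e. 25, 25, 50, 50 GB/s per direction), this model gives ranks 1, 1, 0, 0, which matches the Perf Rank values reported for GPU0 further down.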
As an aside, if your intent is to do this:
Basically, do pretty much the same nvidia-smi topo -m does.
You won't be able to do that strictly with CUDA API calls.
For reference, here is the nvidia-smi topo -m output and ./topologyQuery output for a DGX-1:
# nvidia-smi topo -m
      GPU0  GPU1  GPU2  GPU3  GPU4  GPU5  GPU6  GPU7  CPU Affinity
GPU0  X     NV1   NV1   NV2   NV2   PHB   PHB   PHB   0-79
GPU1  NV1   X     NV2   NV1   PHB   NV2   PHB   PHB   0-79
GPU2  NV1   NV2   X     NV2   PHB   PHB   NV1   PHB   0-79
GPU3  NV2   NV1   NV2   X     PHB   PHB   PHB   NV1   0-79
GPU4  NV2   PHB   PHB   PHB   X     NV1   NV1   NV2   0-79
GPU5  PHB   NV2   PHB   PHB   NV1   X     NV2   NV1   0-79
GPU6  PHB   PHB   NV1   PHB   NV1   NV2   X     NV2   0-79
GPU7  PHB   PHB   PHB   NV1   NV2   NV1   NV2   X     0-79
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe switches (without traversing the PCIe Host Bridge)
PIX = Connection traversing a single PCIe switch
NV# = Connection traversing a bonded set of # NVLinks
# ./topologyQuery
GPU0 <-> GPU1:
* Atomic Supported: yes
* Perf Rank: 1
GPU0 <-> GPU2:
* Atomic Supported: yes
* Perf Rank: 1
GPU0 <-> GPU3:
* Atomic Supported: yes
* Perf Rank: 0
GPU0 <-> GPU4:
* Atomic Supported: yes
* Perf Rank: 0
GPU1 <-> GPU0:
* Atomic Supported: yes
* Perf Rank: 1
GPU1 <-> GPU2:
* Atomic Supported: yes
* Perf Rank: 0
GPU1 <-> GPU3:
* Atomic Supported: yes
* Perf Rank: 1
GPU1 <-> GPU5:
* Atomic Supported: yes
* Perf Rank: 0
GPU2 <-> GPU0:
* Atomic Supported: yes
* Perf Rank: 1
GPU2 <-> GPU1:
* Atomic Supported: yes
* Perf Rank: 0
GPU2 <-> GPU3:
* Atomic Supported: yes
* Perf Rank: 0
GPU2 <-> GPU6:
* Atomic Supported: yes
* Perf Rank: 1
GPU3 <-> GPU0:
* Atomic Supported: yes
* Perf Rank: 0
GPU3 <-> GPU1:
* Atomic Supported: yes
* Perf Rank: 1
GPU3 <-> GPU2:
* Atomic Supported: yes
* Perf Rank: 0
GPU3 <-> GPU7:
* Atomic Supported: yes
* Perf Rank: 1
GPU4 <-> GPU0:
* Atomic Supported: yes
* Perf Rank: 0
GPU4 <-> GPU5:
* Atomic Supported: yes
* Perf Rank: 1
GPU4 <-> GPU6:
* Atomic Supported: yes
* Perf Rank: 1
GPU4 <-> GPU7:
* Atomic Supported: yes
* Perf Rank: 0
GPU5 <-> GPU1:
* Atomic Supported: yes
* Perf Rank: 0
GPU5 <-> GPU4:
* Atomic Supported: yes
* Perf Rank: 1
GPU5 <-> GPU6:
* Atomic Supported: yes
* Perf Rank: 0
GPU5 <-> GPU7:
* Atomic Supported: yes
* Perf Rank: 1
GPU6 <-> GPU2:
* Atomic Supported: yes
* Perf Rank: 1
GPU6 <-> GPU4:
* Atomic Supported: yes
* Perf Rank: 1
GPU6 <-> GPU5:
* Atomic Supported: yes
* Perf Rank: 0
GPU6 <-> GPU7:
* Atomic Supported: yes
* Perf Rank: 0
GPU7 <-> GPU3:
* Atomic Supported: yes
* Perf Rank: 1
GPU7 <-> GPU4:
* Atomic Supported: yes
* Perf Rank: 0
GPU7 <-> GPU5:
* Atomic Supported: yes
* Perf Rank: 1
GPU7 <-> GPU6:
* Atomic Supported: yes
* Perf Rank: 0
GPU0 <-> CPU:
* Atomic Supported: no
GPU1 <-> CPU:
* Atomic Supported: no
GPU2 <-> CPU:
* Atomic Supported: no
GPU3 <-> CPU:
* Atomic Supported: no
GPU4 <-> CPU:
* Atomic Supported: no
GPU5 <-> CPU:
* Atomic Supported: no
GPU6 <-> CPU:
* Atomic Supported: no
GPU7 <-> CPU:
* Atomic Supported: no