This is obviously a late answer, but I ran across this question. I specialize in doing a lot of this at work, and unfortunately a complete answer on NUMA testing is long and nuanced. Here are some general things to consider:
- It's not impossible, but on a modern, NUMA-aware operating system it is a bit unlikely that the OS is simply assigning processes to the incorrect NUMA node. You usually see that problem when a PCIe device like a GPU or NVMe drive is involved: the device is attached to one processor, but the process is running on the opposite one. If you think this is your problem, you can check numastat. If you are getting NUMA misses, you will typically see high (and rising) counts for other_node or numa_foreign, though this does depend on a few things. See this Linux doc for a more in-the-weeds explanation.
[root@r7525 ~]# numastat
                     node0      node1      node2      node3
numa_hit               460        460     397706     414740
numa_miss                0          0          0          0
numa_foreign             0          0          0          0
interleave_hit           0          0      10633      10567
local_node               0          0     226751      76898
other_node             460        460     170955     337842

                     node4      node5      node6      node7
numa_hit            423211     295925        460        460
numa_miss                0          0          0          0
numa_foreign             0          0          0          0
interleave_hit       10645      10559          0          0
local_node          256692     247405          0          0
other_node          166519      48520        460        460

                     node8      node9     node10     node11
numa_hit               460        460     692597     494990
numa_miss                0          0          0          0
numa_foreign             0          0          0          0
interleave_hit           0          0      10634      10577
local_node               0          0     283516     274249
other_node             460        460     409081     220741

                    node12     node13     node14     node15
numa_hit            269866     227927        460        460
numa_miss                0          0          0          0
numa_foreign             0          0          0          0
interleave_hit       10622      10565          0          0
local_node          103034      87552          0          0
other_node          166832     140374        460        460
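To spot a trend without eyeballing the whole table, you can total the remote-allocation counters with a short pipeline. This is just a sketch of my own (the awk expression is not part of numastat); run it a few seconds apart and compare the totals:

```shell
# Sum the other_node counters across all nodes. A total that keeps
# rising between runs means pages are being allocated on remote nodes.
numastat | awk '/^other_node/ { for (i = 2; i <= NF; i++) total += $i }
                END { print "total other_node:", total + 0 }'
```

You can do the same for numa_miss or numa_foreign by changing the pattern.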
- You can check the NUMA layout with numactl --hardware. Note: be aware that there is a difference between the physical NUMA nodes and what you will see in the OS. Ex: the R7525, with NUMA nodes per socket (NPS) set to 4, has 8 physical NUMA nodes, and potentially a few more if you enable L3 cache as NUMA. However, what you will see in the OS is this:
[root@r7525 ~]# numactl --hardware
...SNIP...
node   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
  0:  10  11  12  12  12  12  12  12  32  32  32  32  32  32  32  32
  1:  11  10  12  12  12  12  12  12  32  32  32  32  32  32  32  32
  2:  12  12  10  11  12  12  12  12  32  32  32  32  32  32  32  32
  3:  12  12  11  10  12  12  12  12  32  32  32  32  32  32  32  32
  4:  12  12  12  12  10  11  12  12  32  32  32  32  32  32  32  32
  5:  12  12  12  12  11  10  12  12  32  32  32  32  32  32  32  32
  6:  12  12  12  12  12  12  10  11  32  32  32  32  32  32  32  32
  7:  12  12  12  12  12  12  11  10  32  32  32  32  32  32  32  32
  8:  32  32  32  32  32  32  32  32  10  11  12  12  12  12  12  12
  9:  32  32  32  32  32  32  32  32  11  10  12  12  12  12  12  12
 10:  32  32  32  32  32  32  32  32  12  12  10  11  12  12  12  12
 11:  32  32  32  32  32  32  32  32  12  12  11  10  12  12  12  12
 12:  32  32  32  32  32  32  32  32  12  12  12  12  10  11  12  12
 13:  32  32  32  32  32  32  32  32  12  12  12  12  11  10  12  12
 14:  32  32  32  32  32  32  32  32  12  12  12  12  12  12  10  11
 15:  32  32  32  32  32  32  32  32  12  12  12  12  12  12  11  10
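Incidentally, the same distance matrix is exposed per node in sysfs, which is easier to consume from scripts than parsing numactl output. A minimal sketch, assuming the standard Linux sysfs layout:

```shell
# Print each NUMA node's distance vector; row N corresponds to row N
# of the numactl --hardware matrix.
for node in /sys/devices/system/node/node[0-9]*; do
    echo -en "${node##*/}\t"
    cat "$node/distance"
done
```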
This is because each physical processor is also running simultaneous multithreading (SMT) and consequently presents two logical processors per physical core, which doubles the number of NUMA nodes the OS sees. Here is a script that shows which logical CPUs are sibling threads:
for core in {0..63}; do
echo -en "$core\t"
cat /sys/devices/system/cpu/cpu$core/topology/thread_siblings_list
done
The R7525 has 64 physical cores, so you see the following:
0 0,64
1 1,65
2 2,66
3 3,67
4 4,68
5 5,69
6 6,70
7 7,71
8 8,72
...SNIP...
59 59,123
60 60,124
61 61,125
62 62,126
63 63,127
- If you do have a PCIe card in the mix, you can check the PCIe card's NUMA alignment with `lstopo -v | grep -Ei 'pci|sd|numa'`:
...SNIP...
PCIBridge L#1 (busid=0000:60:03.1 id=1022:1483 class=0604(PCIBridge) link=7.88GB/s buses=0000:[63-63])
PCI L#0 (busid=0000:63:00.0 id=14e4:16d6 class=0200(Ethernet) link=7.88GB/s)
PCI L#1 (busid=0000:63:00.1 id=14e4:16d6 class=0200(Ethernet) link=7.88GB/s)
PCIBridge L#2 (busid=0000:60:05.2 id=1022:1483 class=0604(PCIBridge) link=0.50GB/s buses=0000:[61-62])
PCIBridge L#3 (busid=0000:61:00.0 id=1556:be00 class=0604(PCIBridge) link=0.50GB/s buses=0000:[62-62])
PCI L#2 (busid=0000:62:00.0 id=102b:0536 class=0300(VGA))
NUMANode L#0 (P#2 local=65175752KB total=65175752KB)
NUMANode L#1 (P#3 local=66057292KB total=66057292KB)
NUMANode L#2 (P#4 local=66058316KB total=66058316KB)
NUMANode L#3 (P#5 local=66045004KB total=66045004KB)
PCIBridge L#5 (busid=0000:00:01.1 id=1022:1483 class=0604(PCIBridge) link=15.75GB/s buses=0000:[01-01])
PCI L#3 (busid=0000:01:00.0 id=1000:10e2 class=0104(RAID) link=15.75GB/s PCISlot=0-1)
PCIBridge L#6 (busid=0000:00:01.2 id=1022:1483 class=0604(PCIBridge) link=1.00GB/s buses=0000:[02-02])
PCI L#4 (busid=0000:02:00.0 id=1b4b:9230 class=0106(SATA) link=1.00GB/s PCISlot=0-2)
Block(Disk) L#2 (Size=234431064 SectorSize=512 LinuxDeviceID=8:16 Model=MTFDDAV240TDU Revision=J004 SerialNumber=2151338FC1AF) "sdb"
Block(Disk) L#3 (Size=234431064 SectorSize=512 LinuxDeviceID=8:0 Model=MTFDDAV240TDU Revision=J004 SerialNumber=2151338FC427) "sda"
PCIBridge L#8 (busid=0000:e0:05.1 id=1022:1483 class=0604(PCIBridge) link=0.50GB/s buses=0000:[e1-e1])
PCI L#5 (busid=0000:e1:00.0 id=14e4:165f class=0200(Ethernet) link=0.50GB/s)
PCI L#6 (busid=0000:e1:00.1 id=14e4:165f class=0200(Ethernet) link=0.50GB/s)
NUMANode L#4 (P#10 local=66058316KB total=66058316KB)
PCIBridge L#10 (busid=0000:c0:01.1 id=1022:1483 class=0604(PCIBridge) link=15.75GB/s buses=0000:[c1-c1])
PCI L#7 (busid=0000:c1:00.0 id=1000:10e2 class=0104(RAID) link=15.75GB/s PCISlot=0-4)
Block(Disk) L#6 (Size=1875374424 SectorSize=512 LinuxDeviceID=8:48 Vendor=NVMe Model=Dell_Ent_NVMe_v2 Revision=.2.0 SerialNumber=36435330529024130025384100000002) "sdd"
Block(Disk) L#7 (Size=6250037248 SectorSize=512 LinuxDeviceID=8:64 Vendor=DELL Model=PERC_H755N_Front Revision=5.16 SerialNumber=6f4ee080160bd5002ab7652100a1691a) "sde"
Block(Disk) L#8 (Size=1875374424 SectorSize=512 LinuxDeviceID=8:32 Vendor=NVMe Model=Dell_Ent_NVMe_v2 Revision=.2.0 SerialNumber=36435330529024120025384100000002) "sdc"
PCIBridge L#11 (busid=0000:c0:08.3 id=1022:1484 class=0604(PCIBridge) link=31.51GB/s buses=0000:[c4-c4])
PCI L#8 (busid=0000:c4:00.0 id=1022:7901 class=0106(SATA) link=31.51GB/s)
NUMANode L#5 (P#11 local=66057292KB total=66057292KB)
NUMANode L#6 (P#12 local=66058316KB total=66058316KB)
NUMANode L#7 (P#13 local=66040920KB total=66040920KB)
PCIBridge L#13 (busid=0000:80:01.2 id=1022:1483 class=0604(PCIBridge) link=2.00GB/s buses=0000:[81-81])
PCI L#9 (busid=0000:81:00.0 id=10de:1bb1 class=0300(VGA) link=2.00GB/s PCISlot=4)
Special depth -3: 8 NUMANode (type #13)
Special depth -5: 10 PCIDev (type #15)
Special depth -6: 9 OSDev (type #16)
You can also render the topology as a picture with `lstopo --of png > r7525.png`.
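For a single device you don't strictly need lstopo either: every PCI device reports its NUMA node in sysfs. A sketch, assuming the standard Linux sysfs layout (a value of -1 means the firmware reported no affinity for that device):

```shell
# List the NUMA node of every PCI device in the system.
for dev in /sys/bus/pci/devices/*; do
    echo -e "${dev##*/}\t$(cat "$dev/numa_node")"
done
```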

Obligatory legal disclaimer: I work for Dell.