Questions tagged [numa]

NUMA stands for Non-Uniform Memory Access. It is a general computer architecture term indicating that the hardware has multiple memory nodes, and that not all processing units have equal access to all memory.

As processors become faster and faster, proximity to memory increases in importance for overall computing performance. NUMA systems address this problem by building closer connections between specific computing resources and memory.

307 questions
4
votes
1 answer

How many NUMA nodes on a Power8 processor

I am using Ubuntu 15.04 on a two-socket Power8 machine; each socket has 10 cores. "numactl -H" outputs: available: 4 nodes (0-3) node 0 cpus: 0 8 16 24 32 node 0 size: 30359 MB node 0 free: 26501 MB node 1 cpus: 40 48 56 64 72 node 1 size: 0…
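
For readers who want to inspect this kind of output programmatically, here is a minimal Python sketch that parses `numactl -H` text into a per-node dictionary. The function name `parse_numactl_hardware` is hypothetical, the sample is laid out one field per line (as the tool normally prints it), and it contains only the values visible in the excerpt above; the truncated part is left out rather than guessed.

```python
import re

def parse_numactl_hardware(text):
    """Map node id -> {"cpus": [...], "size_mb": int or None} from `numactl -H` text."""
    nodes = {}
    for line in text.splitlines():
        # Lines such as "node 0 cpus: 0 8 16 24 32"
        m = re.match(r"node (\d+) cpus:\s*(.*)", line)
        if m:
            node = nodes.setdefault(int(m.group(1)), {"cpus": [], "size_mb": None})
            node["cpus"] = [int(c) for c in m.group(2).split()]
            continue
        # Lines such as "node 0 size: 30359 MB"
        m = re.match(r"node (\d+) size:\s*(\d+) MB", line)
        if m:
            node = nodes.setdefault(int(m.group(1)), {"cpus": [], "size_mb": None})
            node["size_mb"] = int(m.group(2))
    return nodes

# Only the values visible in the excerpt; the size of node 1 is truncated there,
# so it is deliberately absent here.
sample = """available: 4 nodes (0-3)
node 0 cpus: 0 8 16 24 32
node 0 size: 30359 MB
node 0 free: 26501 MB
node 1 cpus: 40 48 56 64 72
"""

topology = parse_numactl_hardware(sample)
for node_id, info in sorted(topology.items()):
    print(node_id, info["cpus"], info["size_mb"])
```

A node whose size line is missing is reported with `size_mb` set to `None`, which keeps "unknown" distinct from a node that reports 0 MB.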
4
votes
2 answers

Java uses only 1 of 2 CPU with NUMA (Neo4J)

I’m working on a Java program to create a really large Neo4J database. I use the batchinserter and Executors.newFixedThreadPool to speed things up. My Win2012R2 server has 2 CPUs (2x6 cores + 2x6 hyper-threads) and 256GB in NUMA architecture. My…
Escalus
  • 41
  • 3
4
votes
1 answer

Is there any example of using mbind in C/C++ code?

I am trying to use mbind() in my C++ code in order to rearrange virtual pages across 4 NUMA domains. Unfortunately, I am new to this function: long mbind(void *addr, unsigned long len, int mode, const unsigned long *nodemask, …
user4530988
  • 63
  • 1
  • 7
4
votes
1 answer

Count read and write accesses to memory

On a Linux machine, I need to count the number of read and write accesses to memory (DRAM) performed by a process. The machine has a NUMA configuration and I am binding the process to access memory from a single remote NUMA node using numactl. The…
user230023
  • 43
  • 5
4
votes
0 answers

Does the Cache Coherency issue apply to UMA architectures as well?

I have learned that shared-memory computer architectures can be divided into Uniform Memory Access (UMA) and Non-Uniform Memory Access (NUMA), depending on whether the access times to a given memory location are the same for all processors or…
sp00n
  • 181
  • 1
  • 11
4
votes
0 answers

Java process reports "incorrect" number of available processors

I'm running a Java 1.6 process on an 8-node NUMA machine using: numactl --cpunodebind=0 java -server com.foo.Bar Each node has 8 CPUs as reported by numactl --hardware: available: 8 nodes (0-7) node 0 cpus: 1 2 3 4 5 6 7 8 node 0 size: ... node 0…
user191776
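
As a loosely related illustration (in Python, not Java, and not a claim about what the JVM does), the sketch below shows the distinction that often sits behind this kind of surprise: the number of CPUs in the system versus the number the current process is allowed to run on. The helper name `cpu_counts` is hypothetical, and `os.sched_getaffinity` is assumed to be available only on Linux, so the code falls back to the system count elsewhere.

```python
import os

def cpu_counts():
    """Return (cpus_in_system, cpus_usable_by_this_process)."""
    total = os.cpu_count()  # may be None if it cannot be determined
    if hasattr(os, "sched_getaffinity"):
        # Linux: the set of CPUs this process may be scheduled on.
        usable = len(os.sched_getaffinity(0))
    else:
        # Assumption: without an affinity API, treat every CPU as usable.
        usable = total if total is not None else 1
    return total, usable

total, usable = cpu_counts()
print(f"system CPUs: {total}, usable by this process: {usable}")
```

Launching the script under a CPU binding such as the `numactl --cpunodebind=0` invocation quoted above would be one way to see the two numbers diverge.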
4
votes
1 answer

How to allocate parts of an array on different nodes on NUMA machines?

I have a NUMA machine with 2 nodes. I want to allocate the two halves of an array on the two nodes respectively. How can I do that? Please note that by "half" I mean a contiguous chunk of virtual memory. I found the function numa_alloc_interleaved,…
dalibocai
  • 2,289
  • 5
  • 29
  • 45
4
votes
1 answer

Structure of the Haskell runtime on multicore processors

I understand that the Haskell runtime creates an OS thread on every core or so. Lightweight threads / user threads are then scheduled by the runtime onto these pre-deployed OS threads. Roughly. But how is the Haskell runtime structured - is it…
J Fritsch
  • 3,338
  • 1
  • 18
  • 40
3
votes
1 answer

What's the difference between "Sub-NUMA Clustering" and "Hemisphere and Quadrant Modes" in Intel CPUs?

In the technical overview published by Intel, "Sub-NUMA Clustering" and "Hemisphere and Quadrant Modes" are described separately. But the main difference between them is not clear. In this answer, it says that "Inside quadrant or Hemisphere mode,…
3
votes
2 answers

Fastest way to share data between processors residing on different sockets

I have a dual-socket, 8-core machine; that is, each processor has 4 cores. I haven't seen its specification completely, but I think that a separate memory bank is attached to each processor in a ccNUMA fashion and therefore accessing from…
MetallicPriest
  • 29,191
  • 52
  • 200
  • 356
3
votes
1 answer

Explanation for why effective DRAM bandwidth reduces upon adding CPUs

This question is a spin-off of the one posted here: Measuring bandwidth on a ccNUMA system. I've written a micro-benchmark for the memory bandwidth on a ccNUMA system with 2x Intel(R) Xeon(R) Platinum 8168: 24 cores @ 2.70 GHz, L1 cache 32 kB, L2…
3
votes
3 answers

How to count memory accesses to remote NUMA memory nodes?

In a multi-threaded application running on a recent Linux Distributed Shared Memory system, is there a straightforward way to count the number of requests per thread to remote (non-local) NUMA memory nodes? I am thinking of using PAPI to count…
nandu
  • 2,563
  • 2
  • 16
  • 14
3
votes
1 answer

Sandy Bridge QPI bandwidth perf event

I'm trying to find the proper raw perf event descriptor to monitor QPI traffic (bandwidth) on Intel Xeon E5-2600 (Sandy Bridge). I've found an event that seems relevant here (qpi_data_bandwidth_tx: Number of data flits transmitted. Derived from…
Orion Papadakis
  • 398
  • 1
  • 14
3
votes
1 answer

How to set C++11 thread affinity to a NUMA node on Windows?

On Windows, how can I: (1) query how many NUMA nodes the system has; (2) set the affinity of a std::thread to the CPU cores of a specific NUMA node?
matthias_buehlmann
  • 4,641
  • 6
  • 34
  • 76
3
votes
1 answer

NUMA documentation for x86-64 processors?

I have already looked for NUMA documentation for x86-64 processors; unfortunately, I only found optimization documents for NUMA. What I want is: how do I initialize NUMA in a system (this would include getting the system's memory topology and…
prinzrainer
  • 319
  • 1
  • 6
  • 12