Questions tagged [numa]

NUMA stands for Non-Uniform Memory Access. It is a general Linux term indicating that the hardware has multiple memory nodes, and that not all processing units have equal access to all memory.

As processors become faster and faster, proximity to memory increases in importance for overall computing performance. NUMA systems address this problem by building closer connections between specific computing resources and memory.

307 questions
3 votes · 1 answer

Local CPU may degrade Remote CPU performance on Packet Receiving

I have a server with 2 Intel Xeon E5-2620 CPUs (Sandy Bridge) and a dual-port 10Gbps 82599 NIC, which I use for high-performance computing. From the PCI affinity I see that the 10G NIC is connected to CPU1. I launched several packet receiving…
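Whether a NIC is local or remote to a given socket can be read straight from sysfs, which is presumably how the asker determined the PCI affinity. A small sketch (interface names vary per machine; virtual devices such as `lo` have no PCI parent and are skipped):

```shell
# Print the NUMA node of every network interface that has a PCI parent.
for iface in /sys/class/net/*; do
    node_file="$iface/device/numa_node"
    if [ -r "$node_file" ]; then
        printf '%s -> NUMA node %s\n' "${iface##*/}" "$(cat "$node_file")"
    fi
done
# A value of -1 means the firmware did not report a node for that device.
```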
3 votes · 1 answer

numactl --physcpubind processor migration

I'm trying to launch my MPI application (Open MPI 1.4.5) with numactl. Since the load balancing with --cpunodebind apparently doesn't distribute my processes in a round-robin manner among the available nodes, I wanted to specifically restrict my…
el_tenedor · 644 · 1 · 8 · 19
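For reference, a hedged sketch of the relevant numactl invocations; `./my_app` is a placeholder for the user's own binary, and the CPU/node numbers are examples only:

```shell
# Inspect the machine's NUMA topology, then pin a program explicitly.
if command -v numactl >/dev/null 2>&1; then
    numactl --hardware        # lists nodes with their CPUs and memory
    # Bind to physical CPUs 0-3 and allocate memory only from node 0:
    # numactl --physcpubind=0-3 --membind=0 ./my_app
else
    echo "numactl not installed"
fi
```

`--physcpubind` pins to specific logical CPUs, whereas `--cpunodebind` only constrains to a node's CPU set, which is why the two can behave differently for process placement.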
3 votes · 1 answer

Why are my Opteron cores running at only 75% capacity each? (25% CPU idle)

We've just taken delivery of a powerful 32-core AMD Opteron server with 128 GB of RAM. We have 2 x 6272 CPUs with 16 cores each. We are running a big long-running java task on 30 threads. We have the NUMA optimisations for Linux and java turned on. Our…
Tim Cooper · 10,023 · 5 · 61 · 77
3 votes · 0 answers

Reserve memory chunks out of multiple NUMA nodes

This question discusses how to force the Linux kernel to exclude some memory from being used (and thus from being visible to the kernel). With memmap=nn[KMG]$ss[KMG] you can exclude one chunk of memory. Is it possible to provide this kernel boot parameter…
Jay D · 3,263 · 4 · 32 · 48
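The memmap= parameter can in fact appear multiple times on the kernel command line, once per region, so several chunks can be excluded. A sketch of a GRUB configuration entry, with made-up addresses and sizes; note that `$` typically needs escaping in GRUB config files:

```
# /etc/default/grub  (addresses are examples only)
# Reserve two 512M regions, e.g. one inside each node's address range.
# In GRUB config files the "$" must be escaped as "\$".
GRUB_CMDLINE_LINUX="memmap=512M\$0x100000000 memmap=512M\$0x200000000"
```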
2 votes · 1 answer

Windows SetThreadAffinityMask has no effect

I have written a small test program in which I try to use the Windows API call SetThreadAffinityMask to lock the thread to a single NUMA node. I retrieve the CPU bitmask of a node with the GetNumaNodeProcessorMask API call, then pass that bitmask to…
ahelwer · 1,441 · 13 · 29
2 votes · 2 answers

performance issues with parallel MATLAB on a NUMA machine

I'm running memory-intensive parallel computations in MATLAB on a 64-core NUMA machine under Windows 7, 8 cores per socket. I'm using the Parallel Computing Toolbox to do that. I've noticed a very strange CPU load pattern: when running, say, 36 parallel…
user679205 · 51 · 1 · 2
2 votes · 1 answer

How to implement interleaved page allocation in a user-mode NUMA-aware memory allocator?

I am building a user-mode NUMA-aware memory allocator for Linux. During its initialization the allocator grabs a large chunk of memory, one chunk per NUMA node. After this, memory pages requested by the user are met by giving as many memory pages…
nandu · 2,563 · 2 · 16 · 14
2 votes · 1 answer

NUMA - Local memory

Please bear with me, I've just started digging into this whole CPU thing. The RAM squares shown on the diagram below, what do they refer to? Memory pages? As far as I know, CPUs only have one thing that's related to memory at all - their cache. So…
ebb · 9,297 · 18 · 72 · 123
2 votes · 1 answer

How granular can multithreaded memory-write access be?

I've read about how NUMA works and how memory is pulled from RAM into the L2 and L1 caches, and that there are only two safe ways to share data: read access from n (n >= 0) threads, or read-write access from a single thread. But how granular can the data be for…
2 votes · 0 answers

Internal error when using MPI Intel library with reduction operation on communicators

I am having some issues when using reduction operations on MPI communicators. I have a lot of different communicators, created this way: MPI_ERR_SONDAGE(MPI_Group_incl(world_group, comm_size, &(on_going_communicator[0]),…
PilouPili · 2,601 · 2 · 17 · 31
2 votes · 1 answer

First touch in case of small sized data sharing on Linux

The "first touch" policy (the term for how virtual pages get mapped to physical memory on NUMA systems) causes memory pages to be mapped to the NUMA node of the thread that first writes to them. Having read this page,…
2 votes · 3 answers

Problem of sorting OpenMP threads into NUMA nodes by experiment

I'm attempting to create a std::vector of sets, one set for each NUMA node, containing the thread ids obtained using omp_get_thread_num(). Topo: Idea: Create data which is larger than the L3 cache, set first touch using thread 0, perform…
Nitin Malapally · 534 · 2 · 10
2 votes · 1 answer

How can I realize data local spawning or scheduling of tasks in OpenMP on NUMA CPUs?

I have this simple self-contained example of a very rudimentary 2-dimensional stencil application using OpenMP tasks on dynamic arrays, to reproduce an issue I am having in a larger, non-toy problem. There are 2 update steps in…
user151387 · 103 · 7
2 votes · 0 answers

C++: how to detect if system is NUMA at runtime?

I want to have a parallel function with different code paths depending on whether the function is being run on a system with a UMA or NUMA architecture, and I wonder how I can detect at runtime whether the system is NUMA with more than one node. I see…
anymous.asker · 1,179 · 9 · 14
2 votes · 1 answer

Does the Seastar framework in C++ allow users to allocate different sizes of memory in different threads?

I am learning the Seastar framework recently and one thing really confuses me. The official tutorial says that memory is divided evenly among threads (cores), but this might seem very inconvenient. Does Seastar allow users themselves to allocate…
ZHAN LU · 51 · 5