Questions tagged [numa]

NUMA stands for Non-Uniform Memory Access. It is a general Linux term indicating that the hardware has multiple memory nodes, and that not all processing units have equal access to all memory.

As processors become faster and faster, proximity to memory increases in importance for overall computing performance. NUMA systems address this problem by building closer connections between specific computing resources and memory.

307 questions
3 votes · 1 answer

Local CPU may degrade Remote CPU performance on Packet Receiving

I have a server with 2 Intel Xeon E5-2620 CPUs (Sandy Bridge) and a dual-port 10Gbps 82599 NIC, which I use for high-performance computing. From the PCI affinity I see that the 10G NIC is connected to CPU1. I launched several packet receiving…
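Whether a NIC is local or remote to a given socket can be read straight from sysfs, which is presumably how the asker determined the PCI affinity. A small sketch (interface names vary per machine; virtual devices such as `lo` have no PCI parent and are skipped):

```shell
# Print the NUMA node of every network interface that has a PCI parent.
for iface in /sys/class/net/*; do
    node_file="$iface/device/numa_node"
    if [ -r "$node_file" ]; then
        printf '%s -> NUMA node %s\n' "${iface##*/}" "$(cat "$node_file")"
    fi
done
# A value of -1 means the firmware did not report a node for that device.
```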
3 votes · 1 answer

numactl --physcpubind processor migration

I'm trying to launch my MPI application (Open MPI 1.4.5) with numactl. Since the load balancing with --cpunodebind apparently doesn't distribute my processes in a round-robin manner among the available nodes, I wanted to specifically restrict my…
el_tenedor · 644 · 1 · 8 · 19
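For reference, a hedged sketch of the relevant numactl invocations; `./my_app` is a placeholder for the user's own binary, and the CPU/node numbers are examples only:

```shell
# Inspect the machine's NUMA topology, then pin a program explicitly.
if command -v numactl >/dev/null 2>&1; then
    numactl --hardware        # lists nodes with their CPUs and memory
    # Bind to physical CPUs 0-3 and allocate memory only from node 0:
    # numactl --physcpubind=0-3 --membind=0 ./my_app
else
    echo "numactl not installed"
fi
```

`--physcpubind` pins to specific logical CPUs, whereas `--cpunodebind` only constrains to a node's CPU set, which is why the two can behave differently for process placement.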
3 votes · 1 answer

Why are my Opteron cores running at only 75% capacity each? (25% CPU idle)

We've just taken delivery of a powerful 32-core AMD Opteron server with 128 GB of RAM. We have 2 x 6272 CPUs with 16 cores each. We are running a big long-running java task on 30 threads. We have the NUMA optimisations for Linux and java turned on. Our…
Tim Cooper · 10,023 · 5 · 61 · 77
3 votes · 0 answers

Reserve memory chunks out of multiple NUMA nodes

This question discusses how to force the Linux kernel to exclude some memory from being used (and thus from being visible to the kernel). With memmap=nn[KMG]$ss[KMG] you can exclude one chunk of memory. Is it possible to provide this kernel boot parameter…
Jay D · 3,263 · 4 · 32 · 48
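The memmap= parameter can in fact appear multiple times on the kernel command line, once per region, so several chunks can be excluded. A sketch of a GRUB configuration entry, with made-up addresses and sizes; note that `$` typically needs escaping in GRUB config files:

```
# /etc/default/grub  (addresses are examples only)
# Reserve two 512M regions, e.g. one inside each node's address range.
# In GRUB config files the "$" must be escaped as "\$".
GRUB_CMDLINE_LINUX="memmap=512M\$0x100000000 memmap=512M\$0x200000000"
```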
2 votes · 1 answer

Windows SetThreadAffinityMask has no effect

I have written a small test program in which I try to use the Windows API call SetThreadAffinityMask to lock the thread to a single NUMA node. I retrieve the CPU bitmask of a node with the GetNumaNodeProcessorMask API call, then pass that bitmask to…
ahelwer · 1,441 · 13 · 29
2 votes · 2 answers

performance issues with parallel MATLAB on a NUMA machine

I'm running memory-intensive parallel computations in MATLAB on a 64-core NUMA machine under Windows 7, 8 cores per socket. I'm using the Parallel Computing Toolbox to do that. I've noticed a very strange CPU load pattern: when running, say, 36 parallel…
user679205 · 51 · 1 · 2
2 votes · 1 answer

How to implement interleaved page allocation in a user-mode NUMA-aware memory allocator?

I am building a user-mode NUMA-aware memory allocator for Linux. During its initialization the allocator grabs a large chunk of memory, one chunk per NUMA node. After this, memory pages requested by the user are met by giving as many memory pages…
nandu · 2,563 · 2 · 16 · 14
2 votes · 1 answer

NUMA - Local memory

Please bear with me, I've just started digging into this whole CPU thing. The RAM squares shown on the diagram below, what do they refer to? Memory pages? As far as I know, CPUs only have one thing that's related to memory at all - their cache. So…
ebb · 9,297 · 18 · 72 · 123
2 votes · 1 answer

How granular can multithreaded memory-write access be?

I've read about how NUMA works and how memory is pulled from RAM into the L2 and L1 caches, and that there are only two safe ways to share data: read access from n (n >= 0) threads, or read-write access from a single thread. But how granular can the data be for…
2 votes · 0 answers

Internal error when using MPI Intel library with reduction operation on communicators

I am having some issues when using reduction operations on MPI communicators. I have a lot of different communicators, created this way: MPI_ERR_SONDAGE(MPI_Group_incl(world_group, comm_size, &(on_going_communicator[0]),…
PilouPili · 2,601 · 2 · 17 · 31
2 votes · 1 answer

First touch in case of small sized data sharing on Linux

The "first touch" policy (the term for how virtual pages get mapped to physical memory on NUMA systems) causes memory pages to be mapped to the NUMA node of the thread that first writes to them. Having read this page,…
2 votes · 3 answers

Problem of sorting OpenMP threads into NUMA nodes by experiment

I'm attempting to create a std::vector of sets, one set for each NUMA node, containing the thread ids obtained using omp_get_thread_num(). Topo: Idea: Create data which is larger than the L3 cache, set first touch using thread 0, perform…
Nitin Malapally · 534 · 2 · 10
2 votes · 1 answer

How can I realize data local spawning or scheduling of tasks in OpenMP on NUMA CPUs?

I have this simple self-contained example of a very rudimentary 2-dimensional stencil application using OpenMP tasks on dynamic arrays, to reproduce an issue I am having in a larger, non-toy problem. There are 2 update steps in…
user151387 · 103 · 7
2 votes · 0 answers

C++: how to detect if system is NUMA at runtime?

I want to have a parallel function with different code paths depending on whether the function is being run on a system with a UMA or NUMA architecture, and I wonder how I can detect at runtime whether the system is NUMA with more than one node. I see…
anymous.asker · 1,179 · 9 · 14
2 votes · 1 answer

Does the Seastar framework in C++ allow users to allocate different sizes of memory in different threads?

I am learning the Seastar framework recently and one thing really confuses me. The official tutorial says that memory is divided evenly among threads (cores), but this might seem very inconvenient. Does Seastar allow users themselves to allocate…
ZHAN LU · 51 · 5