My program uses shared memory as data storage. This data must be available to every running application, and fetching it must be fast. But some applications can run on different NUMA nodes, and for them access to this data is really expensive. Is duplicating the data for every NUMA node the only way to do this?

Evgeny Lazin
  • It very much depends on how (in what order) your program accesses the data, and how it writes it – osgx Aug 05 '11 at 11:38
  • The memory access pattern is absolutely unpredictable – Evgeny Lazin Aug 05 '11 at 11:53
  • So it is also "absolutely unpredictable" how to make the program NUMA-ready. You can just run the program as on an SMP machine, and if it has a bad memory access pattern, it will run slowly. (NUMA allows access to another node's memory, just at a higher cost than local memory.) – osgx Aug 05 '11 at 11:56

1 Answer


There are two primary sources of slowdown that can be attributed to NUMA. The first is the increased latency of remote access, which varies with the platform. On the platforms that I work with, there is about a 30% latency hit.

The other source of performance loss can come from contention over the communication links and controllers between NUMA nodes.

The default allocation scheme on Linux is first-touch: data is placed on the node of the thread that first writes it. If the majority of the data in the application is initialized by a single thread, it all ends up on that thread's node, generating a lot of cross-NUMA-node traffic and contention for that one memory node.
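
Under first-touch you can avoid piling everything onto one node by initializing the data from the threads that will later use it. A minimal sketch with OpenMP (the loop and sizes are illustrative, not from the original answer; compile with -fopenmp and pin the threads, e.g. with OMP_PROC_BIND=true, so they stay on their nodes):

    #include <stdlib.h>

    int main(void)
    {
        size_t n = 100u * 1000 * 1000;
        double *a = malloc(n * sizeof *a);
        if (!a) return 1;

        /* First touch places each page on the node of the touching
           thread, so initialize with the same thread partitioning the
           compute loops will use (static schedule keeps it stable). */
        #pragma omp parallel for schedule(static)
        for (size_t i = 0; i < n; i++)
            a[i] = 0.0;

        /* ... compute loops with the same schedule(static) split ... */

        free(a);
        return 0;
    }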

If your data is read-only, then replication is a good solution.
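
As a sketch of what per-node replication could look like with libnuma (the helper names replicate_readonly and local_copy are made up for illustration; real code should check numa_available() first and release each copy with numa_free()):

    #define _GNU_SOURCE
    #include <numa.h>    /* numa_alloc_onnode, numa_max_node; link with -lnuma */
    #include <sched.h>   /* sched_getcpu */
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical helper: keep one copy of a read-only blob on every node. */
    static void **replicate_readonly(const void *src, size_t len)
    {
        int nnodes = numa_max_node() + 1;
        void **copies = calloc(nnodes, sizeof *copies);
        if (!copies) return NULL;
        for (int n = 0; n < nnodes; n++) {
            copies[n] = numa_alloc_onnode(len, n); /* pages bound to node n */
            if (copies[n])
                memcpy(copies[n], src, len);
        }
        return copies;
    }

    /* Each reader picks the copy that lives on its own node. */
    static const void *local_copy(void **copies)
    {
        return copies[numa_node_of_cpu(sched_getcpu())];
    }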

Otherwise, interleaving the allocation across all your nodes will distribute the requests among them and help relieve congestion.

To interleave the data, you can use set_mempolicy() from <numaif.h> if you are using Linux.
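
A minimal sketch, assuming the machine has few enough nodes that the mask fits in a single unsigned long (error handling trimmed; the buffer size is arbitrary):

    #define _GNU_SOURCE
    #include <numaif.h>  /* set_mempolicy, MPOL_INTERLEAVE; link with -lnuma */
    #include <numa.h>    /* numa_available, numa_max_node */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        if (numa_available() < 0) {
            fprintf(stderr, "no NUMA support on this machine\n");
            return 1;
        }

        /* Build a mask containing every node on the machine. */
        unsigned long nodemask = 0;
        for (int n = 0; n <= numa_max_node(); n++)
            nodemask |= 1UL << n;

        /* From now on, pages this thread allocates are placed
           round-robin across the nodes in the mask. */
        if (set_mempolicy(MPOL_INTERLEAVE, &nodemask, 8 * sizeof nodemask)) {
            perror("set_mempolicy");
            return 1;
        }

        char *buf = malloc(64UL << 20);       /* 64 MiB, arbitrary */
        if (!buf) return 1;
        memset(buf, 0, 64UL << 20);           /* first touch places the pages */

        set_mempolicy(MPOL_DEFAULT, NULL, 0); /* restore the default policy */
        free(buf);
        return 0;
    }

libnuma also offers a higher-level wrapper, numa_set_interleave_mask(numa_all_nodes_ptr), which does the same without building the mask by hand.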

osgx
Mark