How does Linux handle Intel's Optane Persistent Memory Modules under Memory Mode?

Question

I was wondering whether the Linux kernel does anything special or performs any optimizations when the underlying system uses Persistent Memory Modules in Memory Mode (near-memory DRAM cache with NVRAM as main memory). I've looked in drivers/nvdimm, but everything there seems to be centered on App Direct mode, where you mmap a DAX file. In Memory Mode, using the memory is semantically and syntactically no different from using DRAM.

Does Linux employ any optimizations, or is everything handled in the hardware? Can someone link me to where any memory mode optimizations are performed in the Linux kernel? Thanks in advance!

score 3 · Accepted Answer · edited Feb 21 '21 at 17:13

Upstream Linux v5.2-rc1 introduced the kernel parameter page_alloc.shuffle, which is a Boolean flag that is automatically enabled if both of the following conditions are true:

It's not manually disabled by adding page_alloc.shuffle=0 to the kernel parameter list.
The kernel is running on a system with firmware that supports ACPI 6.2 and the firmware has communicated to the kernel through the Heterogeneous Memory Attribute Table (HMAT) that the system has a memory-side cache in at least one of the memory domains.

When this parameter is enabled, the kernel page allocator randomizes its free lists in the hope of reducing conflicts on the memory-side cache.
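As a rough way to see which state a given machine ended up in, the sketch below reads the kernel command line and the parameter's sysfs file. The helper names are made up for illustration, the boolean parsing is a simplified approximation of what the kernel accepts, and the sysfs file (/sys/module/page_alloc/parameters/shuffle) is assumed to exist only on kernels built with shuffling support and to be readable only by root.

```python
from pathlib import Path

# Assumed locations; the sysfs file is typically readable by root only.
SYSFS_SHUFFLE = Path("/sys/module/page_alloc/parameters/shuffle")
PROC_CMDLINE = Path("/proc/cmdline")


def shuffle_from_cmdline(cmdline):
    """Return True/False if page_alloc.shuffle is set explicitly, else None.

    Simplified approximation of kernel boolean parsing (hypothetical helper).
    """
    setting = None
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key != "page_alloc.shuffle":
            continue
        if not sep or value.lower() in ("1", "y", "on"):
            setting = True   # a bare flag is treated as "enabled"
        elif value.lower() in ("0", "n", "off"):
            setting = False
        # later occurrences override earlier ones
    return setting


def shuffle_from_sysfs(path=SYSFS_SHUFFLE):
    """Return the live state reported by sysfs, or None if it cannot be read."""
    try:
        text = path.read_text().strip()
    except OSError:
        return None
    return text[:1] in ("Y", "y", "1")


if __name__ == "__main__":
    try:
        cmdline = PROC_CMDLINE.read_text()
    except OSError:
        cmdline = ""
    print("explicit cmdline setting:", shuffle_from_cmdline(cmdline))
    print("state reported by sysfs: ", shuffle_from_sysfs())
```

If the command line shows no explicit setting but sysfs reports the parameter as enabled, that would be consistent with the automatic enablement described above.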

Examples of systems on which it's automatically enabled include KNL/KNM with MCDRAM that is partially or fully configured to run in Cache Mode and CSX/CPX with persistent memory that is partially or fully configured to run in Memory Mode. On all of these systems, there is a direct-mapped memory-side cache, although many implementation details are different.

Free-list shuffling gives sustained, predictable performance, though not necessarily optimal or even close to optimal. Without shuffling, a system tends to run fast at first because the memory-side cache is well utilized, and then slows down over time as cache conflicts increase.
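To make the trade-off concrete, here is a toy model, not the kernel's algorithm. It assumes a direct-mapped cache tracked at page granularity (real memory-side caches work on smaller lines), and all sizes are invented for illustration. Two pages conflict when their page frame numbers are equal modulo the cache size in pages.

```python
import random


def conflicting_pages(pfns, cache_pages):
    """Count pages that map to a cache slot already used by an earlier page."""
    used_slots = set()
    conflicts = 0
    for pfn in pfns:
        slot = pfn % cache_pages  # direct-mapped: one possible slot per page
        if slot in used_slots:
            conflicts += 1
        else:
            used_slots.add(slot)
    return conflicts


# Illustrative sizes (assumptions): memory is 8x larger than the cache.
CACHE_PAGES = 1024
MEMORY_PAGES = 8 * CACHE_PAGES
WORKING_SET = 512

# Best case: a physically contiguous working set that fits in the cache.
contiguous = list(range(WORKING_SET))

# Randomized placement, standing in for allocations from shuffled free lists.
rng = random.Random(0)
scattered = rng.sample(range(MEMORY_PAGES), WORKING_SET)

print("contiguous conflicts:", conflicting_pages(contiguous, CACHE_PAGES))
print("scattered conflicts: ", conflicting_pages(scattered, CACHE_PAGES))
```

In this model the contiguous working set has no conflicts, which corresponds to the fast initial phase, while random placement always carries some conflicts. The benefit of shuffling is that the conflict rate should stay roughly the same as the free lists age, rather than drifting toward a bad pattern.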

That said, I don't think anyone has tested the impact of free list shuffling on performance on a system with persistent memory running in Memory Mode, even though it's automatically enabled.

There are currently no other potential optimizations for Memory Mode accepted in the kernel.

I'll allow some more time for others to add more, and to verify a few things myself, before selecting this as the answer, if that is okay. Please give me a day or two. — Louis Jenkins, Feb 21 '21 at 17:16
I have not used the Linux page_alloc.shuffle option (and I don't think that randomization is good enough for direct-mapped caches) but I did test the performance impact of several variations on the "zonesort" module (that Intel provided for KNL). It was easy to get modest improvements in performance for sizes that should have been cacheable, but I did not get enough improvement to justify deployment. Modifying the "fakenuma" kernel option to set up a "node" of physical memory that is guaranteed not to conflict looks like a better approach. — John D McCalpin, Feb 22 '21 at 17:06
Thank you for the pointers, I'll be looking forward to measuring performance of `page_alloc.shuffle=0` and also see if I can find a way to use zonesort for my own experiments. — Louis Jenkins, Feb 24 '21 at 16:10
