
I'm trying to get an accurate description of the data cache hierarchy of the current CPU on Linux: not just the size of individual L1/L2/L3 (and possibly L4) data caches, but also the way they are split or shared across cores.

For instance, on my CPU (AMD Ryzen Threadripper 3970X), each core has its own 32 KB L1 data cache and 512 KB L2 cache, whereas the L3 cache is shared across the cores within a core complex (CCX). In other words, there are 8 distinct L3 caches of 16 MB each.

The "Cache" section of this screenshot of CPU-Z on Windows is basically what I'm trying to find out:

CPU-Z Screenshot

I have no problem getting this information on Windows with GetLogicalProcessorInformation().

However, on Linux, it appears that sysconf() only gives me either the per-core cache size for the L1 data and L2 caches (_SC_LEVEL1_DCACHE_SIZE and _SC_LEVEL2_CACHE_SIZE), or the total L3 cache size (_SC_LEVEL3_CACHE_SIZE).

EDIT: lstopo's output under VMware. The virtual machine has 8 cores. The L1 and L2 cache information is fine, but the L3 cache size does not appear to be correct:

lstopo Screenshot

François Beaune
  • This may help... https://askubuntu.com/a/214302 – Mark Setchell Apr 27 '20 at 08:28
  • I had a look at lstopo, this project is awesome but likely overkill for my needs. What I'm really confused about is the mixup between per-core and non-per-core cache sizes returned by `sysconf()`. How to make sense of them if we don't know if caches are shared or not? – François Beaune Apr 27 '20 at 08:38
  • Are you wanting to have your program use this to decide something about how many threads to start, or what CPU affinity mask to set? Or do you want info to show to the user? Either way, you may need to use the x86 `cpuid` instruction yourself on that ISA, and maybe even embed some per-model cache layout details. IDK how much detail the various CPUID leafs like https://sandpile.org/x86/cpuid.htm#level_0000_0004h can represent. – Peter Cordes Apr 27 '20 at 08:50
  • I updated my post with a screenshot of lstopo (under VMware). It doesn't look like it reports cache sizes correctly (again a mixup between shared caches and per-core caches). @PeterCordes It's just to show to the user. – François Beaune Apr 27 '20 at 09:35
  • Can you try `lstopo` on Linux on bare metal? (e.g. boot a live USB). Your bogus result could be the VM's fault so we should rule that out. Unsurprisingly it works as expected on my i7-6700k desktop, showing all 4 cores in the same package sharing an L3 cache. But Intel Sandybridge-family is the most widely used and not recently changed series of x86 CPUs. – Peter Cordes Apr 27 '20 at 09:51
  • I can't reboot for now (too much stuff currently running on that machine) but I did observe the same issue with `sysconf()` on another system (i7-4702MQ running Fedora) where reported sizes for L1 data cache (32 KB) and L2 cache (256 KB) are correct (and per-core) while the reported L3 cache size (6 MB) is shared. I'll check with lstopo on that machine. – François Beaune Apr 27 '20 at 09:58
  • Because I'm on Windows. – François Beaune Apr 27 '20 at 11:40
  • [Note that `lstopo` is also available for Windows](https://www.open-mpi.org/software/hwloc/v2.2/). `lstopo` uses the `cpuid` instruction (and maybe the ACPI `SRAT` table). `cpuid` is relatively easy to use, but Intel and AMD differ a lot in this respect. `hwloc` (to which `lstopo` belongs) has an API that you can use to get the cache topology on both Windows and Linux. – Margaret Bloom Apr 27 '20 at 17:55

1 Answer

A full picture of the cache hierarchy can be found programmatically by opening files in /sys (sysfs).

Each "thread" or "logical processor" is represented by a sub-directory in /sys/devices/system/cpu/. Within that directory you'll find a cache directory. For example, cache information for the first logical processor can be found here:

$ ls /sys/devices/system/cpu/cpu0/cache/
index0
index1
index2
index3
power
uevent

Each cache entity associated with that logical processor is represented by an index[0-9]* directory. The number after index does not represent the level. The same cache entity may be listed multiple times under different logical processors. Within these directories you can find all the properties of the cache entity (level, sets, line size, etc).

$ ls /sys/devices/system/cpu/cpu0/cache/index0
coherency_line_size
level
number_of_sets
physical_line_partition
power
shared_cpu_list
shared_cpu_map
size
type
uevent
ways_of_associativity

Full documentation can be found here.
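As a rough illustration of reading those properties from a program, here is a minimal Python sketch. The function name `read_cache_entry` and the `ATTRIBUTES` tuple are made up for this example; the file names are the ones from the listing above. It assumes each attribute is a small text file and that any of them may be missing.

```python
from pathlib import Path

# Attribute files taken from the directory listing above; not every
# kernel or hypervisor exposes all of them.
ATTRIBUTES = (
    "level",
    "type",
    "size",
    "coherency_line_size",
    "number_of_sets",
    "ways_of_associativity",
    "shared_cpu_list",
)


def read_cache_entry(index_dir):
    """Return {attribute: text} for one index* directory (hypothetical helper)."""
    entry = {}
    for name in ATTRIBUTES:
        try:
            entry[name] = (Path(index_dir) / name).read_text().strip()
        except OSError:
            # Attribute not present or not readable: skip it.
            continue
    return entry


if __name__ == "__main__":
    print(read_cache_entry("/sys/devices/system/cpu/cpu0/cache/index0"))
```

The values are returned as strings exactly as sysfs reports them (for example, `size` carries a unit suffix such as `K`), so any conversion to bytes is left to the caller.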

Most importantly, to get the output you want, you'll need to inspect shared_cpu_list:

$ cat /sys/devices/system/cpu/cpu0/cache/index0/shared_cpu_list
0,28

This shows which logical processors share this cache entity. By inspecting all entities (/sys/devices/system/cpu/cpu*/cache/index*/) and eliminating duplicates using shared_cpu_list, you can programmatically access all the data you require.
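The enumerate-and-deduplicate step could look roughly like the Python sketch below. The helper names (`parse_cpu_list`, `read_attr`, `unique_caches`) are invented for illustration, and the sketch assumes that two entries describe the same cache when they agree on `level`, `type`, and the expanded `shared_cpu_list`. The `cpu_root` parameter is only there so the function can be pointed at a different directory tree.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a sysfs CPU list such as '0-3,8' into a sorted tuple of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        first, _, last = part.partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return tuple(sorted(cpus))


def read_attr(index_dir, name):
    """Return the stripped contents of one attribute file, or None if absent."""
    try:
        return (index_dir / name).read_text().strip()
    except OSError:
        return None


def unique_caches(cpu_root="/sys/devices/system/cpu"):
    """Return one record per distinct cache found under cpu_root."""
    caches = {}
    for index_dir in Path(cpu_root).glob("cpu[0-9]*/cache/index[0-9]*"):
        level = read_attr(index_dir, "level")
        shared = read_attr(index_dir, "shared_cpu_list")
        if level is None or shared is None:
            continue
        cache_type = read_attr(index_dir, "type") or "Unknown"
        # Assumption: same level, type and sharing set means same cache entity.
        key = (int(level), cache_type, parse_cpu_list(shared))
        caches.setdefault(key, read_attr(index_dir, "size"))
    return [
        {"level": lvl, "type": typ, "cpus": cpus, "size": size}
        for (lvl, typ, cpus), size in sorted(caches.items())
    ]


if __name__ == "__main__":
    for cache in unique_caches():
        print(cache)
```

To keep only data caches, as in the question, filter the result on `type` being `Data` or `Unified`. Counting the records per level then gives the number of distinct caches at that level.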

Note that your hypervisor isn't required to pass along accurate information. This will only show you the cache hierarchy as the guest kernel sees it.

Mikel Rychliski