Questions tagged [cpu-cache]

A CPU cache is a hardware structure used by the CPU to reduce the average memory access time.

Caching is beneficial whenever data elements are re-used.


Caching is a general policy
aimed at eliminating the latency
of repeated accesses to some already-visited
but otherwise "expensive" ( read: slow ) resource ( storage )


Caching does not speed up the memory itself.

The most a programmer can achieve is to exercise due care to allow latency masking in a concurrent mode of code execution: issue instructions well before the moment the memory data is actually consumed, so that the cache management can evict a least-recently-used (LRU) line and prefetch the requested data from slow DRAM in time.
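The latency-masking idea above can be sketched with an explicit software-prefetch hint. This is a minimal sketch, not a guaranteed win: `__builtin_prefetch` is a GCC/Clang builtin (its second and third arguments mean read-access and low temporal locality), and the prefetch distance of 16 iterations is an assumed tuning value that varies per machine and access pattern:

```c
#include <stddef.h>

/* Sum an array while hinting the cache: issue a prefetch for data a few
 * iterations ahead, so the DRAM fetch overlaps with useful work instead
 * of stalling the consuming instruction. */
#define PREFETCH_DISTANCE 16  /* assumed tuning value, machine-dependent */

long sum_with_prefetch(const long *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; ++i) {
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], /*rw=*/0, /*locality=*/1);
        sum += a[i];
    }
    return sum;
}
```

On arrays small enough to fit in cache, or with hardware prefetchers already tracking the stream, the hint is redundant; it helps mainly for large working sets or irregular strides the hardware cannot predict.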


How it works?

Main memory is usually built with DRAM technology, which allows for big, dense and cheap storage structures. But DRAM access is much slower than the cycle time of a modern CPU (the so-called memory wall). A CPU cache is a smaller memory, usually built with SRAM technology (expensive, but fast), that reduces the number of accesses to main memory by storing the main-memory contents that are likely to be referenced in the near future. Caches exploit a property of programs, the principle of locality: adjacent memory addresses are likely to be referenced close together in time (spatial locality), and an address that is referenced once is likely to be referenced again soon (temporal locality).
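The principle of locality is easy to demonstrate: the two functions below compute the same sum of a matrix, but the row-major walk exploits spatial locality while the column-major walk defeats it. The 1024×1024 geometry and the 64-byte line size in the comments are illustrative assumptions, not properties of any particular machine:

```c
#include <stddef.h>

#define ROWS 1024
#define COLS 1024

/* Row-major traversal touches consecutive addresses, so each 64-byte
 * cache line fetched from DRAM supplies several consecutive elements
 * (spatial locality). */
long sum_row_major(const int m[ROWS][COLS])
{
    long sum = 0;
    for (size_t i = 0; i < ROWS; ++i)
        for (size_t j = 0; j < COLS; ++j)
            sum += m[i][j];
    return sum;
}

/* Column-major traversal jumps COLS * sizeof(int) bytes between
 * accesses; for a matrix larger than the cache, each fetched line is
 * used for a single element before eviction, so the same arithmetic
 * typically runs several times slower. */
long sum_col_major(const int m[ROWS][COLS])
{
    long sum = 0;
    for (size_t j = 0; j < COLS; ++j)
        for (size_t i = 0; i < ROWS; ++i)
            sum += m[i][j];
    return sum;
}
```

Both loops execute the same number of additions; only the memory-access order differs, which is exactly what several of the questions below measure.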

Each cache line is tagged with an address, stored in extra SRAM cells. These tag cells record which memory address the cached data belongs to; because the cache can never mirror the entire system memory, this address must be kept alongside the data. Part of the address forms an index that selects a set within the cache array. The index and the tag can each use either physical or virtual (MMU-translated) addresses, leading to the three cache types: PIPT, VIVT and VIPT.
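The tag/index/offset split can be sketched for a hypothetical geometry; a 32 KiB, 8-way cache with 64-byte lines (hence 64 sets) is assumed here, and real caches differ:

```c
#include <stdint.h>

/* Decompose an address for an assumed 32 KiB, 8-way cache with 64-byte
 * lines: 32 KiB / 64 B / 8 ways = 64 sets, so 6 offset bits and 6 index
 * bits; the remaining high bits are the tag stored in the SRAM tag cells. */
#define LINE_SIZE   64u   /* bytes per cache line          */
#define NUM_SETS    64u   /* sets in the cache array       */
#define OFFSET_BITS 6u    /* log2(LINE_SIZE)               */
#define INDEX_BITS  6u    /* log2(NUM_SETS)                */

static inline uint32_t cache_offset(uint32_t addr)  /* byte within the line */
{
    return addr & (LINE_SIZE - 1);
}

static inline uint32_t cache_index(uint32_t addr)   /* which set to search */
{
    return (addr >> OFFSET_BITS) & (NUM_SETS - 1);
}

static inline uint32_t cache_tag(uint32_t addr)     /* compared against tag cells */
{
    return addr >> (OFFSET_BITS + INDEX_BITS);
}
```

Whether `addr` here is a physical or a virtual address is exactly the PIPT/VIVT/VIPT distinction: a VIPT cache, for example, indexes with the virtual address (so set selection can start before the TLB answers) but compares tags against the physical one.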

Modern CPUs contain multiple levels of cache. In SMP systems a cache level may be private to a single CPU, shared by a cluster of CPUs, or shared by the whole system. Because caching can leave multiple copies of the same data in an SMP system, cache-coherence protocols are used to keep the copies consistent. VIVT and VIPT caches can also interact with the MMU (and its own cache, commonly called the TLB).

Questions regarding CPU cache inconsistencies, profiling, or under-utilization are on-topic.

For more information see Wikipedia's CPU cache article.


1011 questions
14
votes
2 answers

Concept of "block size" in a cache

I am just beginning to learn the concept of Direct mapped and Set Associative Caches. I have some very elementary doubts . Here goes. Supposing addresses are 32 bits long, and i have a 32KB cache with 64Byte block size and 512 frames, how much…
hektor
  • 1,017
  • 3
  • 14
  • 28
14
votes
1 answer

How does the CPU cache affect the performance of a C program

I am trying to understand more about how CPU cache affects performance. As a simple test I am summing the values of the first column of a matrix with varying numbers of total columns. // compiled with: gcc -Wall -Wextra -Ofast -march=native…
koipond
  • 306
  • 2
  • 8
14
votes
2 answers

How to explain poor performance on Xeon processors for a loop with both sequential copy and a scattered store?

I stumbled upon a peculiar performance issue when running the following c++ code on some Intel Xeon processors: // array_a contains permutation of [0, n - 1] // array_b and inverse are initialized arrays for (int i = 0; i < n; ++i) { array_b[i] =…
14
votes
2 answers

Why isn't there a data bus which is as wide as the cache line size?

When a cache miss occurs, the CPU fetches a whole cache line from main memory into the cache hierarchy. (typically 64 bytes on x86_64) This is done via a data bus, which is only 8 byte wide on modern 64 bit systems. (since the word size is 8…
Mike76
  • 899
  • 1
  • 9
  • 31
14
votes
1 answer

Cache-as-Ram (no fill mode) Executable Code

I have read about cache-as-ram mode (no-fill mode) numerous times and am wondering whether number one, can executable code be written and jumped to and if so is the executable code restricted to half of the level one cache (since the cache is really…
n00ax
  • 307
  • 3
  • 7
14
votes
2 answers

Optimising Java objects for CPU cache line efficiency

I'm writing a library where: It will need to run on a wide range of different platforms / Java implementations (the common case is likely to be OpenJDK or Oracle Java on Intel 64 bit machines with Windows or Linux) Achieving high performance is a…
mikera
  • 105,238
  • 25
  • 256
  • 415
13
votes
1 answer

Performance when Generating CPU Cache Misses

I am trying to learn about CPU cache performance in the world of .NET. Specifically I am working through Igor Ostovsky's article about Processor Cache Effects. I have gone through the first three examples in his article and have recorded results…
Jason Moore
  • 3,294
  • 15
  • 18
13
votes
2 answers

Definition/meaning of Aliasing? (CPU cache architectures)

I'm a little confused by the meaning of "Aliasing" between CPU-cache and Physical address. First I found It's definition on Wikipedia : However, VIVT suffers from aliasing problems, where several different virtual addresses may refer to the same…
wuxb
  • 2,572
  • 1
  • 21
  • 30
13
votes
3 answers

What specifically marks an x86 cache line as dirty - any write, or is an explicit change required?

This question is specifically aimed at modern x86-64 cache coherent architectures - I appreciate the answer can be different on other CPUs. If I write to memory, the MESI protocol requires that the cache line is first read into cache, then modified…
Tim
  • 916
  • 7
  • 21
13
votes
1 answer

Why does CLFLUSH exist in x86?

I recently learned about the row hammer attack. In order to perform this attack the programmer needs to flush the complete cache hierarchy of a CPU for a specific number of addresses. My question is: why is CLFLUSH necessary in x86? What are the…
13
votes
2 answers

Will a modern processor (like the i7) follow pointers and prefetch their data while iterating over a list of them?

I want to learn how to write better code that takes advantage of the CPU's cache. Working with contiguous memory seems to be the ideal situation. That being said, I'm curious if there are similar improvements that can be made with non-contiguous…
Jonathan
  • 752
  • 1
  • 9
  • 19
13
votes
3 answers

CUDA disable L1 cache only for one variable

Is there any way on CUDA 2.0 devices to disable L1 cache only for one specific variable? I know that one can disable L1 cache at compile time adding the flag -Xptxas -dlcm=cg to nvcc for all memory operations. However, I want to disable cache only…
zeus2
  • 309
  • 2
  • 11
12
votes
1 answer

Should the cache padding size of x86-64 be 128 bytes?

I found a comment from crossbeam. Starting from Intel's Sandy Bridge, spatial prefetcher is now pulling pairs of 64-byte cache lines at a time, so we have to align to 128 bytes rather than…
QuarticCat
  • 1,314
  • 6
  • 20
12
votes
2 answers

In which condition DCU prefetcher start prefetching?

I am reading about different prefetcher available in Intel Core i7 system. I have performed experiments to understand when these prefetchers are invoked. These are my findings L1 IP prefetchers starts prefetching after 3 cache misses. It…
bholanath
  • 1,699
  • 1
  • 22
  • 40
12
votes
2 answers

Is there a way to flush the entire CPU cache related to a program?

On x86-64 platforms, the CLFLUSH assembly instruction allows to flush the cache line corresponding to a given address. Instead of flushing the cache related to a specific address, would there be a way to flush the entire cache (either the cache…
Vincent
  • 57,703
  • 61
  • 205
  • 388