Questions tagged [cpu-cache]

A CPU cache is a hardware structure used by the CPU to reduce the average memory access time.

Caching is beneficial once some data elements get re-used.


Caching is a general policy aimed at eliminating the latency of repeatedly re-accessing an already visited, but otherwise "expensive" (read: slow), resource such as storage: the full cost is paid only on the first access.


Caching does not speed up the memory access itself.

The most a professional programmer can achieve is to exercise due care and allow some latency masking in a concurrent mode of code execution: issue instructions well before the forthcoming memory data is actually consumed, so that the cache management can evict a least-recently-used (LRU) part and pre-fetch the requested data from slow DRAM in the meantime.
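As a minimal sketch of such latency masking, assuming GCC or Clang (the function name and the prefetch distance of 16 elements are illustrative choices, not fixed rules):

```cpp
#include <cstddef>

// Sum an array while hinting the cache to fetch data needed soon.
// __builtin_prefetch is a GCC/Clang built-in; the distance of 16
// elements (two 64-byte lines of 8-byte doubles) is only a tuning guess.
double sum_with_prefetch(const double* data, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        if (i + 16 < n)
            __builtin_prefetch(&data[i + 16]);  // request the line early
        sum += data[i];
    }
    return sum;
}
```

Whether such a manual hint helps at all depends on the hardware prefetchers, which often detect a sequential scan like this on their own; the pattern pays off mainly for irregular access patterns they cannot predict.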


How does it work?

Main memory is usually built with DRAM technology, which allows for big, dense, and cheap storage structures. But DRAM access is much slower than the cycle time of a modern CPU (the so-called memory wall). A CPU cache is a smaller memory, usually built with SRAM technology (expensive but fast), that reduces the number of accesses to main memory by storing the main-memory contents that are likely to be referenced in the near future. Caches exploit a property of programs, the principle of locality: adjacent memory addresses are likely to be referenced close together in time (spatial locality), and an address that has been referenced once is likely to be referenced again soon (temporal locality).
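To make spatial locality concrete, here is a small illustrative sketch (the matrix size and function names are invented for the example): both loops below touch the same elements of a row-major matrix, but the first walks consecutive addresses, so one fetched cache line serves several iterations, while the second jumps a full row stride on every step and misses far more often.

```cpp
#include <vector>
#include <cstddef>

constexpr std::size_t N = 4096;  // arbitrary example size

// Row-major traversal: consecutive addresses, good spatial locality.
long long sum_rows(const std::vector<int>& m) {
    long long s = 0;
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j)
            s += m[i * N + j];   // stride of one element
    return s;
}

// Column-major traversal of the same row-major data: each access
// lands on a different cache line, so most accesses miss.
long long sum_cols(const std::vector<int>& m) {
    long long s = 0;
    for (std::size_t j = 0; j < N; ++j)
        for (std::size_t i = 0; i < N; ++i)
            s += m[i * N + j];   // stride of N elements
    return s;
}
```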

Each CPU cache entry is tagged with an address, held in extra SRAM cells. These tag cells record which specific address the entry holds data for; since the cache can never mirror the entire system memory, this address must be stored. The index into the array selects a set. The index and the tag can each use either physical or virtual (MMU-translated) addresses, leading to the three types PIPT, VIVT, and VIPT.
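To make the tag/index split concrete, the sketch below decomposes an address for an example geometry; the 64-byte line, 64-set, 8-way numbers are assumptions chosen because they match many L1d caches, not something the description above fixes. In a PIPT cache the input would be a physical address, in a VIVT cache a virtual one.

```cpp
#include <cstdint>
#include <cstdio>

// Assumed example geometry: 32 KiB, 64-byte lines, 8-way set associative
// => 32768 / 64 / 8 = 64 sets, hence 6 offset bits and 6 index bits.
constexpr uint64_t kLineBytes  = 64;
constexpr uint64_t kSets       = 64;
constexpr unsigned kOffsetBits = 6;   // log2(kLineBytes)
constexpr unsigned kIndexBits  = 6;   // log2(kSets)

void decompose(uint64_t addr) {
    uint64_t offset = addr & (kLineBytes - 1);              // byte within the line
    uint64_t index  = (addr >> kOffsetBits) & (kSets - 1);  // which set to search
    uint64_t tag    = addr >> (kOffsetBits + kIndexBits);   // what the tag SRAM stores
    std::printf("addr %#llx -> tag %#llx, set %llu, offset %llu\n",
                (unsigned long long)addr, (unsigned long long)tag,
                (unsigned long long)index, (unsigned long long)offset);
}
```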

Modern CPUs contain multiple levels of cache. In SMP systems a cache level may be private to a single CPU, shared by a cluster of CPUs, or shared by the whole system. Because caching can result in multiple copies of the same data being present in an SMP system, cache coherence protocols are used to keep the copies consistent. VIVT and VIPT caches can also interact with the MMU (and its own cache, commonly called a TLB).
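One place where coherence becomes visible to software is false sharing: when two threads write to different variables that happen to share a cache line, the coherence protocol bounces the line between cores. Below is a minimal sketch of the usual mitigation, assuming a 64-byte line (C++17's std::hardware_destructive_interference_size can replace the hard-coded 64 where available); the struct and function names are invented for the example.

```cpp
#include <atomic>
#include <thread>

// Without padding, both counters could share one cache line and
// ping-pong between cores under the coherence protocol.
struct alignas(64) PaddedCounter {      // each instance gets its own line
    std::atomic<long> value{0};
};

PaddedCounter counters[2];

void bump(int id, long iters) {
    for (long i = 0; i < iters; ++i)
        counters[id].value.fetch_add(1, std::memory_order_relaxed);
}

int main() {
    std::thread a(bump, 0, 10000000);
    std::thread b(bump, 1, 10000000);
    a.join();
    b.join();
}
```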

Questions regarding CPU cache inconsistencies, profiling or under-utilization are on-topic.

For more information see Wikipedia's CPU cache article.


1011 questions
10
votes
0 answers

More cache-friendly linked list or alternative with optimal append, delete, and ordered traversal for a limit order book?

I am trying to implement a stock matching engine/order book in C++, and am searching for a more cache friendly architecture. Currently, my data structures are as follows: An intrusive rb-tree for the limit prices. An intrusive doubly linked list…
Ronny Rildil
  • 101
  • 6
10
votes
2 answers

Is it possible to use the Linux Perf profiler inside C++ code?

I would like to measure the L1, L2 and L3 cache hit/miss ratios for some parts of my C++ code. I am not interested in using Perf for my entire application. Can Perf be used as a library inside C++? int main() { ... ... start_profiling() //…
narengi
  • 1,345
  • 3
  • 17
  • 38
10
votes
5 answers

Are Lisp lists always implemented as linked lists under the hood?

Are Lisp lists always implemented as linked lists under the hood? Is this a problem as far as processor caching goes? If so, are there solutions that use more contiguous structures which help caching?
Sam Washburn
  • 1,817
  • 3
  • 25
  • 43
10
votes
1 answer

Why is a cache read miss faster than a write miss?

I need to calculate an array (writeArray) using another array (readArray), but the problem is that the index mapping is not the same between the arrays (the value at index x of writeArray must be calculated from the value at index y of readArray), so it's not very…
Johnmph
  • 3,391
  • 24
  • 32
10
votes
1 answer

Do bank conflicts occur on non-GPU hardware?

This blog post explains how memory bank conflicts kill the transpose function's performance. Now I can't help but wonder: does the same happen on a "normal" CPU (in a multithreaded context)? Or is this specific to CUDA/OpenCL? Or does it not even appear…
rubenvb
  • 74,642
  • 33
  • 187
  • 332
10
votes
2 answers

Cache size estimation on your system?

I got this program from this link (https://gist.github.com/jiewmeng/3787223). I have been searching the web with the idea of gaining a better understanding of processor caches (L1 and L2). I want to be able to write a program that would enable me to…
liv2hak
  • 14,472
  • 53
  • 157
  • 270
10
votes
4 answers

Write a program to get CPU cache sizes and levels

I want to write a program to get my cache sizes (L1, L2, L3). I know the general idea of it: allocate a big array, then access parts of it of different sizes each time. So I wrote a little program. Here's my code: #include #include…
Kan Liu
  • 175
  • 1
  • 8
10
votes
1 answer

Lock-free check for modification of a global shared state in C using Cache-Line alignment

Edit: SO does not allow newbies to post more than two links. Sorry for the missing references. I'm trying to reduce locking overhead in a C application where detecting changes on a global state is performance-relevant. Even though I've been…
instilled
  • 123
  • 5
9
votes
1 answer

What does a 'split' cache mean, and how is it useful (if it is)?

I was doing a question on computer architecture, and it mentioned that the cache is a split cache with no hazard. What exactly does this mean?
mskanyal
  • 133
  • 1
  • 1
  • 6
9
votes
2 answers

Cache-friendly way to collect results from multiple threads

Consider N threads doing some asynchronous tasks with small result values like double or int64_t, so about 8 result values can fit in a single CPU cache line. N is equal to the number of CPU cores. On one hand, if I just allocate an array of N items,…
Serge Rogatch
  • 13,865
  • 7
  • 86
  • 158
9
votes
2 answers

Look Through vs Look aside

Suppose there are 2 caches, L1 and L2. L1: hit rate 0.8, access time 2 ns, transfer time between L1 and CPU 10 ns. L2: hit rate 0.9, access time 5 ns, transfer time between L2 and L1 100 ns. What will be the effective access…
Hemanshu Sethi
  • 139
  • 1
  • 1
  • 7
9
votes
1 answer

Loop tiling: how to choose the block size?

I am trying to learn loop optimization. I found that loop tiling helps make array loops faster. I tried the two blocks of code given below, with and without loop blocking, and measured the time taken for both. I did not find significant…
Sagar
  • 1,115
  • 2
  • 13
  • 22
9
votes
3 answers

How do non-temporal instructions work?

I'm reading the What Every Programmer Should Know About Memory PDF by Ulrich Drepper. At the beginning of part 6 there's a code fragment: #include void setbytes(char *p, int c) { __m128i i = _mm_set_epi8(c, c, c, c, c, c, c, c, …
Pawel Batko
  • 761
  • 7
  • 19
8
votes
2 answers

WBINVD instruction usage

I'm trying to use the WBINVD instruction on Linux to clear the processor's L1 cache. The following program compiles, but produces a segmentation fault when I try to run it. int main() {asm ("wbinvd"); return 1;} I'm using gcc 4.4.3 and running Linux…
roelf
  • 361
  • 2
  • 4
  • 5
8
votes
1 answer

How to explicitly load a structure into L1d cache? Weird results with INVD with CR0.CD = 1 on isolated core with/without hyperthreading

My goal is to load a static structure into the L1D cache, then perform some operations using those structure members, and once done with the operations, run invd to discard all the modified cache lines. So basically I want to create a…
user45698746
  • 305
  • 2
  • 13