Questions tagged [cpu-cache]

A CPU-cache is a hardware structure used by the CPU to reduce the average memory access time.

Caching is beneficial only when data elements get re-used.


Caching is a general policy aimed at eliminating latency on repeated accesses: the cost of fetching from an otherwise "expensive" (read: slow) resource (storage) is paid once, and subsequent re-accesses to the already-visited data are served from the faster cache.


Caching does not speed up memory access itself.

The most a professional programmer can achieve is to exercise due care and allow for latency masking in concurrent code execution: issue memory accesses (or explicit prefetches) well before the data is actually consumed, so that the cache management can evict a least-recently-used (LRU) part and pre-fetch the requested data from slow DRAM in time.
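A minimal sketch of such latency masking, assuming a GCC or Clang toolchain (the `__builtin_prefetch` intrinsic is compiler-specific, and the prefetch distance of 16 is an illustrative guess that would need per-machine tuning):

```c
#include <stddef.h>

/* Sketch: issue a prefetch a fixed distance ahead of the consuming
 * access, so the cache line can travel from DRAM while earlier
 * iterations execute. The hint does not affect correctness; a CPU
 * that ignores it still computes the same sum, just slower. */
long sum_with_prefetch(const int *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 16 < n)
            __builtin_prefetch(&a[i + 16], /* rw = read */ 0, /* locality */ 3);
        s += a[i];
    }
    return s;
}
```

Whether this helps at all depends on the access pattern: hardware prefetchers already handle simple sequential walks well, so software prefetching mostly pays off for irregular, pointer-chasing access.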


How does it work?

Main memory is usually built with DRAM technology, which allows for big, dense and cheap storage structures. But DRAM access is much slower than the cycle time of a modern CPU (the so-called memory wall). A CPU-cache is a smaller memory, usually built with SRAM technology (expensive, but fast), that reduces the number of accesses to main memory by storing the main-memory contents that are likely to be referenced in the near future. Caches exploit a property of programs, the principle of locality: adjacent memory addresses are likely to be referenced close together in time (spatial locality), and an address that is referenced once is likely to be referenced again soon (temporal locality).
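The effect of spatial locality can be seen in how traversal order changes miss counts. In this sketch both functions compute the same sum; the matrix size is an illustrative assumption:

```c
#define N 256

/* Row-major walk: consecutive elements share a cache line, so spatial
 * locality gives roughly one miss per line. */
long sum_row_major(int m[N][N]) {
    long s = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += m[i][j];
    return s;
}

/* Column-major walk: each access jumps N*sizeof(int) bytes, touching a
 * different line every time once the working set exceeds the cache.
 * Same result, many more misses. */
long sum_col_major(int m[N][N]) {
    long s = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += m[i][j];
    return s;
}

/* Self-check: fill a matrix deterministically and confirm both walks
 * agree on the known total (sum of i+j over 256x256 = 16711680). */
int walks_agree(void) {
    static int m[N][N];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            m[i][j] = i + j;
    return sum_row_major(m) == 16711680L && sum_col_major(m) == 16711680L;
}
```

Timing the two walks on a matrix much larger than the last-level cache is a classic way to observe the memory wall directly.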

Each entry in the CPU cache is tagged with an address, held in extra SRAM cells. These tag cells indicate which specific memory address the stored data belongs to: since the cache can never mirror the entire system memory, this address must be stored alongside the data. The index into the array selects a set. The index and the tag can each use either physical or virtual (MMU-translated) addresses, leading to the three types PIPT, VIVT and VIPT.
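With a concrete geometry, assumed here for illustration (64-byte lines and 64 sets, e.g. a 32 KiB 8-way cache), the split of an address into offset, index and tag looks like:

```c
#include <stdint.h>

/* Assumed geometry: 64-byte lines, 64 sets. Both powers of two, so the
 * divisions and remainders below are simple bit-field extractions. */
#define LINE_SIZE 64u
#define NUM_SETS  64u

/* Byte offset within the line: the low log2(LINE_SIZE) bits. */
uint64_t cache_offset(uint64_t addr) { return addr % LINE_SIZE; }

/* Set index: the next log2(NUM_SETS) bits select which set to probe. */
uint64_t cache_index(uint64_t addr)  { return (addr / LINE_SIZE) % NUM_SETS; }

/* Tag: all remaining high bits, stored in the per-line tag SRAM and
 * compared on every lookup to decide hit or miss. */
uint64_t cache_tag(uint64_t addr)    { return addr / (LINE_SIZE * NUM_SETS); }
```

For PIPT the physical address is split this way; a VIPT cache instead takes the index from the virtual address (so the lookup can start before translation finishes) and the tag from the physical one.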

Modern CPUs contain multiple levels of cache. In SMP systems a given cache level may be private to a single CPU, shared by a cluster of CPUs, or shared by the whole system. Because caching can leave multiple copies of the same data present in an SMP system, cache-coherence protocols are used to keep the data consistent. VIVT and VIPT caches can also interact with the MMU (and its own cache, commonly called a TLB).
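As a sketch of what such a coherence protocol tracks, here is a simplified single-line MESI state machine. Real protocols add transient states and an Invalid-to-Exclusive transition when no other cache holds the line; both are omitted here for brevity:

```c
/* MESI states for one cache line in one cache. */
typedef enum { MODIFIED, EXCLUSIVE, SHARED, INVALID } mesi_t;

/* Events: what this core does locally, or what it snoops on the bus. */
typedef enum { LOCAL_READ, LOCAL_WRITE, BUS_READ, BUS_WRITE } event_t;

/* Simplified next-state function. A local read from INVALID is assumed
 * to find another sharer, so it lands in SHARED rather than EXCLUSIVE. */
mesi_t mesi_next(mesi_t s, event_t e) {
    switch (e) {
    case LOCAL_READ:
        return (s == INVALID) ? SHARED : s;
    case LOCAL_WRITE:
        /* From SHARED or INVALID this also broadcasts an invalidate. */
        return MODIFIED;
    case BUS_READ:
        /* Another core reads: a MODIFIED line is written back first,
         * then both caches hold it SHARED. */
        return (s == MODIFIED || s == EXCLUSIVE) ? SHARED : s;
    case BUS_WRITE:
        return INVALID;  /* another core took ownership of the line */
    }
    return s;
}
```

The BUS_WRITE-to-INVALID transition is what makes false sharing expensive: two cores writing different variables in the same line keep invalidating each other's copy.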

Questions regarding CPU cache inconsistencies, profiling or under-utilization are on-topic.

For more information see Wikipedia's CPU-cache article.


1011 questions
28 votes • 5 answers

why are separate icache and dcache needed

Can someone please explain what do we gain by having a separate instruction cache and data cache. Any pointers to a good link explaining this will also be appreciated.
ango • 829 • 2 • 10 • 23
27 votes • 0 answers

On Skylake (SKL) why are there L2 writebacks in a read-only workload that exceeds the L3 size?

Consider the following simple code: #include #include #include #include #include int cpu_ms() { return (int)(clock() * 1000 / CLOCKS_PER_SEC); } int main(int argc, char** argv) { if (argc <…
BeeOnRope • 60,350 • 16 • 207 • 386
27 votes • 4 answers

How to avoid "heap pointer spaghetti" in dynamic graphs?

The generic problem Suppose you are coding a system that consists of a graph, plus graph rewrite rules that can be activated depending on the configuration of neighboring nodes. That is, you have a dynamic graph that grows/shrinks unpredictably…
MaiaVictor • 51,090 • 44 • 144 • 286
27 votes • 2 answers

How do Intel Xeon CPUs write to memory?

I'm trying to decide between two algorithms. One writes 8 bytes (two aligned 4-byte words) to 2 cache lines, the other writes 3 entire cache lines. If the CPU writes only the changed 8 bytes back to memory, then the first algorithm uses much less…
Eloff • 20,828 • 17 • 83 • 112
26 votes • 3 answers

Understanding CPU cache and cache line

I am trying to understand how the CPU cache is operating. Let's say we have this configuration (as an example). Cache size 1024 bytes Cache line 32 bytes 1024/32 = 32 cache lines all together. A single cache line can store 32/4 = 8 ints. 1) According to…
kirbo • 1,707 • 5 • 26 • 32
25 votes • 3 answers

Why does my 8M L3 cache not provide any benefit for arrays larger than 1M?

I was inspired by this question to write a simple program to test my machine's memory bandwidth in each cache level: Why vectorizing the loop does not have performance improvement My code uses memset to write to a buffer (or buffers) over and over…
hewy • 275 • 3 • 8
25 votes • 5 answers

How is x86 instruction cache synchronized?

I like examples, so I wrote a bit of self-modifying code in c... #include #include // linux int main(void) { unsigned char *c = mmap(NULL, 7, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE| …
Will • 2,014 • 2 • 19 • 42
20 votes • 3 answers

Difference between cache way and cache set

I am trying to learn some stuff about caches. Lets say I have a 4 way 32KB cache and 1GB of RAM. Each cache line is 32 bytes. So, I understand that the RAM will be split up into 256 4096KB pages, each one mapped to a cache set, which contains 4…
user1876942 • 1,411 • 2 • 20 • 32
19 votes • 1 answer

Is the TLB shared between multiple cores?

I've heard that TLB is maintained by the MMU not the CPU cache. Then Does One TLB exist on the CPU and is shared between all processor or each processor has its own TLB cache? Could anyone please explain relationship between MMU and L1, L2 Cache?
ruach • 1,369 • 11 • 21
18 votes • 3 answers

How can caches be defeated?

I have this question on my assignment this week, and I don't understand how the caches can be defeated, or how I can show it with an assembly program.. Can someone point me in the right direction? Show, with assembly program examples, how the two…
John • 989 • 1 • 7 • 11
18 votes • 1 answer

Which cache mapping technique is used in intel core i7 processor?

I have learned about different cache mapping techniques like direct mapping and fully associative or set associative mapping, and the trade-offs between those. (Wikipedia) But I am curious which one is used in Intel core i7 or AMD processors…
Subhadip • 423 • 8 • 16
18 votes • 2 answers

How does CLFLUSH work for an address that is not in cache yet?

We are trying to use the Intel CLFLUSH instruction to flush the cache content of a process in Linux at the userspace. We create a very simple C program that first access a large array and then call the CLFLUSH to flush the virtual address space of…
Mike • 1,841 • 2 • 18 • 34
18 votes • 7 answers

Cache-friendly copying of an array with readjustment by known index, gather, scatter

Suppose we have an array of data and another array with indexes. data = [1, 2, 3, 4, 5, 7] index = [5, 1, 4, 0, 2, 3] We want to create a new array from elements of data at position from index. Result should be [4, 2, 5, 7, 3, 1] Naive algorithm…
sh1ng • 2,808 • 4 • 24 • 38
18 votes • 2 answers

Difference Between a Direct-Mapped Cache and Fully Associative Cache

I can't quite understand the main differences between the two caches and I was wondering if someone could help me out? I know that with a fully associative cache an address can be stored on any line in the tag array and a direct-mapped cache can…
madcrazydrumma • 1,847 • 3 • 20 • 38
18 votes • 2 answers

Cache bandwidth per tick for modern CPUs

What is a speed of cache accessing for modern CPUs? How many bytes can be read or written from memory every processor clock tick by Intel P4, Core2, Corei7, AMD? Please, answer with both theoretical (width of ld/sd unit with its throughput in…
osgx • 90,338 • 53 • 357 • 513