Questions tagged [cpu-cache]

A CPU cache is a hardware structure used by the CPU to reduce the average memory access time.

Caching is beneficial once data elements are re-used.


Caching is a general policy
aimed at eliminating the latency of repeated accesses
to an already visited but otherwise "expensive" (read: slow) resource (storage).


Caching does not speed-up memory access.

The most a professional programmer can achieve is to exercise due care and allow some latency masking in a concurrent mode of code execution, by issuing instructions well before the forthcoming memory data are actually consumed, so that cache management can evict a least-recently-used (LRU) part and pre-fetch the requested data from slow DRAM.


How does it work?

Main memory is usually built with DRAM technology, which allows for big, dense and cheap storage structures. But DRAM access is much slower than the cycle time of a modern CPU (the so-called memory wall). A CPU cache is a smaller memory, usually built with SRAM technology (expensive, but fast), that reduces the number of accesses to main memory by storing the main-memory contents that are likely to be referenced in the near future. Caches exploit a property of programs, the principle of locality: adjacent memory addresses are likely to be referenced close together in time (spatial locality), and if an address is referenced once, it is likely to be referenced again soon (temporal locality).

Each cache entry is tagged with an address, stored in extra SRAM cells. These tag cells indicate the specific address that holds the data; because the cache can never mirror the entire system memory, this address must be stored alongside the data. The index into the array selects a set. The index and the tag can each use either physical or virtual (MMU-translated) addresses, leading to the three cache types: PIPT, VIVT and VIPT.

Modern CPUs contain multiple levels of cache. In SMP systems a cache level may be private to a single CPU, shared by a cluster of CPUs, or shared by the whole system. Because caching can result in multiple copies of the same data being present in an SMP system, cache-coherence protocols are used to keep the data consistent. VIVT and VIPT caches can also interact with the MMU (and its own cache, commonly called a TLB).

Questions regarding CPU cache inconsistencies, profiling or under-utilization are on-topic.

For more information see Wikipedia's CPU-cache article.


1011 questions
-1
votes
1 answer

Other than cache, what are the on-chip memories? And how can they be explicitly addressed?

I came to an understanding of SRAM as on-chip memory. Moving towards the latest technologies, is High Bandwidth Memory another on-chip memory? Also I need to know what the latest tech used in processors is, in terms of on-chip memory
-1
votes
1 answer

Performance delta caused by pointer assignment or increment (strict aliasing?)

Update: Minimal example demonstrating the problem in Clang 7.0 - https://wandbox.org/permlink/G5NFe8ooSKg29ZuS https://godbolt.org/z/PEWiRk I'm experiencing a variation in performance of a function from 0μs to 500-900μs of a method based on 256…
Mark Ingram
  • 71,849
  • 51
  • 176
  • 230
-1
votes
2 answers

How can we know if a struct is in the CPU cache or lost to main memory?

I'm trying to write a C# game framework, so performance is critical here. Here is the reference I've found. The question is: how can we know if structs are still in the CPU cache? If we can't, then what scenarios push structs out to memory? For…
kitta
  • 1,723
  • 3
  • 23
  • 33
-1
votes
1 answer

When is it not possible to exploit spatial locality in cache?

We are given a processor whose instructions operate on 8-byte operands and whose instructions are also encoded using 8 bytes. We are using a 16-kilobyte, 4-way set-associative cache that contains 1024 sets. The cache has 4 * 1024 = 4096 cache…
-1
votes
1 answer

Does x86 provide instructions to load which data goes into cache?

There are prefetch instructions as mentioned here: https://c9x.me/x86/html/file_module_x86_id_252.html which allows a system program to HINT the cpu as to which data SHOULD go into the cache. But, if is a single simple program loaded by the…
Yahya
  • 13
  • 6
-1
votes
1 answer

If a cache miss happens, is the data moved to the register directly, or first moved to the cache and then to the register?

If a cache miss happens, is the data moved to the register directly from main memory, or is the data first moved to the cache and then to the register? Is there a direct path connecting the registers with main memory?
-1
votes
1 answer

hit ratio in cache - reading long sequence of bytes

Let's assume that one row of cache has size 2^n B. What hit ratio is expected when sequentially reading, byte by byte, a long contiguous region of memory? To my eye it is (2^n - 1) / 2^n. However, I am not sure if I am right. What do you think?
user6023611
-1
votes
1 answer

Where is the L2 cache located? On-chip or off-chip?

When I was studying shared L2 cache in NVIDIA fermi GPU, I thought the L2 cache should be located on-chip, together with L1 cache and SMs. However, I saw some CUDA material describes L2 cache as off-chip memory. Then, I got confused on L2 cache…
Jie Zhang
  • 7
  • 1
-1
votes
1 answer

Copying data in cache larger than one cache line

In C, is there any way to copy data in cache larger than one cache line, e.g. 128 or 256 bytes, in a single memory read?
-1
votes
1 answer

Efficient method to flush cache memory in ARM assembly

I have to flush 4MB of cache memory in ARM assembly language; what would be the efficient way to do it? I thought of allocating 4MB of memory, writing some random data and reading it back. I'm implementing a tool to test main memory. To make sure my tool…
Nikhilendra
  • 51
  • 3
  • 11
-1
votes
1 answer

How to bypass caches on an ARM machine

How can I bypass caches on all accesses to a certain memory location from user space on ARM? Here's an example: uint16_t* ptr = (uint16_t*) malloc(MEM_SIZE * sizeof(uint16_t)); *ptr = 0xFFFF; Can I make ptr uncached to avoid cache pollution? I…
Tayyar R
  • 655
  • 6
  • 22
-1
votes
1 answer

How do I calculate the size of a tag field?

I'm revising for an exam and I've come across a question that I have no idea how to do; I've looked through my notes and can't seem to find anything on it. Can anyone help me? Given a 64KB cache that contains 1024 blocks with 64 bytes per block, what…
user3557212
  • 33
  • 1
  • 5
-1
votes
2 answers

Cache miss penalty on branching

I wonder, is it faster to replace branching with 2 multiplications or not (due to the cache miss penalty)? Here is my case: float dot = rib1.x*-dir.y + rib1.y*dir.x; if(dot<0){ dir.x = -dir.x; dir.y = -dir.y; } And I'm trying to replace it…
tower120
  • 5,007
  • 6
  • 40
  • 88
-1
votes
1 answer

memcached as hibernate L2 layer cache

I'm working on a project which uses hibernate 4 and Spring 3.2 and I'm looking for an open source L2 layer cache implementation. I know there are plenty of free products like Hazelcast (Free version) or Infinispan but it seems that they might have…
M2je
  • 127
  • 6
-2
votes
1 answer

Is there a performance disadvantage to not copying reference types to each thread in C#?

I am working on implementing threading to my C# program. Each thread requires access to the same array, but does not need to write to it, only read data. Should this array be deep copied for each thread? The reason I think this might be important…