Questions tagged [cpu-cache]

A CPU cache is a hardware structure used by the CPU to reduce the average memory access time.

Caching is beneficial whenever data elements are re-used.


Caching is a general policy aimed at eliminating, for repeated re-accesses, the latency already paid once to reach an otherwise "expensive" (read: slow) resource (storage).


Caching does not speed up memory access itself.

The most a professional programmer can achieve is to exercise due care so that latency can be masked in a concurrent mode of code execution: issue instructions well before the forthcoming memory data is actually consumed, so that the cache management can evict an LRU line and pre-fetch the requested data from slow DRAM in the meantime.


How does it work?

Main memory is usually built with DRAM technology, which allows for big, dense and cheap storage structures. But DRAM access is much slower than the cycle time of a modern CPU (the so-called memory wall). A CPU cache is a smaller memory, usually built with SRAM technology (expensive, but fast), that reduces the number of accesses to main memory by storing the main memory contents that are likely to be referenced in the near future. Caches exploit a property of programs: the principle of locality. Adjacent memory addresses are likely to be referenced close together in time (spatial locality), and an address that is referenced once is likely to be referenced again soon (temporal locality).

Each CPU cache line is tagged with an address, stored in extra SRAM cells. These tag cells indicate which memory address the cached data belongs to; since the cache can never mirror the entire system memory, this address must be stored alongside the data. The index into the array selects a set. The index and the tag can each use either physical or virtual (MMU) addresses, leading to the three cache types PIPT, VIVT and VIPT.

Modern CPUs contain multiple levels of cache. In SMP systems, a cache level may be private to a single CPU, shared by a cluster of CPUs, or shared by the whole system. Because caching can result in multiple copies of the same data being present in an SMP system, cache coherence protocols are used to keep the copies consistent. VIVT and VIPT caches can also interact with the MMU (and its own cache, commonly called a TLB).

Questions regarding CPU cache inconsistencies, profiling or under-utilization are on-topic.

For more information see Wikipedia's CPU-cache article.


1011 questions
0
votes
0 answers

PreLoad Engine (PLE) ARM A9 MPCore

I'm trying to pre-load SDRAM memory into the L2 cache. I have initialised the MMU and made 1 translation table. I also enabled the cache and I see the software is using the cache as well... To load some SDRAM into my L2 cache I tried to work with the…
0
votes
1 answer

SMP boot of ARM Cortex A9 sequence with MMU/cache enabled

I am trying to do SMP boot in U-Boot on a dual-core ARM Cortex A9 system with MMU/cache enabled. I need the sequence of initializations. In what order should the following things happen? MMU page table setup Set SMP bit…
prasanna
  • 51
  • 5
0
votes
0 answers

What causes the retired instructions to increase?

I have a 496*O(N^3) loop. I am performing a blocking optimization technique where I'm operating on 2 images at a time instead of 1. In raw terms, I am unrolling the outer loop. (The non-unrolled version of the code is as shown below: ) BTW, I'm using…
0
votes
2 answers

Difference between use of while() and sleep() to put program into sleep mode

I have created a shared object, access it from two different programs, and measure the time. The DATA array is the shared object between the two processes. Case 1: Use of while inside program1 program1 : access shared DATA array ;// to load into memory…
bholanath
  • 1,699
  • 1
  • 22
  • 40
0
votes
1 answer

Unexpected output in C with access to ARRAY in memory with RDTSC

Here is my program in C. #include #include #include #include static int DATA[1024]={1,2,3,4,.....1024}; inline void foo_0(void) { int j; puts("Hello, I'm inside foo_0"); int k=0; …
bholanath
  • 1,699
  • 1
  • 22
  • 40
0
votes
0 answers

How to flush out the Shared function data from CPU cache

I am creating a shared data for two processes and then after reading data from CPU cache, I want to flush out the shared function data from CPU cache. I am able to find the starting address of that particular shared data in cache memory but unable…
Amit_T
  • 149
  • 11
0
votes
1 answer

Calculating actual/effective CPI for 3 level cache

(a) You are given a memory system that has two levels of cache (L1 and L2). Following are the specifications: Hit time of L1 cache: 2 clock cycles Hit rate of L1 cache: 92% Miss penalty to L2 cache (hit time of L2): 8 clock cycles Hit rate of L2…
User14229754
  • 85
  • 2
  • 12
0
votes
1 answer

ARM bare-metal with MMU: write to non-cachable,non-bufferable mapped area fail

I have an ARM Cortex A9 CPU with 2 cores, but I use just 1 core while the other sits in a busy loop. I set up the MMU table using sections (1MB per entry) like this: 0x00000000-0x14ffffff => 0x00000000-0x14ffffff (non-cachable,…
sing lam
  • 131
  • 1
  • 10
0
votes
0 answers

Finding cache CPI time

I need a formula, or at least to be pointed in the right direction; it involves cache and CPI time. I have a base machine with a 2.4 GHz clock rate and L1 and L2 caches. L1 is 256K direct mapped write through . 90% read without a hit rate without…
0
votes
2 answers

Will the workload (usage) of a CPU core be 100% if there is a persistent cache miss?

That is, if the processor core spends most of its time waiting for data from RAM or the L3 cache due to cache misses, but the system is real-time (real-time thread priority), and the thread is pinned (affinity) to the core and works without switching…
Alex
  • 12,578
  • 15
  • 99
  • 195
0
votes
2 answers

Memory performance/cache puzzle

I have a memory performance puzzle. I'm trying to benchmark how long it takes to fetch a byte from main memory, and how various BIOS settings and memory hardware parameters influence it. I wrote the following code for Windows that, in a loop,…
Andrew
  • 867
  • 7
  • 20
0
votes
0 answers

Difference between eviction due to clflush and eviction due to access to same set by other process

As per my understanding, when we use clflush(&Array1[i]), we manually evict the cache line where Array1[i] resides, and it is guaranteed that the element Array1[i] will not be present in the cache; next time after clflush, when we try…
bholanath
  • 1,699
  • 1
  • 22
  • 40
0
votes
1 answer

What exactly are the memory read/write operations of the processor

I'm sure my title is not perfect, so let me clarify. Per this article: http://msdn.microsoft.com/en-us/magazine/jj863136.aspx , void Print() { int d = _data; // Read 1 if (_initialized) // Read 2 Console.WriteLine(d); else …
Stav Alfi
  • 13,139
  • 23
  • 99
  • 171
0
votes
1 answer

How can I check my CPU cache in Windows 8?

I have a problem: I cannot find any panel or command in Windows 8 which can show me my CPU cache. There is some software that can get the system configuration, but it does not show full info: it reports everything except the CPU cache.
Amin AmiriDarban
  • 2,031
  • 4
  • 24
  • 32
0
votes
0 answers

Hibernate / Spring transaction issue with Infinispan L2 cache

I am trying to use Infinispan as Hibernate L2 cache for an application which use technologies like Tomcat 6, Hibernate 4 and Spring 3.5. The application running in Tomcat and our current transaction manager is …