Questions tagged [cpu-cache]

A CPU cache is a hardware structure used by the CPU to reduce the average memory access time.

Caching is beneficial only when some data elements get re-used.


Caching is a general policy aimed at eliminating the latency, already paid once, of repeated re-accesses to some already visited but otherwise "expensive" (read: slow) resource (storage).


Caching does not speed up the memory access itself.

The most a professional programmer can achieve is to pay attention and exercise due care to allow some latency masking in a concurrent mode of code execution, issuing instructions carefully, well before the forthcoming memory data is actually consumed, so that the cache management can evict a least-recently-used (LRU) part and pre-fetch the requested data from slow DRAM.
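For example, GCC/Clang-compatible compilers expose an explicit hint for such pre-fetching via the __builtin_prefetch intrinsic. The sketch below is illustrative only; the look-ahead distance PREFETCH_AHEAD and the 64-byte line size are tuning assumptions, not universal constants:

```c
#include <stddef.h>

/* Sum an array while hinting the hardware pre-fetcher: a minimal
 * sketch of software latency-masking, assuming a GCC/Clang-compatible
 * compiler (for __builtin_prefetch) and 64-byte cache lines.
 * PREFETCH_AHEAD is a tuning assumption, not a universal constant. */
#define LINE_BYTES 64
#define PREFETCH_AHEAD (8 * LINE_BYTES / sizeof(double))

double sum_with_prefetch(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        if (i + PREFETCH_AHEAD < n)
            /* Request data well before it is consumed
             * (read access, low temporal locality). */
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0, 0);
        sum += a[i];
    }
    return sum;
}
```

Whether this helps at all depends on the hardware pre-fetcher, the access pattern, and the look-ahead distance; it must be measured, not assumed.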


How does it work?

Main memory is usually built with DRAM technology, which allows for big, dense, and cheap storage structures. But DRAM access is much slower than the cycle time of a modern CPU (the so-called memory wall). A CPU cache is a smaller memory, usually built with SRAM technology (expensive but fast), that reduces the number of accesses to main memory by storing the main-memory contents that are likely to be referenced in the near future. Caches exploit a property of programs: the principle of locality, by which adjacent memory addresses are likely to be referenced close together in time (spatial locality), and an address that has been referenced once is likely to be referenced again soon (temporal locality). See also: typical latency figures for memory, disk, network, etc.
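A minimal C sketch of spatial locality (assuming the row-major array layout that C guarantees): both functions touch exactly the same elements, but the row-wise walk visits consecutive addresses and so uses every byte of each fetched cache line, while the column-wise walk jumps N*sizeof(int) bytes per access and misses far more often:

```c
#include <stdio.h>

#define N 1024
static int grid[N][N];          /* row-major: grid[i][j+1] is adjacent */

long sum_rows(void)             /* good spatial locality */
{
    long s = 0;
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            s += grid[i][j];
    return s;
}

long sum_cols(void)             /* poor spatial locality: strided walk */
{
    long s = 0;
    for (int j = 0; j < N; ++j)
        for (int i = 0; i < N; ++i)
            s += grid[i][j];
    return s;
}

int main(void)
{
    /* Same result either way; time each loop to see the cache effect. */
    printf("%ld %ld\n", sum_rows(), sum_cols());
    return 0;
}
```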

Each entry in a CPU cache is tagged with an address, held in extra SRAM cells. These tag cells indicate which specific address the cached data belongs to; since the cache can never mirror the entire system memory, this address must be stored. The index into the array selects a set. The index and the tag can each use either physical or virtual (MMU-translated) addresses, leading to the three cache types PIPT, VIVT, and VIPT.
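As an illustration, here is how an address splits into tag, set index, and line offset for a hypothetical cache geometry (64-byte lines, 256 sets; real geometries vary, but the decomposition always works this way for power-of-two sizes):

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical geometry: 64-byte lines -> 6 offset bits,
 * 256 sets -> 8 index bits; the remaining high bits form the tag. */
#define OFFSET_BITS 6
#define INDEX_BITS  8

int main(void)
{
    uint64_t addr   = 0x7ffdeadbeef0;   /* example address */
    uint64_t offset = addr & ((1u << OFFSET_BITS) - 1);
    uint64_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
    uint64_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);

    printf("tag=%#llx index=%#llx offset=%#llx\n",
           (unsigned long long)tag,
           (unsigned long long)index,
           (unsigned long long)offset);
    return 0;
}
```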

Modern CPUs contain multiple levels of cache. In SMP systems, a cache level may be private to a single CPU, shared by a cluster of CPUs, or shared by the whole system. Because caching can leave multiple copies of the same data present in an SMP system, cache coherence protocols are used to keep those copies consistent. VIVT and VIPT caches can also result in interactions with the MMU (and its own cache, commonly called the TLB).
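Coherence traffic can hurt performance even in perfectly correct code. A classic case is false sharing: two threads write to distinct variables that happen to sit in the same cache line, so the line ping-pongs between cores. A minimal sketch, assuming POSIX threads and 64-byte cache lines:

```c
#include <pthread.h>
#include <stdio.h>

/* False-sharing sketch, assuming POSIX threads and 64-byte lines.
 * The padding keeps each counter on its own cache line; remove it
 * and every increment forces coherence traffic between the cores. */
struct padded_counter {
    volatile long value;
    char pad[64 - sizeof(long)];
};

static struct padded_counter counters[2];

static void *worker(void *arg)
{
    struct padded_counter *c = arg;
    for (long i = 0; i < 100000000L; ++i)
        c->value++;
    return NULL;
}

int main(void)
{
    pthread_t t0, t1;
    pthread_create(&t0, NULL, worker, &counters[0]);
    pthread_create(&t1, NULL, worker, &counters[1]);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    printf("%ld %ld\n", counters[0].value, counters[1].value);
    return 0;
}
```

Timing this with and without the pad member (compile with -pthread) is a simple way to observe coherence-protocol overhead directly.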

Questions regarding CPU cache inconsistencies, profiling or under-utilization are on-topic.

For more information, see Wikipedia's CPU cache article.


1011 questions
0
votes
1 answer

Manually flushing a write-through cache

Does it make any sense to flush the CPU cache manually if it is implemented as a write-through cache?
San
  • 71
  • 1
  • 6
0
votes
1 answer

Infinispan L2 cache custom eviction policy

I am planning to use Infinispan as my Hibernate app's L2 cache. All my entities have a life-cycle attribute [ New -> Run -> Completed ]. Initially my entities are in the New state, and as time goes on they move to the other states. Put simply, I want to evict…
era
  • 391
  • 4
  • 24
0
votes
3 answers

TLB physical addressing doesn't make sense to me

I'm reading, in a simple way, about how TLBs work, and I don't understand something: The TLB references physical memory addresses in its table. It may reside between the CPU and the CPU cache, between the CPU cache and primary storage memory, or…
Johnny Pauling
  • 12,701
  • 18
  • 65
  • 108
0
votes
1 answer

Cache coherency: snooping vs. directory-based

From what I understand, a directory-based system is a more server-centric design and snooping is more peer-to-peer. That is why a directory-based scheme requires fewer messages for any read miss, as it can reach the processor that has the valid data after…
user494461
0
votes
1 answer

measure cycles spent in accessing remote cache

How can I measure the cycles spent accessing a shared remote cache, say L3? I need this cache-access information both system-wide and per-thread. Are there any specific tool/hardware requirements? Or can I use a formula to get an approximate…
0
votes
0 answers

Cache-conscious design of Master-Worker processes

I recently started working on a server application designed with the familiar Master-Worker pattern with threads, where one privileged thread manages several worker threads. I have now realized how troublesome threads truly are. I am now considering…
haste
  • 1,441
  • 1
  • 10
  • 21
0
votes
1 answer

What algorithm is used to determine if the data is cacheable in an ARM Cortex-M0 (shown by the HPROT[3] signal bit)

As mentioned above, the ARM Cortex-M0's HPROT[3] signal tells you whether the data on the bus is cacheable or not. How is this decided by the MC?
Gaurav Suman
  • 515
  • 1
  • 3
  • 17
0
votes
2 answers

How to check if an object is in the CPU cache?

Is there a way in Java to check if a specific object is in the CPU cache? Is there a way to test if reading/writing one of its fields will cause a cache miss? I wrote Java programs in the past, but not complex ones, and now I have to do some academic…
Oren
  • 2,767
  • 3
  • 25
  • 37
0
votes
2 answers

How far should one trust hardware counter profiling using VsPerfCmd.exe?

I'm attempting to use VsPerfCmd.exe to profile branch misprediction and last level cache misses in an instrumented native application. The setup works as it says on the tin, but the results I'm getting don't seem sensible. For instance, a function…
Koarl
  • 246
  • 1
  • 2
  • 10
-1
votes
1 answer

Intel documentation, atomic access description doesn't make sense

I want to know the meaning of this sentence: "Unaligned 16-, 32-, and 64-bit accesses to cached memory that fit within a cache line". I can't actually understand it; I'd like to get some simple C code as an explanation, thank you very…
归故里
  • 9
  • 1
-1
votes
1 answer

Should I align data to their data type or cpu cache line size?

Data is usually aligned to its own data type, i.e. a 32-bit int is usually aligned to 4 bytes; this makes loading/storing it more efficient for the processor. Now, when does cache-line alignment come into play? If the x64 cache line size is 64 bytes…
Dan
  • 2,694
  • 1
  • 6
  • 19
-1
votes
1 answer

How long does it take to fill a cache line?

Assuming a cache line is 64 bytes, and given that 100 nanoseconds is the often-quoted figure for main memory access, is this figure for 1 byte at a time or for 64 bytes at a time?
Samuel Squire
  • 127
  • 3
  • 13
-1
votes
1 answer

How fast (or slow) is an LDR cache HIT compared to other ARM instructions

The newer ARM Architecture Reference Manuals don't give instruction timings any more. (Instruction timings were given, at least for the early ARM2 and ARM3 chips). I know that cache misses result in external memory accesses that are very slow,…
colinh
  • 29
  • 5
-1
votes
1 answer

IA32 Assembly: splitting address into its components

I'm having trouble splitting a stored address into its components (namely into the tag bits, set index bits, and block offset bits). I'm trying to implement the function... unsigned char check_cache(line cache[4], unsigned char addr); This function…
-1
votes
2 answers

Cache Locality - weight of TLB, Cache Lines, and ...?

From my understanding, the constructs which give rise to the high-level concept of "cache locality" are the following: the Translation Lookaside Buffer (TLB) for virtual memory translation. Accessing the same virtual memory within the 4096-byte…