
Part of the Dcache (L1) can be used as local memory on the Cavium Octeon architecture (based on MIPS64). I want to know which is faster: reading a value from this memory (Dcache), or from a normal global variable (.data)?

0decimal0
ShadowStar

1 Answer


The reason CPU manufacturers complicate the design (and raise the cost) by adding caches to the CPU is to reduce memory read latency. It is much faster to access data from the L1 cache than from RAM. So the answer is that reading a value from the L1 data cache is much faster. I do not have exact figures, and it depends on the type and latency properties of the memory and on the speed of the CPU, but very roughly we are talking about less than 10 clock cycles for an L1 hit versus over 100 clock cycles for fetching data from DRAM on a cache miss.

FooF
  • Thank you, I understand. And a global variable is likely in the Dcache too, right? – ShadowStar Jul 31 '13 at 06:22
  • Yes, likely. Most memory accesses end up going through the cache; it is the occasional cache misses that cost so much (for example, the first time you read a value from memory, before that memory range has been loaded into the cache). From your description, it seems the Cavium Octeon provides this feature as an optimization to guarantee there are no cache misses: you always read the "local" variables directly from the L1 data cache, allowing the CPU to run at full speed instead of stalling *so long* on memory reads. – FooF Jul 31 '13 at 06:49
  • By arranging the order in which you read/write memory (as linearly, that is, as predictably for the caching system, as possible) you can to a degree optimize the cache misses away. Then there is not much difference, apart from context switches forcing the cache to be refilled. You could check whether there are tools to analyse this (maybe from your CPU vendor's support). – FooF Jul 31 '13 at 06:54