
While studying the shared L2 cache in NVIDIA's Fermi GPUs, I assumed the L2 cache would be located on-chip, together with the L1 caches and the SMs. However, some CUDA material describes the L2 cache as off-chip memory. This confused me even more, because accessing the L2 cache takes more than 100 cycles.

Can anyone help me understand the L2 cache in NVIDIA GPUs?

artless noise
Jie Zhang
  • The latency doesn't have anything to do with whether it's L2 or not. Where a cache is located affects its latency, but the latency doesn't determine its category. – Jeff Hammond May 28 '15 at 02:38
  • The level number of a cache doesn't even categorize it. They are simply numbered 1, 2, 3. Each one is bigger, farther from the CPU core, and slower than the last, but there's no other constraint on the speed or size of any one level. – Potatoswatter May 28 '15 at 06:51

1 Answer


A GPU consists of many streaming multiprocessors (SMs), each typically with a SIMT width of 8 to 32 (the Fermi series has 16 SMs with a SIMT width of 32, and AMD's ATI 5870 Evergreen has 20 SMs with a SIMT width of 16). Each SM has a private L1 data cache and read-only texture and constant caches, along with a low-latency shared memory (scratchpad memory). Each memory controller (MC) is associated with a slice of the shared L2 cache for faster access to cached data.

Both MC and L2 are on-chip.
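One way to see that the L2 cache is a per-device, on-chip resource is to query its size through the CUDA runtime. A minimal sketch using `cudaGetDeviceProperties` (the `l2CacheSize` and `multiProcessorCount` fields are part of the standard `cudaDeviceProp` struct; the actual values printed depend on your GPU):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);
    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        // l2CacheSize is reported in bytes; 0 means the device has no L2.
        printf("Device %d (%s): L2 cache = %d KB, %d SMs\n",
               dev, prop.name, prop.l2CacheSize / 1024,
               prop.multiProcessorCount);
    }
    return 0;
}
```

Compile with `nvcc` and run on a CUDA-capable machine; on Fermi-class parts (compute capability 2.x) this reports an L2 of up to 768 KB shared by all SMs.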

user703016
Roby