In NVIDIA Fermi and Kepler GPUs (and probably Maxwell too), an L1 cache line is 128 bytes long, while an L2 cache line is 32 bytes long. Shouldn't that be the other way around? L1 is much smaller, so shouldn't it cache shorter segments of memory to prevent thrashing?

artless noise

einpoklum
Memory transactions are nominally warp-wise, and a warp contains 32 threads. A 128-byte L1 cache line corresponds to each thread in a warp reading one 32-bit word (the standard transaction size): 32 threads × 4 bytes = 128 bytes. Makes sense to me. – talonmies May 25 '15 at 11:17
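
The arithmetic in that comment can be checked with a small host-side sketch. It is only a simplified address-counting model, not a description of how the hardware actually schedules transactions: it counts how many aligned 128-byte and 32-byte blocks a warp's reads fall into. The function name `blocks_touched` and the constants are hypothetical, introduced for illustration, and the 128/32-byte sizes are simply the ones quoted in the question.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>

// Sizes taken from the question and comment; treated as assumptions here.
constexpr std::size_t kWarpSize    = 32;   // threads per warp
constexpr std::size_t kL1LineBytes = 128;  // L1 line size quoted in the question
constexpr std::size_t kL2LineBytes = 32;   // L2 line size quoted in the question

// One 32-bit word per thread fills exactly one 128-byte line.
static_assert(kWarpSize * sizeof(std::uint32_t) == kL1LineBytes,
              "32 threads x 4 bytes should equal 128 bytes");

// Hypothetical helper: count the distinct aligned blocks of `block_bytes`
// touched when thread t reads `word_bytes` bytes at base + t * stride_bytes.
std::size_t blocks_touched(std::uint64_t base, std::size_t stride_bytes,
                           std::size_t word_bytes, std::size_t block_bytes) {
    std::set<std::uint64_t> blocks;
    for (std::size_t t = 0; t < kWarpSize; ++t) {
        const std::uint64_t first = base + t * stride_bytes;
        const std::uint64_t last  = first + word_bytes - 1;
        // A read may straddle a block boundary, so record every block it covers.
        for (std::uint64_t b = first / block_bytes; b <= last / block_bytes; ++b) {
            blocks.insert(b);
        }
    }
    return blocks.size();
}
```

Under this model, an aligned warp reading consecutive 32-bit words (`blocks_touched(0, 4, 4, kL1LineBytes)`) touches a single 128-byte block, or four 32-byte blocks when counted at `kL2LineBytes` granularity. A 128-byte stride spreads the same 32 reads over 32 separate 128-byte blocks.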