What is the L1 cache throughput in Nvidia's Kepler?

Question

I would like to know the throughput, latency, and number of banks of Kepler's L1 cache (both the read-only 'texture' cache and the normal cache).

In a CUDA program, different threads read the same data multiple times, and I need to know whether I'm bound by L1 throughput. I couldn't find this information in any of Nvidia's documents; any help would be appreciated.

Edit: I'm using the K20 card.

[This](http://stackoverflow.com/q/19627702/2386951) plus comments on it might help you partially. — Farzad, Jan 23 '14 at 04:52
The Kepler L1 is [disabled](http://docs.nvidia.com/cuda/kepler-tuning-guide/index.html#l1-cache) for normal global reads/writes. — Robert Crovella, Jan 30 '14 at 18:08

score 2 · Answer 1 · answered Apr 10 '14 at 07:39

I don't know the number of banks in Kepler, but I don't think you need to worry about the L1 cache here. The Kepler tuning guide says:

L1 caching in Kepler GPUs is reserved only for local memory accesses, such as register spills and stack data. Global loads are cached in L2 only (or in the Read-Only Data Cache)

http://docs.nvidia.com/cuda/kepler-tuning-guide/
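Since global loads on Kepler can go through the Read-Only Data Cache rather than L1, here is a minimal sketch of how repeated reads of the same data might be routed that way. It assumes a device of compute capability 3.5 or higher (the K20 qualifies), where `__ldg()` is available. The kernel name `sum_table`, the helper `run_readonly_demo`, and the table size are hypothetical, chosen only for illustration. It does not measure throughput by itself; it only shows the load path.

```cuda
#include <cstdio>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread reads the same small table repeatedly.
// `const ... __restrict__` tells the compiler the data is read-only and not
// aliased; __ldg() explicitly requests a load through the read-only data
// cache (requires compute capability 3.5 or higher).
__global__ void sum_table(const float* __restrict__ table, int table_len,
                          float* __restrict__ out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float acc = 0.0f;
    for (int k = 0; k < table_len; ++k)
        acc += __ldg(&table[k]);   // same addresses read by all threads
    out[i] = acc;
}

// Hypothetical host helper: runs the kernel on n threads and checks results.
// Returns true if every output equals the expected sum.
bool run_readonly_demo(int n)
{
    const int table_len = 256;                 // assumed size, illustration only
    std::vector<float> h_table(table_len, 1.0f);
    std::vector<float> h_out(n, 0.0f);

    float* d_table = NULL;
    float* d_out = NULL;
    if (cudaMalloc(&d_table, table_len * sizeof(float)) != cudaSuccess)
        return false;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) {
        cudaFree(d_table);
        return false;
    }

    cudaError_t err = cudaMemcpy(d_table, h_table.data(),
                                 table_len * sizeof(float),
                                 cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        sum_table<<<blocks, threads>>>(d_table, table_len, d_out, n);
        err = cudaGetLastError();              // launch errors
    }
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();         // execution errors
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out.data(), d_out, n * sizeof(float),
                         cudaMemcpyDeviceToHost);

    cudaFree(d_table);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    // 256 additions of 1.0f are exact in single precision.
    for (int i = 0; i < n; ++i)
        if (h_out[i] != 256.0f) return false;
    return true;
}

int main()
{
    bool ok = run_readonly_demo(1 << 16);
    std::printf("%s\n", ok ? "ok" : "failed");
    return ok ? 0 : 1;
}
```

This needs to be built for an architecture of `sm_35` or newer. Timing this kernel against a variant that uses a plain `table[k]` load is one way to check whether the load path makes a difference for your access pattern.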
