I would like to know the throughput, latency, and number of banks of Kepler's L1 cache (both the read-only 'texture' cache and the normal L1 cache).
In a CUDA program, different threads read the same data multiple times, and I need to know whether I'm bound by L1 throughput. I couldn't find this information in any of Nvidia's documents; any help would be appreciated.
Edit: I'm using the K20 card.
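For concreteness, the access pattern described above might look like the following hypothetical sketch. Every thread repeatedly reads the same small table through `__ldg()`, the compute capability 3.5 intrinsic that loads via the read-only data cache. The kernel, the `runReuse` helper, the table size, and the launch configuration are illustrative assumptions. The reported GB/s counts bytes requested by the loads, not bytes moved by any particular cache level, and the single timed launch has no warm-up or error checking.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread reads the same small table many times.
// __ldg() (compute capability 3.5+, e.g. K20) loads via the read-only data cache.
__global__ void reuseKernel(const float* __restrict__ table, int tableSize,
                            int iters, float* __restrict__ out)
{
    float acc = 0.0f;
    for (int i = 0; i < iters; ++i) {
        // Neighbouring threads read neighbouring words, and every block reads
        // the same table, so most loads should be served from cache.
        int idx = (threadIdx.x + i) % tableSize;
        acc += __ldg(&table[idx]);
    }
    out[blockIdx.x * blockDim.x + threadIdx.x] = acc;
}

// Hypothetical helper: fills the table with ones, times one launch, prints the
// requested-load throughput, and returns out[0] (equal to iters).
float runReuse(int tableSize, int iters)
{
    const int threads = 256;
    const int blocks = 1024;  // arbitrary; enough blocks to fill the GPU
    const int n = threads * blocks;

    float* hTable = new float[tableSize];
    for (int i = 0; i < tableSize; ++i) hTable[i] = 1.0f;

    float* dTable = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dTable, tableSize * sizeof(float));
    cudaMalloc(&dOut, n * sizeof(float));
    cudaMemcpy(dTable, hTable, tableSize * sizeof(float), cudaMemcpyHostToDevice);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    reuseKernel<<<blocks, threads>>>(dTable, tableSize, iters, dOut);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    // Requested bytes: n threads * iters loads * 4 bytes, divided by seconds.
    double gbps = double(n) * iters * sizeof(float) / (ms * 1e6);
    printf("%.3f ms, %.1f GB/s of requested loads\n", ms, gbps);

    float first = 0.0f;
    cudaMemcpy(&first, dOut, sizeof(float), cudaMemcpyDeviceToHost);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dTable);
    cudaFree(dOut);
    delete[] hTable;
    return first;
}

int main()
{
    // A 1024-float (4 KB) table, read 4096 times by each thread.
    float r = runReuse(1024, 4096);
    printf("out[0] = %.0f\n", r);
    return 0;
}
```

This would be built with something like `nvcc -arch=sm_35`, since `__ldg()` needs compute capability 3.5. Varying the table size, or swapping `__ldg()` for a plain load, gives a rough empirical feel for when the cache becomes the limiter, but it does not by itself reveal the bank count or the documented peak.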