
I have a question about NVIDIA's Fermi architecture: I've read somewhere that on Fermi, global memory access is as fast as shared memory access because it now uses uniform addressing.

So is it true that I can access data in global memory with no (big) latency, unlike on pre-Fermi GPUs?

It's very important for me to know this, because I'm writing code for an NVIDIA Tesla GPU without having access to it (it's in the university's lab, and I can't use it during the summer...).

Andrea Sylar Solla
  • No, the latency to access global memory on Fermi GPUs is still much larger than to access shared memory or registers. On Fermi, however, there is a two-level cache, which can speed things up. – aland Aug 11 '12 at 18:11

1 Answer


This is not true. Global memory access on Fermi still has much higher latency than shared memory access. However, thanks to the caches, an access may hit in cache, reducing the latency. This is particularly useful with less-than-ideal memory access patterns (e.g. slightly misaligned accesses).
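To illustrate the point, here is a minimal sketch (my own illustration, not from the original answer; the kernel name, tile size, and problem size are arbitrary assumptions). It shows the usual Fermi-era pattern: when data is reused by neighbouring threads, staging it into shared memory once is still worthwhile, because shared-memory latency is far lower than global-memory latency even with the L1/L2 caches.

```cuda
#include <cstdio>

#define TILE 256

// Each block loads a tile of `in` into shared memory once; every thread then
// reuses its neighbours' values from the fast on-chip tile instead of issuing
// extra global loads (simple 1D 3-point average, halo ignored for brevity).
__global__ void smooth(const float *in, float *out, int n)
{
    __shared__ float tile[TILE];

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        tile[threadIdx.x] = in[i];          // one global read per element
    __syncthreads();

    if (i > 0 && i < n - 1 &&
        threadIdx.x > 0 && threadIdx.x < blockDim.x - 1) {
        // neighbour reuse comes from shared memory, not global memory
        out[i] = (tile[threadIdx.x - 1] + tile[threadIdx.x] +
                  tile[threadIdx.x + 1]) / 3.0f;
    } else if (i < n) {
        out[i] = in[i];                     // edge elements: plain global read
    }
}

int main()
{
    const int n = 1 << 20;
    float *d_in, *d_out;
    cudaMalloc(&d_in,  n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemset(d_in, 0, n * sizeof(float));

    smooth<<<(n + TILE - 1) / TILE, TILE>>>(d_in, d_out, n);
    cudaDeviceSynchronize();
    printf("last CUDA error: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```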

Uniform memory addressing is a completely different thing, unrelated to the above. It allows the GPU to deduce at run time whether a given pointer refers to global or shared memory (or even mapped pinned host memory, or another GPU's memory). On pre-Fermi cards the memory space had to be deducible at compile time.
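As a sketch of what this buys you (again my own example, not from the answer; the function and kernel names are made up): on Fermi (sm_20 and later) a single __device__ function taking a generic pointer can be called with a global-memory pointer and a shared-memory pointer interchangeably, because the hardware resolves the memory space at run time. On sm_1x the compiler had to figure out the space at compile time, so this kind of code could not be handled cleanly.

```cuda
#include <cstdio>

// Works with any generic pointer on sm_20+ (global, shared, local, ...).
__device__ float sum3(const float *p)
{
    return p[0] + p[1] + p[2];
}

__global__ void demo(const float *gdata, float *result)
{
    __shared__ float sdata[3];

    if (threadIdx.x < 3)
        sdata[threadIdx.x] = gdata[threadIdx.x] * 2.0f;
    __syncthreads();

    if (threadIdx.x == 0) {
        // Same function, called once with a global pointer and once with a
        // shared pointer: both are plain generic pointers on Fermi.
        result[0] = sum3(gdata);
        result[1] = sum3(sdata);
    }
}

int main()
{
    float h[3] = {1.0f, 2.0f, 3.0f}, out[2];
    float *d_in, *d_out;
    cudaMalloc(&d_in,  sizeof(h));
    cudaMalloc(&d_out, sizeof(out));
    cudaMemcpy(d_in, h, sizeof(h), cudaMemcpyHostToDevice);

    demo<<<1, 32>>>(d_in, d_out);
    cudaMemcpy(out, d_out, sizeof(out), cudaMemcpyDeviceToHost);
    printf("global sum = %g, shared sum = %g\n", out[0], out[1]);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```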

CygnusX1