I have a question about the CUDA Fermi architecture: I read somewhere that on Fermi, global memory access is as fast as shared memory access because the architecture now uses uniform addressing.
So is it true that I can access data in global memory with no (significant) latency, unlike on pre-Fermi GPUs?
This matters a lot to me because I'm writing code for an Nvidia Tesla GPU without having access to one (it's in the university's lab, and I can't use it during the summer...).