I am referring to the Thrust Quick Start Guide: http://code.google.com/p/thrust/wiki/QuickStartGuide#Vectors. The second paragraph there says:
"Also note that individual elements of a device_vector can be accessed using the standard bracket notation. However, because each of these accesses requires a call to cudaMemcpy, they should be used sparingly. We'll look at some more efficient techniques later."
I searched the whole guide but could not find the more efficient technique. Does anyone know the fastest way to do this, i.e., how to access the contents of a device vector or device pointer from the host?