
Summary
I'd like some clarification on how thrust::device_vector works.

AFAIK, writing to an indexed location such as device_vector[i] = 7 is implemented by the host, and therefore causes a call to memcpy. Does device_vector.push_back(7) also call memcpy?

Background

I'm working on a project comparing stock prices. The prices are stored in two vectors. I iterate over the two vectors, and whenever their prices change relative to each other, I write that change into a new vector. So I never know in advance how long the resulting vector will be. On the CPU the natural way to do this is with push_back, but I don't want to use push_back on the GPU vector if it's going to call memcpy every time.
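For reference, the sequential CPU version described above might look roughly like the sketch below. It assumes that a "change relative to each other" means the sign of a[i] - b[i] differs from the sign at i - 1, and it records the index where that happens; the names price_relation and relative_changes are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Sign of (a - b): -1 if a < b, 0 if equal, +1 if a > b.
int price_relation(double a, double b)
{
    return (a > b) - (a < b);
}

// Hypothetical sequential version: return every index i at which the
// relationship between a[i] and b[i] differs from the one at i - 1.
// The number of results is not known in advance, so push_back is the
// natural way to grow the output on the host.
std::vector<std::size_t> relative_changes(const std::vector<double>& a,
                                          const std::vector<double>& b)
{
    std::vector<std::size_t> changes;
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 1; i < n; ++i) {
        if (price_relation(a[i], b[i]) != price_relation(a[i - 1], b[i - 1]))
            changes.push_back(i);
    }
    return changes;
}
```

For example, a = {1, 2, 3, 4} and b = {2, 1, 3, 5} give relations -1, +1, 0, -1, so the result is {1, 2, 3}.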

Is there a more efficient way to build a vector piece by piece on the GPU?

Research
I've looked at this question, but it (and others) are focused on the most efficient way to access elements from the host. I want to build up a vector on the GPU.

Thank you.

talonmies
John Mansell

1 Answer


Does device_vector.push_back(7) also call memcpy?

No. It does, however, result in a kernel launch per call.

Is there a more efficient way to build a vector piece by piece on the GPU?

Yes.

Build it (or large segments of it) in host memory first, then copy or insert it into device memory in a single operation. Doing so greatly reduces latency and makes much better use of the PCIe bus.
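A minimal sketch of the contrast, assuming a CUDA toolkit with Thrust compiled with nvcc. The function names are hypothetical and the generated values (i * i) are placeholders for whatever the host loop actually computes.

```cuda
#include <thrust/device_vector.h>
#include <vector>

// Slow pattern, for contrast: one push_back per element.
// Per the answer above, each call results in its own kernel launch.
thrust::device_vector<int> build_with_push_back(const std::vector<int>& values)
{
    thrust::device_vector<int> d;
    for (int v : values)
        d.push_back(v);
    return d;
}

// Recommended pattern: grow the result in host memory, where push_back
// is cheap, then transfer the whole range to the device in one operation.
thrust::device_vector<int> build_on_host_then_copy(int n)
{
    std::vector<int> h;
    for (int i = 0; i < n; ++i)
        h.push_back(i * i);  // ordinary host push_back, no device traffic
    return thrust::device_vector<int>(h.begin(), h.end());  // single bulk copy
}

// Appending a host-built segment to an existing device vector,
// again as one bulk insert rather than many push_back calls.
void append_segment(thrust::device_vector<int>& d, const std::vector<int>& segment)
{
    d.insert(d.end(), segment.begin(), segment.end());
}
```

When results arrive in chunks, calling append_segment once per chunk keeps the number of transfers proportional to the number of chunks rather than the number of elements.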

talonmies
  • Thanks. That's very helpful. One further question, though. I'm trying to analyze lots of pairs of vectors at the same time on the GPU, and each pair will produce its own result vector. It's this result vector I'm trying to build piece by piece, so the whole point is to parallelize the process. Is there a way to build all the result vectors on the GPU? Building them on the host would defeat the purpose. Thanks. – John Mansell Jun 15 '18 at 07:27
  • Hmm, in that case: is it inefficient to copy the vector from host to device into an empty device vector (on which `.reserve(N);` has been called) using `thrust::copy_n(std::cbegin(hostVector), N, std::back_inserter(deviceVector));`? – JHBonarius Jun 17 '18 at 15:31