I have a Ray actor that holds collected experiences (the buffer), a Ray actor that optimizes over them (the learner), and several actors that only collect experiences. This is similar to the Ape-X reinforcement learning algorithm.
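For reference, here is a stripped-down sketch of the three kinds of actors. GetSamples is the actual method I call; ReplayBuffer, AddSamples, Collector, and the sizes are simplified placeholders, not my real code:

```python
import random

import ray
import torch


@ray.remote
class ReplayBuffer:
    """Holds experiences in plain CPU memory; GetSamples returns ordinary CPU objects."""

    def __init__(self, capacity=100_000):
        self.capacity = capacity
        self.storage = []

    def AddSamples(self, samples):
        self.storage.extend(samples)
        # Keep only the most recent `capacity` experiences.
        del self.storage[:-self.capacity]

    def GetSamples(self, batch_size=256):
        return random.sample(self.storage, min(batch_size, len(self.storage)))


@ray.remote(num_gpus=1)
class Learner:
    """Owns the model and optimizer on the GPU and runs the optimization passes."""

    def __init__(self):
        self.device = torch.device("cuda")
        # model / optimizer construction omitted


@ray.remote
class Collector:
    """Runs the environment and pushes experiences into the buffer."""

    def Run(self, buffer, steps=1000):
        for _ in range(steps):
            transition = {"obs": ..., "action": ..., "reward": ...}  # rollout omitted
            buffer.AddSamples.remote([transition])
```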
My main problem is that sampling from the buffer on the learner takes a lot of time, because the data can only be transferred from the buffer to the learner in CPU format (even when both actors are on the same machine). Consequently, after every call to ray.get(buffer.GetSamples.remote()) I still have to push the returned data to the GPU before I can run an optimization pass on the learner. This is very inefficient and takes a lot of time away from the actual optimization computation.
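Concretely, the learner's loop currently looks roughly like this (train_step and batch_size are placeholders for my real optimization step and sampling size):

```python
import ray
import torch


def learner_loop(buffer, device, train_step, batch_size=256):
    """Simplified version of what the learner does today."""
    while True:
        # Samples come back from the buffer actor as CPU objects...
        samples = ray.get(buffer.GetSamples.remote(batch_size))

        # ...so every pass pays a blocking host-to-device copy before the GPU can
        # do any useful work (collation of the samples into a tensor is omitted here).
        batch = torch.as_tensor(samples, device=device)

        train_step(batch)
```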
In an ideal world, the buffer would continuously push random samples to the GPU, and the learner would simply pick a ready chunk at each pass (a rough sketch of what I mean is below). How can I make this work?
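Roughly, the behaviour I'm imagining looks like this. This is just my own illustration with made-up names (GpuPrefetcher, next_batch), not working code I have, and I suspect it runs into exactly the threading issues I mention next:

```python
import queue
import threading

import ray
import torch


class GpuPrefetcher:
    """Illustration only: a background thread keeps a small queue of batches that are
    already resident on the GPU, so the training loop never waits on ray.get or on
    the host-to-device copy."""

    def __init__(self, buffer, device, batch_size=256, depth=4):
        self.buffer = buffer
        self.device = device
        self.batch_size = batch_size
        self.queue = queue.Queue(maxsize=depth)
        threading.Thread(target=self._fill, daemon=True).start()

    def _fill(self):
        while True:
            samples = ray.get(self.buffer.GetSamples.remote(self.batch_size))
            batch = torch.as_tensor(samples).pin_memory().to(self.device, non_blocking=True)
            self.queue.put(batch)  # blocks once `depth` GPU batches are ready

    def next_batch(self):
        return self.queue.get()  # the learner just grabs a ready chunk
```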
Also, putting both the learner and the buffer into one Ray actor doesn't work, because Ray (and, obviously, Python itself) seems to have significant problems with multi-threading, and running the two serially defeats the purpose (it ends up even slower).
Note that this is a follow-up to another question of mine here.
EDIT: I should note that this is for PyTorch.