I am implementing a Deep Q-Learning algorithm. One thing I'm not fully getting my head around is the step where you take your sampled batch from the experience replay buffer and use it to calculate the Q-values for the next states. This also raises a secondary question about the input shape of the CNN I'm training as the policy.

My question is conceptual: do I pass the entire sampled batch into the model at once, or one transition at a time, and then calculate the loss? If it's the entire batch, that seems to imply my CNN needs that batch size baked into its input layer, and that when I actually use the policy I'd need to collect that many states before calling the inference function. A rough sketch of what I mean is below.
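For context, here's roughly how I picture it (I'm assuming PyTorch here; the layer sizes and names like `QNetwork`, `policy_net`, and `replay_buffer` are just placeholders, not my real code):

```python
import random

import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Placeholder CNN Q-network. Note the batch size never appears in the layer definitions."""

    def __init__(self, n_actions: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.Sequential(nn.LazyLinear(512), nn.ReLU(), nn.Linear(512, n_actions))

    def forward(self, x):
        # x: (batch, 4, 84, 84) -- is it right that batch can be 32 during training but 1 when acting?
        return self.head(self.conv(x))


policy_net = QNetwork(n_actions=6)
target_net = QNetwork(n_actions=6)  # in a real setup this would periodically copy policy_net's weights

# Dummy replay buffer of (state, action, reward, next_state, done) tuples, just to make this runnable
replay_buffer = [
    (torch.rand(4, 84, 84), 0, 0.0, torch.rand(4, 84, 84), False) for _ in range(100)
]

# Training step: this is the part I'm asking about -- do I push the whole sampled batch
# through the network in a single forward pass like this?
batch = random.sample(replay_buffer, 32)
next_states = torch.stack([transition[3] for transition in batch])  # shape (32, 4, 84, 84)
next_q = target_net(next_states).max(dim=1).values                  # one max Q-value per transition

# Acting with the policy: or, at inference time, a single state with a batch dimension of 1?
state = torch.rand(4, 84, 84).unsqueeze(0)   # shape (1, 4, 84, 84)
action = policy_net(state).argmax(dim=1).item()
```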
Thanks for any insight.