I am implementing a Deep Q-Learning algorithm. One thing I'm not fully getting my head around is the step where you take your sampled batch from the experience replay buffer and use it to calculate the Q-values for the next states. This also raises a secondary question about the input shape of the CNN I'm training as the policy.

My question is conceptual: do I pass the entire sampled batch into the model at once, or one transition at a time, and then calculate the loss? If it's the entire batch, that seems to imply my CNN needs that batch size baked into its input layer, and that when I actually use the policy I'd need to collect that many states before calling the inference function. A rough sketch of what I mean is below.
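For context, here's roughly how I picture it (I'm assuming PyTorch here; the layer sizes and names like `QNetwork`, `policy_net`, and `replay_buffer` are just placeholders, not my real code):

```python
import random

import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Placeholder CNN Q-network. Note the batch size never appears in the layer definitions."""

    def __init__(self, n_actions: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.Sequential(nn.LazyLinear(512), nn.ReLU(), nn.Linear(512, n_actions))

    def forward(self, x):
        # x: (batch, 4, 84, 84) -- is it right that batch can be 32 during training but 1 when acting?
        return self.head(self.conv(x))


policy_net = QNetwork(n_actions=6)
target_net = QNetwork(n_actions=6)  # in a real setup this would periodically copy policy_net's weights

# Dummy replay buffer of (state, action, reward, next_state, done) tuples, just to make this runnable
replay_buffer = [
    (torch.rand(4, 84, 84), 0, 0.0, torch.rand(4, 84, 84), False) for _ in range(100)
]

# Training step: this is the part I'm asking about -- do I push the whole sampled batch
# through the network in a single forward pass like this?
batch = random.sample(replay_buffer, 32)
next_states = torch.stack([transition[3] for transition in batch])  # shape (32, 4, 84, 84)
next_q = target_net(next_states).max(dim=1).values                  # one max Q-value per transition

# Acting with the policy: or, at inference time, a single state with a batch dimension of 1?
state = torch.rand(4, 84, 84).unsqueeze(0)   # shape (1, 4, 84, 84)
action = policy_net(state).argmax(dim=1).item()
```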
Thanks for any insight.