I'm having some difficulty understanding how to apply backpropagation through time (BPTT) to the A2C (Advantage Actor-Critic) method, or to any reinforcement learning method for that matter.
As I understand it, BPTT conceptually unrolls a recurrent network over several time steps, performs a forward pass through the unrolled network, computes a loss from the outputs, and then backpropagates that loss through the unrolled steps so the gradient accounts for the network's previous hidden states. However, I'm unsure how to combine this with A2C. Should I compute the actor and critic losses only at the end of an epoch and backpropagate those, or should I accumulate the per-step losses over the rollout and backpropagate their sum? Or have I misunderstood entirely and need to do something else?
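To make the question concrete, here is a rough sketch of the second option as I imagine it: an LSTM-based actor-critic where the hidden state is carried across a short rollout, per-step losses are accumulated, and a single backward pass at the end unrolls BPTT over the whole rollout. This is written with PyTorch; all the names (`RecurrentA2C`, the dummy observations and returns) are illustrative, not from any library, and the returns/advantages would of course come from real rewards in practice:

```python
import torch
import torch.nn as nn

# Illustrative minimal recurrent actor-critic; names and dimensions are made up.
class RecurrentA2C(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=32):
        super().__init__()
        self.lstm = nn.LSTMCell(obs_dim, hidden)
        self.actor = nn.Linear(hidden, n_actions)   # policy logits head
        self.critic = nn.Linear(hidden, 1)          # state-value head

    def forward(self, obs, state):
        h, c = self.lstm(obs, state)                # state=None -> zero initial state
        return self.actor(h), self.critic(h), (h, c)

model = RecurrentA2C(obs_dim=4, n_actions=2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

state = None          # LSTM hidden state carried across the rollout (not detached)
losses = []
obs = torch.zeros(1, 4)                             # dummy observation stream
for t in range(5):                                  # 5-step rollout
    logits, value, state = model(obs, state)
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()
    ret = torch.tensor([[1.0]])                     # dummy return; normally from rewards
    advantage = (ret - value).detach()              # advantage treated as a constant
    policy_loss = -dist.log_prob(action) * advantage
    value_loss = (ret - value).pow(2)
    losses.append(policy_loss + 0.5 * value_loss)   # accumulate per-step losses
    obs = torch.zeros(1, 4)                         # next obs would come from the env

opt.zero_grad()
total_loss = torch.stack(losses).sum()
total_loss.backward()   # one backward pass: BPTT through all 5 steps' hidden states
opt.step()
```

Is something along these lines the right idea, or should the backward pass be structured differently?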
Thanks in advance for any advice.