
This is from the https://github.com/MoritzTaylor/ddpg-pytorch/blob/master/ddpg.py implementation, and I guess most DDPG implementations are written this way.

        
# Update the critic network
self.critic_optimizer.zero_grad()
state_action_batch = self.critic(state_batch, action_batch)
value_loss = F.mse_loss(state_action_batch, expected_values.detach())
value_loss.backward()
self.critic_optimizer.step()

# Update the actor network
self.actor_optimizer.zero_grad()
policy_loss = -self.critic(state_batch, self.actor(state_batch))
policy_loss = policy_loss.mean()
policy_loss.backward()
self.actor_optimizer.step()

However, after policy_loss.backward(), I think gradients with respect to the critic parameters are left in the critic network. Shouldn't this affect the next critic update?

If it does, what could be the solution?
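
Here is what I mean, with toy stand-in networks (the sizes and the concatenated state-action critic below are placeholders, not the classes from the repo):

import torch
import torch.nn as nn

# Toy stand-ins just to inspect the gradients; sizes are arbitrary
critic = nn.Linear(4 + 2, 1)   # takes a concatenated (state, action) vector
actor = nn.Linear(4, 2)

state_batch = torch.randn(8, 4)

# Actor update exactly as above
policy_loss = -critic(torch.cat([state_batch, actor(state_batch)], dim=1)).mean()
policy_loss.backward()

# The critic parameters now carry gradients from the actor's loss
print(critic.weight.grad.abs().sum())   # non-zero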

Dongri

1 Answer


I figured out that

self.critic_optimizer.zero_grad()

zeros out the previously accumulated gradients. The gradients that policy_loss.backward() leaves in the critic parameters sit in their .grad buffers, and this call clears them at the start of the critic update, before value_loss.backward() runs, so they never affect the critic's optimizer step.
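
A minimal sketch of the update order (toy networks and made-up shapes, not the repo's actual classes) makes this concrete:

import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder networks and data; only the ordering of the calls matters here
critic = nn.Linear(4 + 2, 1)
actor = nn.Linear(4, 2)
critic_optimizer = torch.optim.Adam(critic.parameters(), lr=1e-3)
actor_optimizer = torch.optim.Adam(actor.parameters(), lr=1e-4)

state_batch = torch.randn(8, 4)
action_batch = torch.randn(8, 2)
expected_values = torch.randn(8, 1)

# Actor update: policy_loss.backward() also fills .grad on the critic parameters
actor_optimizer.zero_grad()
policy_loss = -critic(torch.cat([state_batch, actor(state_batch)], dim=1)).mean()
policy_loss.backward()
actor_optimizer.step()
print(critic.weight.grad.abs().sum())   # non-zero: leftover gradient in the critic

# Next critic update: zero_grad() wipes that leftover gradient first
critic_optimizer.zero_grad()
print(critic.weight.grad)               # None or all zeros, depending on the PyTorch version
value_loss = F.mse_loss(
    critic(torch.cat([state_batch, action_batch], dim=1)),
    expected_values.detach(),
)
value_loss.backward()
critic_optimizer.step()                 # only the value loss gradient is applied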

Dongri