This is from the https://github.com/MoritzTaylor/ddpg-pytorch/blob/master/ddpg.py implementation, and I guess most DDPG implementations are written this way.
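# Update the critic network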
self.critic_optimizer.zero_grad()
state_action_batch = self.critic(state_batch, action_batch)
value_loss = F.mse_loss(state_action_batch, expected_values.detach())
value_loss.backward()
self.critic_optimizer.step()
# Update the actor network
self.actor_optimizer.zero_grad()
policy_loss = -self.critic(state_batch, self.actor(state_batch))
policy_loss = policy_loss.mean()
policy_loss.backward()
self.actor_optimizer.step()
However, after policy_loss.backward(), I think gradients with respect to the critic's parameters are left accumulated in the critic network. Shouldn't this affect the next update of the critic?
If it does, what could be the solution?
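For reference, here is a minimal, self-contained sketch of what I mean. The critic and actor here are hypothetical stand-in modules (not the ones from the linked repo); only the update structure mirrors the snippet above. It just checks whether the critic's .grad buffers are populated after the actor's backward pass.

import torch
import torch.nn as nn

state_dim, action_dim, batch = 4, 2, 8

# Hypothetical stand-ins for the repo's Critic and Actor networks.
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 32), nn.ReLU(), nn.Linear(32, 1))
actor = nn.Sequential(nn.Linear(state_dim, 32), nn.ReLU(), nn.Linear(32, action_dim), nn.Tanh())
actor_optimizer = torch.optim.Adam(actor.parameters(), lr=1e-3)

state_batch = torch.randn(batch, state_dim)

# Actor update: the policy loss is computed through the critic, so backward()
# also accumulates gradients in the critic's parameters.
actor_optimizer.zero_grad()
policy_loss = -critic(torch.cat([state_batch, actor(state_batch)], dim=1)).mean()
policy_loss.backward()
actor_optimizer.step()

# Inspect the critic's parameters: are their .grad buffers non-empty now?
print(any(p.grad is not None and p.grad.abs().sum().item() > 0 for p in critic.parameters()))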