
I am having trouble computing the actor update of the DDPG algorithm in TensorFlow 2. The following is the code for both the critic and actor updates:

with tf.GradientTape() as tape: #persistent=True
   
   # compute current action values
   current_q = self.Q_model(stat_act.astype('float32'))
   
   # compute target action values
   action_next = TargetNet.p_model(states_next.astype('float32'))
   stat_act_next = np.concatenate((states_next,action_next),axis=1)
   target_q = TargetNet.Q_model(stat_act_next.astype('float32'))
   
   target_values = rewards + self.gamma*target_q

   loss_q = self.loss(y_true=target_values, y_pred=current_q)

variables_q = self.Q_model.trainable_variables
gradients_q = tape.gradient(loss_q, variables_q)
self.optimizer.apply_gradients(zip(gradients_q, variables_q))

with tf.GradientTape() as tape:
   current_actions = self.p_model(states.astype('float32'))
   current_q_pg = self.Q_model(np.concatenate(
       (states.astype('float32'), current_actions), axis=1))
   loss_p = - tf.math.reduce_mean(current_q_pg)

variables_p = self.p_model.trainable_variables
gradients_p = tape.gradient(loss_p, variables_p)
self.optimizer.apply_gradients(zip(gradients_p, variables_p))

These updates are part of a class method, and the actor and critic networks are defined separately. The issue is that `gradients_p` is returned as a list of `None` values, and I don't know what is wrong in this piece of code. I am aware that I could split the computation of the policy gradient according to the chain rule, but I don't know how to compute the derivative of the critic's output with respect to the action input using `tf.GradientTape`. How can I implement this part correctly? I also don't understand why `tf.GradientTape` is not able to trace back to the trainable variables of the actor network and perform the computation in a single pass.
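
For concreteness, a minimal sketch of the chain-rule split mentioned above, reusing the question's `self.p_model`, `self.Q_model`, `self.optimizer`, and `states` (the explicit tensor conversion is an assumption): one tape records the actor's forward pass, a second tape computes dQ/da with respect to the watched action input, and the `output_gradients` argument of `tf.GradientTape.gradient` pushes that gradient back through the actor.

import tensorflow as tf

states_t = tf.convert_to_tensor(states, dtype=tf.float32)

# first tape records the actor's forward pass
with tf.GradientTape() as tape_a:
    current_actions = self.p_model(states_t)

# second tape records the critic's forward pass; the actions must be
# watched explicitly because they are not variables of Q_model
with tf.GradientTape() as tape_q:
    tape_q.watch(current_actions)
    q_values = self.Q_model(tf.concat([states_t, current_actions], axis=1))

# dQ/da, summed over the batch by tape.gradient; dividing by the batch
# size matches the reduce_mean in loss_p, and the minus sign turns
# ascent on Q into descent on the loss
dq_da = tape_q.gradient(q_values, current_actions)
batch_size = tf.cast(tf.shape(states_t)[0], tf.float32)

gradients_p = tape_a.gradient(current_actions,
                              self.p_model.trainable_variables,
                              output_gradients=-dq_da / batch_size)
self.optimizer.apply_gradients(zip(gradients_p, self.p_model.trainable_variables))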

AleB

I have the exact same problem; did you find a solution or a reason why the gradients are None? – corvo Aug 21 '20 at 14:22

Apparently, I solved the problem by looking at the variables involved in the computation and checking whether they were properly initialized as tensors. One of the variables was a NumPy array, which caused the problem. I haven't found any similar reports, but just make sure that all the variables used inside the `tf.GradientTape` are `tf.Tensor`s. – AleB Aug 27 '20 at 08:50
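
A minimal sketch of the fix the comment above describes, assuming the `None` gradients were caused by `np.concatenate` returning a NumPy array inside the tape and disconnecting the graph between the actor and the loss; replacing it with `tf.concat` keeps every op on the tape:

import tensorflow as tf

states_t = tf.convert_to_tensor(states, dtype=tf.float32)

with tf.GradientTape() as tape:
    current_actions = self.p_model(states_t)
    # tf.concat instead of np.concatenate keeps this op on the tape,
    # so gradients can flow from loss_p back to p_model's variables
    current_q_pg = self.Q_model(tf.concat([states_t, current_actions], axis=1))
    loss_p = -tf.math.reduce_mean(current_q_pg)

gradients_p = tape.gradient(loss_p, self.p_model.trainable_variables)
self.optimizer.apply_gradients(zip(gradients_p, self.p_model.trainable_variables))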

0 Answers