ValueError: No gradients provided for any variable in policy gradient

Question

I have been trying to implement the policy gradient algorithm for reinforcement learning. However, I get the error "ValueError: No gradients provided for any variable" when computing the gradients for the custom loss function shown below:

def loss_function(prob, action, reward):

    prob_action = np.array([prob.numpy()[0][action]]) #prob is like ->[0.4900, 0.5200] and action is scalar index->1,0
    log_prob = tf.math.log(prob_action)
    loss = tf.multiply(log_prob, (-reward))
    return loss

I am computing the gradients as below:

def update_policy(policy, states, actions, discounted_rewards):
    opt = tf.keras.optimizers.SGD(learning_rate=0.1)

    for state, reward, action in zip(states, discounted_rewards, actions):
        with tf.GradientTape() as tape:
            prob = policy(state, training=True)
            loss = loss_function(prob, action, reward)
            print(loss)

        gradients = tape.gradient(loss, policy.trainable_variables)
        opt.apply_gradients(zip(gradients, policy.trainable_variables))

Please help me with this issue. Thank you.

I think this error means that the loss tensor is not differentiable with respect to _trainable_variables_, so TensorFlow cannot calculate the gradients. Something in ```loss_function``` is breaking the path from _trainable_variables_ to the loss. I am not sure what causes it, but it may be ```prob_action = np.array([prob.numpy()[0][action]])```; try keeping _prob_action_ as a ```tf.Tensor``` instead of a NumPy array. — gekrone, Jun 01 '21 at 16:12

score 0 · Answer 1 · answered Jun 03 '21 at 00:10

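As @gekrone indicates in the comment, this is definitely caused by the gradients not flowing, because prob_action is a NumPy array and not a tensor. Also avoid the .numpy() method inside the tape: operations on the resulting NumPy array are not recorded, so the loss loses its connection to the policy's variables. Stick to something like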

prob_action = prob[0][action]
...

and this should work.
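For completeness, here is a minimal sketch of the whole update step with that change applied. The small `policy` network, the 4-feature `state`, and the `action`/`reward` values are hypothetical stand-ins chosen only for illustration; they are not the asker's actual model or data.

```python
import tensorflow as tf


def loss_function(prob, action, reward):
    # Index the tensor directly; this slicing is recorded by GradientTape,
    # unlike a round trip through prob.numpy() and np.array(...).
    prob_action = prob[0][action]
    log_prob = tf.math.log(prob_action)
    return -reward * log_prob


# Hypothetical stand-in policy: 4 state features in, 2 action probabilities out.
policy = tf.keras.Sequential([
    tf.keras.layers.Dense(2, activation="softmax"),
])
state = tf.constant([[0.1, -0.2, 0.3, 0.4]])  # assumed shape (1, 4)
opt = tf.keras.optimizers.SGD(learning_rate=0.1)

with tf.GradientTape() as tape:
    prob = policy(state, training=True)
    loss = loss_function(prob, action=1, reward=1.0)

gradients = tape.gradient(loss, policy.trainable_variables)

# Every entry should now be a tensor rather than None.
print([g is not None for g in gradients])

opt.apply_gradients(zip(gradients, policy.trainable_variables))
```

If any entry in `gradients` is still `None`, something between the policy output and the loss is probably still leaving TensorFlow (for example another `.numpy()` call or a NumPy function).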
