Questions tagged [policy-gradient-descent]
44 questions
0
votes
1 answer
DDPG Actor Update (PyTorch Implementation Issue)
This is from the https://github.com/MoritzTaylor/ddpg-pytorch/blob/master/ddpg.py implementation, and I guess most DDPG implementations are written this way.
self.critic_optimizer.zero_grad()
state_action_batch = self.critic(state_batch,…

Dongri
- 1
- 2
0
votes
1 answer
ValueError: No gradients provided for any variable in policy gradient
I have been trying to implement a policy gradient algorithm in reinforcement learning. However, I am facing the error "ValueError: No gradients provided for any variable:" while computing the gradients for the custom loss function shown below:
def…
0
votes
1 answer
MlpPolicy only returns 1 and -1 with action space [-1, 1]
I am trying to use Stable Baselines to train PPO2 with MlpPolicy. After 100k timesteps, I only get 1 and -1 as actions.
I restrict the action space to [-1, 1] and use the action directly as the control.
I don't know if it is because I directly use action as…

qwererer2
- 11
0
votes
1 answer
PPO2 reinforcement learning 'catastrophic forgetting'?
I'm implementing PPO2 reinforcement learning on my self-built tasks and always encounter situations where the agent seems nearly mature, then suddenly and catastrophically loses its performance and can't hold a stable level. I don't…
0
votes
1 answer
Reward not increasing while training a Bipedal System
I am completely new to reinforcement learning and this is my first program in practice. I am trying to train the bipedal system in the OpenAI gym environment using the policy gradient algorithm.
However, the reward never changes, either at episode 0…

Atharva Dubey
- 832
- 1
- 8
- 25
0
votes
0 answers
Gradient calculation for actor in DDPG algorithm
I am experiencing some issues in computing the actor update in the DDPG algorithm using TensorFlow 2. The following is the code for both critic and actor updates:
with tf.GradientTape() as tape: #persistent=True
# compute current action values
…

AleB
- 153
- 1
- 3
- 10
0
votes
1 answer
PPO algorithm converges on only one action
I have taken some reference implementations of the PPO algorithm and am trying to create an agent that can play Space Invaders. Unfortunately, from the 2nd trial onwards (after training the actor and critic neural networks for the first time), the…

JAYDEEP GHOSE
- 11
- 1
0
votes
1 answer
Policy gradient (REINFORCE) diverging when finding the shortest path in a graph with negative rewards
I want to use the policy gradient to find the shortest path among a group of nodes in a network.
The network is represented using a graph with edges labeled with value -1.
Now, a path with a negative value closest to 0 is the shortest…

abhi
- 397
- 3
- 14
0
votes
1 answer
Loss Policy Gradient - Reinforcement Learning
I am training my network using policy gradient and defining the loss as:
self.loss = -tf.reduce_mean(tf.log(OUTPUT_NN) * self.REWARDS)
self.opt = tf.train.AdamOptimizer(self.lr).minimize(self.loss)
What I do not understand is that the loss…

Alex Gomes
- 29
- 1
- 5
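
For the loss in the "Loss Policy Gradient" question above, a minimal sketch in plain Python of what an expression of that shape computes. It assumes `OUTPUT_NN` holds the probability the network assigned to each action actually taken and `REWARDS` holds the matching returns; the excerpt is truncated, so both readings are assumptions, and `policy_gradient_loss` is a hypothetical name.

```python
import math

def policy_gradient_loss(action_probs, returns):
    """Negative mean of log(prob) * return.

    Hypothetical helper mirroring -reduce_mean(log(p) * R).
    Assumes action_probs[i] is the probability of the action taken at
    step i (must be > 0) and returns[i] is the return credited to it.
    """
    if len(action_probs) != len(returns):
        raise ValueError("action_probs and returns must have the same length")
    if not action_probs:
        raise ValueError("need at least one sample")
    terms = [math.log(p) * r for p, r in zip(action_probs, returns)]
    return -sum(terms) / len(terms)
```

Because log-probabilities are never positive, this quantity is non-negative when all returns are positive and can go negative when returns are negative, so its raw value need not fall steadily the way a supervised loss does.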
0
votes
1 answer
How do we assess each reward in the return in Policy Gradient Methods?
Hi StackOverflow Community,
I have a problem with the policy gradient methods in reinforcement learning.
In policy gradient methods, we increase/decrease the log probability of an action based on the return (i.e. total rewards) from that step…

test
- 93
- 1
- 11
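
The "return from that step" mentioned in the question above is often computed as a discounted reward-to-go. A minimal sketch, assuming a single finished episode; the function name `rewards_to_go` and the discount parameter `gamma` are illustrative choices, not taken from the question.

```python
def rewards_to_go(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for each step t.

    Hypothetical helper: `rewards` is the list of per-step rewards from one
    episode, `gamma` is an assumed discount factor in [0, 1].
    """
    out = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out
```

With this weighting, each action's log-probability is scaled only by rewards that came after it, which is one common way to avoid crediting an action for rewards earned before it was taken.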
0
votes
1 answer
Trying to implement experience replay in TensorFlow
I am trying to implement experience replay in TensorFlow. The problem I am having is in storing outputs for the model's trial and then updating the gradient simultaneously. A couple of approaches I have tried are to store the resulting values from…

Taylor_K
- 1
- 1
-1
votes
2 answers
How do you evaluate whether a reinforcement learning agent is trained or not?
I am new to training reinforcement learning agents. I have read about the PPO algorithm and used the Stable Baselines library to train an agent using PPO. So my question here is: how do I evaluate a trained RL agent? Consider for a regression or…

chink
- 1,505
- 3
- 28
- 70
-1
votes
1 answer
In stock trading, how to measure quantity of stock
I am working on stock market analysis and prediction using machine learning methods, especially reinforcement learning. I am trying to predict short, long and flat (buy, hold, sell). Any suggestion or material is appreciated.
Currently, I…

parth vadhadiya
- 119
- 8
-1
votes
1 answer
Multiclass Sigmoid for DRL action picking
I am working on a deep reinforcement learning problem and I would like to use sigmoid for my last layer instead of softmax. I am stuck on what to use for action picking.
Specifically, how should I replace the last two lines of this code and with…

ahmet hamza emra
- 580
- 4
- 15