Questions tagged [policy-gradient-descent]

44 questions
0
votes
1 answer

DDPG Actor Update (PyTorch Implementation Issue)

This is from the https://github.com/MoritzTaylor/ddpg-pytorch/blob/master/ddpg.py implementation, and I guess most DDPG implementations are written this way: self.critic_optimizer.zero_grad() state_action_batch = self.critic(state_batch,…
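A minimal sketch of the actor-update pattern this kind of PyTorch DDPG code usually follows; the names actor, critic, actor_optimizer, and state_batch are stand-ins for the question's objects, not the linked repo's exact code:

    # Hypothetical sketch of a typical PyTorch DDPG actor update.
    def actor_update(actor, critic, actor_optimizer, state_batch):
        actor_optimizer.zero_grad()
        # The actor is trained to maximize the critic's Q-value of its own actions,
        # so the loss is the negative mean Q-value over the batch.
        policy_loss = -critic(state_batch, actor(state_batch)).mean()
        policy_loss.backward()
        actor_optimizer.step()
        return policy_loss.item()

Because only the actor's parameters are registered with actor_optimizer, the critic's weights do not move in this step even though gradients flow through it.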
0
votes
1 answer

ValueError: No gradients provided for any variable in policy gradient

I have been trying to implement the policy gradient algorithm in reinforcement learning. However, I am facing the error "ValueError: No gradients provided for any variable:" while computing the gradients for the custom loss function, as shown below: def…
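This error usually means the loss tensor is not connected to the model's trainable variables inside the tape. A hedged sketch of one common way to write the update so the gradients exist; model, states, actions, and returns are assumptions about the question's setup:

    import tensorflow as tf

    # Hypothetical sketch: the loss must be computed *inside* tf.GradientTape and be a
    # function of the model's trainable variables, otherwise tape.gradient returns None.
    def reinforce_step(model, optimizer, states, actions, returns):
        with tf.GradientTape() as tape:
            logits = model(states)                                  # forward pass recorded by the tape
            log_probs = tf.nn.log_softmax(logits)
            one_hot = tf.one_hot(actions, depth=logits.shape[-1])
            action_log_probs = tf.reduce_sum(log_probs * one_hot, axis=1)
            loss = -tf.reduce_mean(action_log_probs * returns)      # policy-gradient surrogate loss
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        return loss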
0
votes
1 answer

MlpPolicy only returns 1 and -1 with action space [-1, 1]

I am trying to use Stable Baselines to train a PPO2 agent with MlpPolicy. After 100k timesteps, I only get 1 and -1 as actions. I restrict the action space to [-1, 1] and use the action directly as the control. I don't know if it is because I directly use the action as…
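For reference, a sketch of how such a continuous action space is typically declared (the shape here is an illustrative assumption); PPO2's Gaussian policy samples are unbounded and are typically clipped to these bounds before being sent to the environment, which is one common reason an agent appears to emit only -1 and 1 early in training:

    import numpy as np
    from gym import spaces

    # Hypothetical declaration of a continuous [-1, 1] action space for a custom Gym env.
    action_space = spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)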
0
votes
1 answer

PPO2 reinforcement learning 'catastrophic forgetting'?

I'm implementing PPO2 reinforcement learning on my self-built tasks and always encounter situations where the agent seems to be nearly mature, then suddenly and catastrophically loses its performance and cannot regain stability. I don't…
0
votes
1 answer

Reward not increasing while training a Bipedal System

I am completely new to reinforcement learning and this is my first practical program. I am trying to train the bipedal system in the OpenAI Gym environment using the policy gradient algorithm. However, the reward never changes, either at episode 0…
0
votes
0 answers

Gradient calculation for actor in DDPG algorithm

I am experiencing some issues computing the actor update in the DDPG algorithm using TensorFlow 2. The following is the code for both the critic and actor updates: with tf.GradientTape() as tape: #persistent=True # compute current action values …
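A hedged sketch of how the actor step is often written in TensorFlow 2; actor, critic, actor_optimizer, and states stand in for the question's variables, and the critic is assumed to be a Keras model taking [states, actions] as input:

    import tensorflow as tf

    # Hypothetical sketch of the DDPG actor update in TensorFlow 2.
    def actor_update(actor, critic, actor_optimizer, states):
        with tf.GradientTape() as tape:
            actions = actor(states, training=True)
            # Maximize Q(s, pi(s)) by minimizing its negative mean.
            actor_loss = -tf.reduce_mean(critic([states, actions], training=True))
        # Differentiate only w.r.t. the actor's weights; the critic stays fixed here.
        grads = tape.gradient(actor_loss, actor.trainable_variables)
        actor_optimizer.apply_gradients(zip(grads, actor.trainable_variables))
        return actor_loss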
0
votes
1 answer

PPO algorithm converges on only one action

I have taken some reference implementations of the PPO algorithm and am trying to create an agent which can play Space Invaders. Unfortunately, from the 2nd trial onwards (after training the actor and critic networks for the first time), the…
0
votes
1 answer

Policy gradient (REINFORCE) diverging when finding the shortest path in a graph with negative rewards

I want to use the policy gradient to find the shortest path among a group of nodes in a network. The network is represented as a graph whose edges are all labeled with the value -1, so a path with a negative total value closest to 0 is the shortest…
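One commonly suggested stabilizer for this kind of all-negative reward setup is subtracting a baseline from the returns before using them in the REINFORCE update; a small sketch under that assumption:

    import numpy as np

    # Hypothetical sketch: with all-negative step rewards (-1 per edge), every raw return
    # pushes probabilities down, which can destabilize REINFORCE. Subtracting a baseline
    # (here the batch mean) centers the gradient signal without biasing its expectation.
    def advantages_from_returns(returns):
        returns = np.asarray(returns, dtype=np.float32)
        adv = returns - returns.mean()
        # Optional: normalize to unit variance for extra stability.
        return adv / (adv.std() + 1e-8)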
0
votes
1 answer

Policy Gradient Loss - Reinforcement Learning

I am training my network using policy gradient and defining the loss as: self.loss = -tf.reduce_mean(tf.log(OUTPUT_NN) * self.REWARDS) self.opt = tf.train.AdamOptimizer(self.lr).minimize(self.loss) What I do not understand is that the loss…
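The excerpt cuts off, but as a point of reference, this quantity is just -mean(log π(a|s) · R), so its sign and scale depend entirely on the reward values rather than on how well the agent is doing; a small worked example with illustrative numbers:

    import numpy as np

    # Hypothetical worked example of the REINFORCE loss -mean(log(pi) * R).
    probs = np.array([0.2, 0.7, 0.9])      # probabilities of the actions actually taken
    rewards = np.array([1.0, -1.0, 2.0])   # returns credited to those actions
    loss = -np.mean(np.log(probs) * rewards)
    print(loss)  # ~0.49 here; flip the reward signs and the "loss" becomes negative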
0
votes
1 answer

How do we assess each reward in the return in Policy Gradient Methods?

Hi StackOverflow community, I have a problem with policy gradient methods in reinforcement learning. In policy gradient methods, we increase/decrease the log probability of an action based on the return (i.e. the total reward) from that step…
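A small sketch of what "the return from that step" (reward-to-go) usually looks like in code, under the standard discounted-return assumption:

    import numpy as np

    # Hypothetical sketch of the reward-to-go used to weight each action's log-probability.
    def rewards_to_go(rewards, gamma=0.99):
        returns = np.zeros(len(rewards), dtype=np.float32)
        running = 0.0
        # Walk backwards so each step accumulates the discounted sum of future rewards.
        for t in reversed(range(len(rewards))):
            running = rewards[t] + gamma * running
            returns[t] = running
        return returns

    # Example: rewards_to_go([1, 0, 2], gamma=1.0) -> [3., 2., 2.]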
0
votes
1 answer

Trying to implement experience replay in TensorFlow

I am trying to implement experience replay in TensorFlow. The problem I am having is storing the outputs from the model's trials and then updating the gradients simultaneously. A couple of approaches I have tried are to store the resulting values from…
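For context, a minimal framework-agnostic sketch of the usual experience replay pattern (store transitions during rollout, sample random minibatches later for the gradient update); the class and method names here are illustrative:

    import random
    from collections import deque

    # Hypothetical sketch of a minimal experience replay buffer.
    class ReplayBuffer:
        def __init__(self, capacity=100_000):
            self.buffer = deque(maxlen=capacity)

        def push(self, state, action, reward, next_state, done):
            self.buffer.append((state, action, reward, next_state, done))

        def sample(self, batch_size):
            # Uniform random minibatch; zip(*...) regroups fields into parallel tuples.
            batch = random.sample(self.buffer, batch_size)
            states, actions, rewards, next_states, dones = zip(*batch)
            return states, actions, rewards, next_states, dones

        def __len__(self):
            return len(self.buffer)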
-1
votes
2 answers

How do you evaluate whether a reinforcement learning agent has actually been trained?

I am new to reinforcement learning agent training. I have read about the PPO algorithm and used the Stable Baselines library to train an agent with PPO. My question is: how do I evaluate a trained RL agent? Consider for a regression or…
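A common first check is to roll out the trained policy deterministically for several episodes and look at the mean and spread of the episode rewards; a hedged sketch assuming a Stable Baselines-style model.predict and a Gym-style environment:

    import numpy as np

    # Hypothetical evaluation loop: run the trained agent without exploration noise
    # and average the per-episode rewards.
    def evaluate(model, env, n_episodes=10):
        episode_rewards = []
        for _ in range(n_episodes):
            obs = env.reset()
            done, total = False, 0.0
            while not done:
                action, _states = model.predict(obs, deterministic=True)
                obs, reward, done, info = env.step(action)
                total += reward
            episode_rewards.append(total)
        return np.mean(episode_rewards), np.std(episode_rewards)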
-1
votes
1 answer

In stock trading, how to measure the quantity of stock

I am working on stock market analysis and prediction using machine learning methods, especially reinforcement learning. I am trying to predict short, long, and flat (buy, hold, sell). (Any suggestions or material are appreciated.) Currently, I…
-1
votes
1 answer

Multiclass Sigmoid for DRL action picking

I am working on a deep reinforcement learning problem and I would like to use sigmoid for my last layer instead of softmax. I am stuck on what to use for action picking. Specifically, how should I replace the last two lines of this code, and with…
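With a sigmoid output layer, each unit is usually treated as an independent probability, so the single categorical (softmax) draw is replaced by a per-unit Bernoulli draw; a small sketch under that interpretation:

    import numpy as np

    # Hypothetical sketch: sample each action independently from its sigmoid probability.
    def pick_actions(sigmoid_outputs, rng=None):
        rng = rng or np.random.default_rng()
        probs = np.asarray(sigmoid_outputs)
        return (rng.random(probs.shape) < probs).astype(np.int32)

    # Example: pick_actions([0.9, 0.1, 0.6]) might return array([1, 0, 1])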