Questions tagged [policy-gradient-descent]

44 questions
0
votes
1 answer

DDPG Actor Update (PyTorch Implementation Issue)

This is from the https://github.com/MoritzTaylor/ddpg-pytorch/blob/master/ddpg.py implementation, and I guess most DDPG implementations are written this way: self.critic_optimizer.zero_grad() state_action_batch = self.critic(state_batch,…
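A minimal sketch of the actor-update pattern this kind of PyTorch DDPG code usually follows; the names actor, critic, actor_optimizer, and state_batch are stand-ins for the question's objects, not the linked repo's exact code:

    # Hypothetical sketch of a typical PyTorch DDPG actor update.
    def actor_update(actor, critic, actor_optimizer, state_batch):
        actor_optimizer.zero_grad()
        # The actor is trained to maximize the critic's Q-value of its own actions,
        # so the loss is the negative mean Q-value over the batch.
        policy_loss = -critic(state_batch, actor(state_batch)).mean()
        policy_loss.backward()
        actor_optimizer.step()
        return policy_loss.item()

Because only the actor's parameters are registered with actor_optimizer, the critic's weights do not move in this step even though gradients flow through it.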
0
votes
1 answer

ValueError: No gradients provided for any variable in policy gradient

I have been trying to implement the policy gradient algorithm in reinforcement learning. However, I am facing the error "ValueError: No gradients provided for any variable:" while computing the gradients for the custom loss function, as shown below: def…
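This error usually means the loss tensor is not connected to the model's trainable variables inside the tape. A hedged sketch of one common way to write the update so the gradients exist; model, states, actions, and returns are assumptions about the question's setup:

    import tensorflow as tf

    # Hypothetical sketch: the loss must be computed *inside* tf.GradientTape and be a
    # function of the model's trainable variables, otherwise tape.gradient returns None.
    def reinforce_step(model, optimizer, states, actions, returns):
        with tf.GradientTape() as tape:
            logits = model(states)                                  # forward pass recorded by the tape
            log_probs = tf.nn.log_softmax(logits)
            one_hot = tf.one_hot(actions, depth=logits.shape[-1])
            action_log_probs = tf.reduce_sum(log_probs * one_hot, axis=1)
            loss = -tf.reduce_mean(action_log_probs * returns)      # policy-gradient surrogate loss
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        return loss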
0
votes
1 answer

MlpPolicy only returns 1 and -1 with action space [-1, 1]

I am trying to use Stable Baselines to train a PPO2 agent with MlpPolicy. After 100k timesteps, I only get 1 and -1 as actions. I restrict the action space to [-1, 1] and use the action directly as the control. I don't know if it is because I directly use the action as…
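For reference, a sketch of how such a continuous action space is typically declared (the shape here is an illustrative assumption); PPO2's Gaussian policy samples are unbounded and are typically clipped to these bounds before being sent to the environment, which is one common reason an agent appears to emit only -1 and 1 early in training:

    import numpy as np
    from gym import spaces

    # Hypothetical declaration of a continuous [-1, 1] action space for a custom Gym env.
    action_space = spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)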
0
votes
1 answer

PPO2 reinforcement learning 'catastrophic forgetting'?

I'm implementing PPO2 reinforcement learning on my self-built tasks and always encounter situations where the agent seems to be nearly mature, then suddenly and catastrophically loses its performance and cannot regain stability. I don't…
0
votes
1 answer

Reward not increasing while training a Bipedal System

I am completely new to reinforcement learning and this is my first practical program. I am trying to train the bipedal system in the OpenAI Gym environment using the policy gradient algorithm. However, the reward never changes, either at episode 0…
0
votes
0 answers

Gradient calculation for actor in DDPG algorithm

I am experiencing some issues computing the actor update in the DDPG algorithm using TensorFlow 2. The following is the code for both the critic and actor updates: with tf.GradientTape() as tape: #persistent=True # compute current action values …
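A hedged sketch of how the actor step is often written in TensorFlow 2; actor, critic, actor_optimizer, and states stand in for the question's variables, and the critic is assumed to be a Keras model taking [states, actions] as input:

    import tensorflow as tf

    # Hypothetical sketch of the DDPG actor update in TensorFlow 2.
    def actor_update(actor, critic, actor_optimizer, states):
        with tf.GradientTape() as tape:
            actions = actor(states, training=True)
            # Maximize Q(s, pi(s)) by minimizing its negative mean.
            actor_loss = -tf.reduce_mean(critic([states, actions], training=True))
        # Differentiate only w.r.t. the actor's weights; the critic stays fixed here.
        grads = tape.gradient(actor_loss, actor.trainable_variables)
        actor_optimizer.apply_gradients(zip(grads, actor.trainable_variables))
        return actor_loss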
0
votes
1 answer

PPO algorithm converges on only one action

I have taken some reference implementations of the PPO algorithm and am trying to create an agent which can play Space Invaders. Unfortunately, from the 2nd trial onwards (after training the actor and critic networks for the first time), the…
0
votes
1 answer

Policy gradient (REINFORCE) diverging when finding the shortest path in a graph with negative rewards

I want to use the policy gradient to find the shortest path among a group of nodes in a network. The network is represented as a graph whose edges are all labeled with the value -1, so a path with a negative total value closest to 0 is the shortest…
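One commonly suggested stabilizer for this kind of all-negative reward setup is subtracting a baseline from the returns before using them in the REINFORCE update; a small sketch under that assumption:

    import numpy as np

    # Hypothetical sketch: with all-negative step rewards (-1 per edge), every raw return
    # pushes probabilities down, which can destabilize REINFORCE. Subtracting a baseline
    # (here the batch mean) centers the gradient signal without biasing its expectation.
    def advantages_from_returns(returns):
        returns = np.asarray(returns, dtype=np.float32)
        adv = returns - returns.mean()
        # Optional: normalize to unit variance for extra stability.
        return adv / (adv.std() + 1e-8)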
0
votes
1 answer

Policy Gradient Loss - Reinforcement Learning

I am training my network using policy gradient and defining the loss as: self.loss = -tf.reduce_mean(tf.log(OUTPUT_NN) * self.REWARDS) self.opt = tf.train.AdamOptimizer(self.lr).minimize(self.loss) What I do not understand is that the loss…
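The excerpt cuts off, but as a point of reference, this quantity is just -mean(log π(a|s) · R), so its sign and scale depend entirely on the reward values rather than on how well the agent is doing; a small worked example with illustrative numbers:

    import numpy as np

    # Hypothetical worked example of the REINFORCE loss -mean(log(pi) * R).
    probs = np.array([0.2, 0.7, 0.9])      # probabilities of the actions actually taken
    rewards = np.array([1.0, -1.0, 2.0])   # returns credited to those actions
    loss = -np.mean(np.log(probs) * rewards)
    print(loss)  # ~0.49 here; flip the reward signs and the "loss" becomes negative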
0
votes
1 answer

How do we assess each reward in the return in Policy Gradient Methods?

Hi StackOverflow community, I have a problem with policy gradient methods in reinforcement learning. In policy gradient methods, we increase/decrease the log probability of an action based on the return (i.e. the total reward) from that step…
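A small sketch of what "the return from that step" (reward-to-go) usually looks like in code, under the standard discounted-return assumption:

    import numpy as np

    # Hypothetical sketch of the reward-to-go used to weight each action's log-probability.
    def rewards_to_go(rewards, gamma=0.99):
        returns = np.zeros(len(rewards), dtype=np.float32)
        running = 0.0
        # Walk backwards so each step accumulates the discounted sum of future rewards.
        for t in reversed(range(len(rewards))):
            running = rewards[t] + gamma * running
            returns[t] = running
        return returns

    # Example: rewards_to_go([1, 0, 2], gamma=1.0) -> [3., 2., 2.]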
0
votes
1 answer

Trying to implement experience replay in TensorFlow

I am trying to implement experience replay in TensorFlow. The problem I am having is storing the outputs from the model's trials and then updating the gradients simultaneously. A couple of approaches I have tried are to store the resulting values from…
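For context, a minimal framework-agnostic sketch of the usual experience replay pattern (store transitions during rollout, sample random minibatches later for the gradient update); the class and method names here are illustrative:

    import random
    from collections import deque

    # Hypothetical sketch of a minimal experience replay buffer.
    class ReplayBuffer:
        def __init__(self, capacity=100_000):
            self.buffer = deque(maxlen=capacity)

        def push(self, state, action, reward, next_state, done):
            self.buffer.append((state, action, reward, next_state, done))

        def sample(self, batch_size):
            # Uniform random minibatch; zip(*...) regroups fields into parallel tuples.
            batch = random.sample(self.buffer, batch_size)
            states, actions, rewards, next_states, dones = zip(*batch)
            return states, actions, rewards, next_states, dones

        def __len__(self):
            return len(self.buffer)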
-1
votes
2 answers

How do you evaluate whether a reinforcement learning agent has actually been trained?

I am new to reinforcement learning agent training. I have read about the PPO algorithm and used the Stable Baselines library to train an agent with PPO. My question is: how do I evaluate a trained RL agent? Consider for a regression or…
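A common first check is to roll out the trained policy deterministically for several episodes and look at the mean and spread of the episode rewards; a hedged sketch assuming a Stable Baselines-style model.predict and a Gym-style environment:

    import numpy as np

    # Hypothetical evaluation loop: run the trained agent without exploration noise
    # and average the per-episode rewards.
    def evaluate(model, env, n_episodes=10):
        episode_rewards = []
        for _ in range(n_episodes):
            obs = env.reset()
            done, total = False, 0.0
            while not done:
                action, _states = model.predict(obs, deterministic=True)
                obs, reward, done, info = env.step(action)
                total += reward
            episode_rewards.append(total)
        return np.mean(episode_rewards), np.std(episode_rewards)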
-1
votes
1 answer

In stock trading, how to measure the quantity of stock

I am working on stock market analysis and prediction using machine learning methods, especially reinforcement learning. I am trying to predict short, long, and flat (buy, hold, sell). (Any suggestions or material are appreciated.) Currently, I…
-1
votes
1 answer

Multiclass Sigmoid for DRL action picking

I am working on a deep reinforcement learning problem and I would like to use sigmoid for my last layer instead of softmax. I am stuck on what to use for action picking. Specifically, how should I replace the last two lines of this code, and with…
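With a sigmoid output layer, each unit is usually treated as an independent probability, so the single categorical (softmax) draw is replaced by a per-unit Bernoulli draw; a small sketch under that interpretation:

    import numpy as np

    # Hypothetical sketch: sample each action independently from its sigmoid probability.
    def pick_actions(sigmoid_outputs, rng=None):
        rng = rng or np.random.default_rng()
        probs = np.asarray(sigmoid_outputs)
        return (rng.random(probs.shape) < probs).astype(np.int32)

    # Example: pick_actions([0.9, 0.1, 0.6]) might return array([1, 0, 1])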