Questions tagged [policy-gradient-descent]

44 questions
1
vote
0 answers

How to clamp the output of a neuron in PyTorch

I am using a simple nn.Linear model (20, 64, 64, 2) for deep reinforcement learning. I am using this model to approximate the policy for the PPO algorithm. Hence the output layer gives 2 values, which are the mean and standard deviation. These…
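A common pattern for this (a sketch matching the shapes stated in the question; the class and layer names are illustrative) is to leave the mean head unbounded and pass the standard-deviation head through softplus, then clamp it into a safe range:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical 20 -> 64 -> 64 -> 2 policy head matching the question's sizes.
class Policy(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(20, 64), nn.Tanh(),
            nn.Linear(64, 64), nn.Tanh(),
            nn.Linear(64, 2),
        )

    def forward(self, x):
        out = self.net(x)
        mean = out[..., 0]  # the mean stays unbounded
        # softplus keeps the std strictly positive; clamp bounds it further
        std = F.softplus(out[..., 1]).clamp(1e-3, 10.0)
        return mean, std

policy = Policy()
mean, std = policy(torch.randn(4, 20))  # batch of 4 states
```

Clamping only the std output keeps the Gaussian well-defined without distorting the mean.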
1
vote
0 answers

Convergence guarantee of Policy Gradient with function approximation

Is there any convergence proof of the Policy Gradient algorithm with "general" value/Q-function approximation? Seminal papers (Sutton 1999 & Tsitsiklis 1999) prove the theorem using a compatibility assumption (i.e. the Q-function approximation is…
1
vote
0 answers

The gradients are all 0 on the backward pass and the parameters did not change at all

I implemented the policy gradient method to learn an unknown function (a 10-loop sum function here), but the model did not update. The training data consists of the input and the target. func2 contains the MLP model which predicts the target number. The…
1
vote
2 answers

Why does the policy gradient theorem use the Q function in reinforcement learning?

The introduction to policy gradient algorithms states that policy methods are better because they directly optimize the policy without needing to calculate Q first. Why do they use Q in the equation then? How do they calculate the whole thing…
swapnil
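The theorem the question refers to can be stated as follows; the point is that Q appears inside an expectation, so it never has to be computed exactly:

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{s \sim d^{\pi_\theta},\; a \sim \pi_\theta}
    \left[ \nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi_\theta}(s, a) \right]
```

REINFORCE replaces \(Q^{\pi_\theta}(s_t, a_t)\) with the sampled return \(G_t\), while actor-critic methods replace it with a learned estimate, so the gradient can be followed from samples alone.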
1
vote
1 answer

Difficult reinforcement learning query

I'm struggling to figure out how I want to do this, so I hope someone here can offer some guidance. Scenario: I have a 10-character string, let's call it the DNA, made up of the following characters: F - + [ ] X. For example, DNA = ['F', 'F', '+',…
Izak Joubert
1
vote
1 answer

How does the score function help in policy gradient?

I'm trying to learn policy gradient methods for reinforcement learning, but I'm stuck at the score function part. When searching for the maximum or minimum of a function, we take the derivative, set it to zero, and then look for the points that…
test
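One way to see what the score function buys you: the objective is an expectation over sampled actions, so you cannot set its derivative to zero analytically; instead you ascend the sampled estimate ∇θ log π(a)·R. A minimal one-armed-bandit sketch (a toy problem of my own, not from the question):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy bandit: action 1 pays reward 1, action 0 pays reward 0.
# Policy: pi(a=1) = sigmoid(theta). The score function is
# d/dtheta log pi(a) = a - sigmoid(theta), so the gradient estimate
# (a - sigmoid(theta)) * reward never differentiates the reward itself.
theta = 0.0
lr = 0.1
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-theta))
    a = rng.random() < p           # sample an action from the policy
    r = 1.0 if a else 0.0          # observed reward
    grad = (float(a) - p) * r      # score-function gradient estimate
    theta += lr * grad             # gradient ascent on expected reward

p_final = 1.0 / (1.0 + np.exp(-theta))  # should approach 1
```

Only log π is differentiated; the reward itself can be a black box.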
0
votes
1 answer

TypeError: tuple indices must be integers or slices, not NoneType

I need help regarding a TypeError when I'm trying to pass input to the neural network defined as: env = gym.make("CartPole-v1", render_mode="rgb_array") obs = env.reset() n_inputs = env.observation_space.shape[0] model = tf.keras.Sequential([ …
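This particular error usually comes from the Gym API change in version 0.26: reset() now returns an (observation, info) tuple rather than the observation alone. A minimal sketch with a stand-in environment (FakeCartPole is illustrative, not from the question):

```python
import numpy as np

# Minimal stand-in for a Gym/Gymnasium >= 0.26 environment:
# reset() returns an (observation, info) tuple.
class FakeCartPole:
    def reset(self):
        return np.zeros(4, dtype=np.float32), {}

env = FakeCartPole()

# Buggy pattern from older Gym tutorials:
#   obs = env.reset()     # obs is now a (array, dict) tuple
#   obs[np.newaxis]       # -> TypeError: tuple indices must be integers
#                         #    or slices, not NoneType

# Fix: unpack the tuple.
obs, info = env.reset()
batched = obs[np.newaxis]  # shape (1, 4), ready to feed to a model
```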
0
votes
1 answer

Attribute error in PPO algorithm for Cartpole gym environment

I'm trying to run the code from here (GitHub link on this page): https://keras.io/examples/rl/ppo_cartpole/ I'm getting an attribute error in the training section from observation = observation.reshape(1,-1), which says "'tuple' object has no…
0
votes
0 answers

policy gradient with binary action space

I am training an agent using the policy gradient method. After training, the agent always chooses one of the two actions. Below is my code: action = tf.where(self.model(state)[:,-1] > 0.5, 1., 0.) reward = self.get_rewards(action, state) with…
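A likely cause worth noting here: thresholding with tf.where is not differentiable, so no gradient reaches the policy. The usual fix is to sample from a Bernoulli distribution and weight the log-probabilities by the reward. A sketch in PyTorch of the same idea (reward values here are placeholders):

```python
import torch
from torch.distributions import Bernoulli

logits = torch.tensor([0.3, -0.2], requires_grad=True)

# Hard thresholding (the pattern in the question) kills the gradient:
hard = (torch.sigmoid(logits) > 0.5).float()
# hard carries no grad_fn -- nothing flows back to the policy parameters.

# Instead, sample and score with log_prob so the policy stays trainable:
dist = Bernoulli(logits=logits)
action = dist.sample()                      # stochastic 0/1 actions
reward = torch.ones_like(action)            # placeholder rewards
loss = -(dist.log_prob(action) * reward).sum()
loss.backward()                             # gradient reaches `logits`
```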
0
votes
0 answers

When to update weights in RL model

I am building a chatbot model using policy gradient reinforcement learning. The agent is a Seq2Seq LSTM-based model, and I am using cross-entropy loss. Do I need to update the weights of the model after every input while training the RL model? I am…
0
votes
0 answers

MADDPG does not learn anything

I have a continuous problem and need to solve it with multi-agent deep deterministic policy gradient (MADDPG). My environment has 7 states and 3 actions. The range of two of the actions is [0, 1], and the range of the other action is between…
0
votes
1 answer

Parallel environments in Pong keep ending up in the same state despite random actions being taken

Hi, I am trying to use SubprocVecEnv to run 8 parallel Pong environment instances. I tested the state transitions using random actions, but after 15 steps (with random left or right actions), the states of all the environments are the same.…
0
votes
1 answer

Python policy gradient reinforcement learning with continuous action space is not working

I am trying to teach an agent to navigate to a target in my custom environment. The agent learns with a neural net (2 hidden Dense layers, one Dropout layer and one output layer of dimension 4). As input the agent uses a sensor which measures…
0
votes
1 answer

Action masking for continuous action space in reinforcement learning

Is there a way to model action masking for continuous action spaces? I want to model economic problems with reinforcement learning. These problems often have continuous action and state spaces. In addition, the state often influences what actions…
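For continuous spaces, one workaround (an assumption on my part, not an established standard) is to replace the hard mask with a differentiable squashing of the raw policy output into state-dependent bounds:

```python
import numpy as np

# Squash an unbounded policy output into [low, high] with tanh, so the
# "mask" becomes a differentiable rescaling instead of a hard constraint.
def squash(raw_action, low, high):
    return low + (high - low) * (np.tanh(raw_action) + 1.0) / 2.0

# Very negative raw outputs land near `low`, very positive ones near `high`.
a = squash(np.array([-10.0, 0.0, 10.0]), low=0.0, high=5.0)
```

The bounds low and high can themselves be computed from the state, which emulates a state-dependent mask while keeping gradients intact.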
0
votes
0 answers

One back-propagation pass in Keras

I would like to train a neural network with the policy gradient method. The training involves finding the gradient of a user-defined loss (one back-propagation pass). I know the gradient is computed automatically during compilation, as…
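A single manual back-propagation pass in Keras, outside compile()/fit(), can be sketched with tf.GradientTape (the model and loss here are illustrative, not from the question):

```python
import tensorflow as tf

# Illustrative model and user-defined loss.
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

x = tf.ones((4, 3))
target = tf.zeros((4, 1))

# One manual back-propagation pass: record the forward pass on a tape,
# take the gradient of the custom loss, and apply it once.
with tf.GradientTape() as tape:
    pred = model(x)  # also builds the model on the first call
    loss = tf.reduce_mean(tf.square(pred - target))

grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))
```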