Questions tagged [policy-gradient-descent]

44 questions
1
vote
0 answers

How to clamp the output of a neuron in PyTorch

I am using a simple nn.Linear model (20, 64, 64, 2) for deep reinforcement learning. I am using this model to approximate the policy for the PPO algorithm. Hence the output layer gives 2 values, which are the mean and standard deviation. These…
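A common pattern for this (a sketch matching the shapes stated in the question; the class and layer names are illustrative) is to leave the mean head unbounded and pass the standard-deviation head through softplus, then clamp it into a safe range:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical 20 -> 64 -> 64 -> 2 policy head matching the question's sizes.
class Policy(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(20, 64), nn.Tanh(),
            nn.Linear(64, 64), nn.Tanh(),
            nn.Linear(64, 2),
        )

    def forward(self, x):
        out = self.net(x)
        mean = out[..., 0]  # the mean stays unbounded
        # softplus keeps the std strictly positive; clamp bounds it further
        std = F.softplus(out[..., 1]).clamp(1e-3, 10.0)
        return mean, std

policy = Policy()
mean, std = policy(torch.randn(4, 20))  # batch of 4 states
```

Clamping only the std output keeps the Gaussian well-defined without distorting the mean.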
1
vote
0 answers

Convergence guarantee of Policy Gradient with function approximation

Is there any convergence proof of the Policy Gradient algorithm with "general" value/Q-function approximation? Seminal papers (Sutton 1999 & Tsitsiklis 1999) prove the theorem using a compatibility assumption (i.e. the Q-function approximation is…
1
vote
0 answers

The gradients are all 0 on the backward pass and the parameters did not change at all

I implemented the policy gradient method to learn an unknown function (a 10-loop sum function here), but the model did not update. The training data consists of the input and the target. func2 contains the MLP model which predicts the target number. The…
1
vote
2 answers

Why does the policy gradient theorem use the Q function in reinforcement learning?

The introduction to policy gradient algorithms states that policy methods are better because they directly optimize the policy without needing to calculate Q first. Why do they use Q in the equation then? How do they calculate the whole thing…
swapnil
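The theorem the question refers to can be stated as follows; the point is that Q appears inside an expectation, so it never has to be computed exactly:

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{s \sim d^{\pi_\theta},\; a \sim \pi_\theta}
    \left[ \nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi_\theta}(s, a) \right]
```

REINFORCE replaces \(Q^{\pi_\theta}(s_t, a_t)\) with the sampled return \(G_t\), while actor-critic methods replace it with a learned estimate, so the gradient can be followed from samples alone.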
1
vote
1 answer

Difficult reinforcement learning query

I'm struggling to figure out how I want to do this, so I hope someone here can offer some guidance. Scenario: I have a 10-character string, let's call it the DNA, made up of the following characters: F - + [ ] X. For example, DNA = ['F', 'F', '+',…
Izak Joubert
1
vote
1 answer

How does the score function help in policy gradient?

I'm trying to learn policy gradient methods for reinforcement learning, but I'm stuck at the score function part. When searching for the maximum or minimum of a function, we take the derivative, set it to zero, and then look for the points that…
test
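One way to see what the score function buys you: the objective is an expectation over sampled actions, so you cannot set its derivative to zero analytically; instead you ascend the sampled estimate ∇θ log π(a)·R. A minimal one-armed-bandit sketch (a toy problem of my own, not from the question):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy bandit: action 1 pays reward 1, action 0 pays reward 0.
# Policy: pi(a=1) = sigmoid(theta). The score function is
# d/dtheta log pi(a) = a - sigmoid(theta), so the gradient estimate
# (a - sigmoid(theta)) * reward never differentiates the reward itself.
theta = 0.0
lr = 0.1
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-theta))
    a = rng.random() < p           # sample an action from the policy
    r = 1.0 if a else 0.0          # observed reward
    grad = (float(a) - p) * r      # score-function gradient estimate
    theta += lr * grad             # gradient ascent on expected reward

p_final = 1.0 / (1.0 + np.exp(-theta))  # should approach 1
```

Only log π is differentiated; the reward itself can be a black box.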
0
votes
1 answer

TypeError: tuple indices must be integers or slices, not NoneType

I need help regarding a TypeError when I'm trying to pass input to the neural network defined as: env = gym.make("CartPole-v1", render_mode="rgb_array") obs = env.reset() n_inputs = env.observation_space.shape[0] model = tf.keras.Sequential([ …
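This particular error usually comes from the Gym API change in version 0.26: reset() now returns an (observation, info) tuple rather than the observation alone. A minimal sketch with a stand-in environment (FakeCartPole is illustrative, not from the question):

```python
import numpy as np

# Minimal stand-in for a Gym/Gymnasium >= 0.26 environment:
# reset() returns an (observation, info) tuple.
class FakeCartPole:
    def reset(self):
        return np.zeros(4, dtype=np.float32), {}

env = FakeCartPole()

# Buggy pattern from older Gym tutorials:
#   obs = env.reset()     # obs is now a (array, dict) tuple
#   obs[np.newaxis]       # -> TypeError: tuple indices must be integers
#                         #    or slices, not NoneType

# Fix: unpack the tuple.
obs, info = env.reset()
batched = obs[np.newaxis]  # shape (1, 4), ready to feed to a model
```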
0
votes
1 answer

Attribute error in PPO algorithm for Cartpole gym environment

I'm trying to run the code from here (GitHub link on this page): https://keras.io/examples/rl/ppo_cartpole/ I'm getting an attribute error in the training section from observation = observation.reshape(1,-1), which says "'tuple' object has no…
0
votes
0 answers

policy gradient with binary action space

I am training an agent using the policy gradient method. After training, the agent always chooses one of the two actions. Below is my code: action = tf.where(self.model(state)[:,-1] > 0.5, 1., 0.) reward = self.get_rewards(action, state) with…
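A likely cause worth noting here: thresholding with tf.where is not differentiable, so no gradient reaches the policy. The usual fix is to sample from a Bernoulli distribution and weight the log-probabilities by the reward. A sketch in PyTorch of the same idea (reward values here are placeholders):

```python
import torch
from torch.distributions import Bernoulli

logits = torch.tensor([0.3, -0.2], requires_grad=True)

# Hard thresholding (the pattern in the question) kills the gradient:
hard = (torch.sigmoid(logits) > 0.5).float()
# hard carries no grad_fn -- nothing flows back to the policy parameters.

# Instead, sample and score with log_prob so the policy stays trainable:
dist = Bernoulli(logits=logits)
action = dist.sample()                      # stochastic 0/1 actions
reward = torch.ones_like(action)            # placeholder rewards
loss = -(dist.log_prob(action) * reward).sum()
loss.backward()                             # gradient reaches `logits`
```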
0
votes
0 answers

When to update weights in RL model

I am building a chatbot model using policy gradient reinforcement learning. The agent is a Seq2Seq LSTM-based model, and I am using cross-entropy loss. Do I need to update the weights of the model after every input while training the RL model? I am…
0
votes
0 answers

MADDPG does not learn anything

I have a continuous problem and need to solve it with multi-agent deep deterministic policy gradient (MADDPG). My environment has 7 states and 3 actions. The range of two of the actions is [0, 1], and the range of the other action is between…
0
votes
1 answer

Parallel environments in Pong keep ending up in the same state despite random actions being taken

Hi, I am trying to use SubprocVecEnv to run 8 parallel Pong environment instances. I tested the state transitions using random actions, but after 15 steps (with random left or right actions), the states of all the environments are the same.…
0
votes
1 answer

Python policy gradient reinforcement learning with continuous action space is not working

I am trying to teach an agent to navigate to a target in my custom environment. The agent learns with a neural net (2 hidden Dense layers, one Dropout layer and one output layer of dimension 4). As input the agent uses a sensor which measures…
0
votes
1 answer

Action masking for continuous action space in reinforcement learning

Is there a way to model action masking for continuous action spaces? I want to model economic problems with reinforcement learning. These problems often have continuous action and state spaces. In addition, the state often influences what actions…
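For continuous spaces, one workaround (an assumption on my part, not an established standard) is to replace the hard mask with a differentiable squashing of the raw policy output into state-dependent bounds:

```python
import numpy as np

# Squash an unbounded policy output into [low, high] with tanh, so the
# "mask" becomes a differentiable rescaling instead of a hard constraint.
def squash(raw_action, low, high):
    return low + (high - low) * (np.tanh(raw_action) + 1.0) / 2.0

# Very negative raw outputs land near `low`, very positive ones near `high`.
a = squash(np.array([-10.0, 0.0, 10.0]), low=0.0, high=5.0)
```

The bounds low and high can themselves be computed from the state, which emulates a state-dependent mask while keeping gradients intact.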
0
votes
0 answers

One back-propagation pass in Keras

I would like to train a neural network with the policy gradient method. The training involves finding the gradient of a user-defined loss (one back-propagation pass). I know the gradient is computed automatically during compilation, as…
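A single manual back-propagation pass in Keras, outside compile()/fit(), can be sketched with tf.GradientTape (the model and loss here are illustrative, not from the question):

```python
import tensorflow as tf

# Illustrative model and user-defined loss.
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

x = tf.ones((4, 3))
target = tf.zeros((4, 1))

# One manual back-propagation pass: record the forward pass on a tape,
# take the gradient of the custom loss, and apply it once.
with tf.GradientTape() as tape:
    pred = model(x)  # also builds the model on the first call
    loss = tf.reduce_mean(tf.square(pred - target))

grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))
```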