Questions tagged [policy-gradient-descent]
44 questions
1
vote
0 answers
How to clamp the output of a neuron in PyTorch
I am using a simple nn.Linear model (20, 64, 64, 2) for deep reinforcement learning. I am using this model to approximate the policy in the PPO algorithm. Hence the output layer gives 2 values, which are the mean and standard deviation. These…

Dekay
- 11
- 2
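A common way to keep the policy's standard deviation in a safe range, sketched below under the assumption that the network's second output is interpreted as a log-std (the `GaussianHead` module and its bounds are illustrative, not the asker's exact model):

```python
import torch
import torch.nn as nn

class GaussianHead(nn.Module):
    """Policy head producing (mean, std), with std kept positive and bounded."""
    def __init__(self, in_dim=64, log_std_min=-20.0, log_std_max=2.0):
        super().__init__()
        self.fc = nn.Linear(in_dim, 2)  # outputs [mean, raw log-std]
        self.log_std_min = log_std_min
        self.log_std_max = log_std_max

    def forward(self, x):
        mean, log_std = self.fc(x).chunk(2, dim=-1)
        # torch.clamp keeps the log-std in a safe range;
        # exp() then guarantees std > 0
        log_std = torch.clamp(log_std, self.log_std_min, self.log_std_max)
        return mean, log_std.exp()

head = GaussianHead()
mean, std = head(torch.randn(5, 64))
```

Clamping the log-std rather than the std itself is the usual choice because it bounds the value while keeping the gradient well behaved near the edges of the range.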
1
vote
0 answers
Convergence guarantee of Policy Gradient with function approximation
Is there any convergence proof of the Policy Gradient algorithm with "general" value/Q-function approximation?
Seminal papers (Sutton 1999 & Tsitsiklis 1999) prove the theorem using a compatibility assumption (i.e. the Q-function approximation is…

arnaud
- 11
- 2
1
vote
0 answers
The gradients are all 0 after backward and the parameters did not change at all
I implemented the policy gradient method to learn an unknown function (a 10-loop sum function here), but the model does not update. The learning data is the input and the target. func2 includes the MLP model which predicts the target number. The…

Liuyang Gao
- 11
- 1
1
vote
2 answers
Why does the policy gradient theorem use the Q function in reinforcement learning?
Introductions to the policy gradient algorithm state that policy-based methods are better because they directly optimize the policy without needing to calculate Q first. Why do they use Q in the equation then? How do they calculate the whole thing…

swapnil
- 21
- 1
- 8
1
vote
1 answer
Difficult reinforcement learning query
I'm struggling to figure out how I want to do this so I hope someone here may offer some guidance.
Scenario - I have a 10-character string, let's call it the DNA, made up of the following characters:
F
-
+
[
]
X
for example DNA = ['F', 'F', '+',…

Izak Joubert
- 906
- 11
- 29
1
vote
1 answer
How does score function help in policy gradient?
I'm trying to learn policy gradient methods for reinforcement learning but I'm stuck at the score function part.
While searching for maximum or minimum points in a function, we take the derivative and set it to zero, then look for the points that…

test
- 93
- 1
- 11
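The score function matters precisely because the sampled reward cannot be differentiated directly: instead, policy gradient ascends E[R · ∇ log π(a)], which has the same expectation as ∇ E[R]. A toy sketch on a two-armed bandit with a softmax policy (the arm rewards and learning rate are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)                  # one preference per arm
true_rewards = np.array([0.2, 1.0])  # arm 1 is the better arm

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for _ in range(2000):
    pi = softmax(theta)
    a = rng.choice(2, p=pi)          # sample an action from the policy
    r = true_rewards[a]
    # score function: grad_theta log pi(a) = one_hot(a) - pi
    grad_log_pi = -pi
    grad_log_pi[a] += 1.0
    theta += 0.1 * r * grad_log_pi   # gradient ascent on expected reward

print(softmax(theta))  # probability mass concentrates on arm 1
```

Setting the derivative to zero analytically is not an option here because the objective is an expectation over sampled trajectories; the score function turns it into something we can estimate from samples.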
0
votes
1 answer
TypeError: tuple indices must be integers or slices, not NoneType
I need help regarding a TypeError when I'm trying to pass input to the neural network defined as:
env = gym.make("CartPole-v1",render_mode="rgb_array")
obs = env.reset()
n_inputs = env.observation_space.shape[0]
model = tf.keras.Sequential([
…

Ravi Sharma
- 3
- 1
0
votes
1 answer
Attribute error in PPO algorithm for Cartpole gym environment
I'm trying to run the code from here (GitHub link on this page): https://keras.io/examples/rl/ppo_cartpole/
I'm getting an attribute error in the training section from observation = observation.reshape(1,-1) which says "'tuple' object has no…

Max
- 13
- 2
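The Keras PPO example predates the Gym API change: with gym >= 0.26, `env.reset()` returns an `(observation, info)` tuple, so calling `.reshape` on the result raises exactly this "'tuple' object has no attribute 'reshape'" error. One version-tolerant sketch:

```python
import gym
import numpy as np

env = gym.make("CartPole-v1")
observation = env.reset()
if isinstance(observation, tuple):   # new-style API: unpack (obs, info)
    observation = observation[0]
observation = np.asarray(observation).reshape(1, -1)
```

Alternatively, pin the gym version the example was written against.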
0
votes
0 answers
policy gradient with binary action space
I am training the agent using a policy gradient method. After training, the agent always chooses the same one of the two actions.
Below is my code
action = tf.where(self.model(state)[:,-1] > 0.5, 1., 0.)
reward = self.get_rewards(action, state)
with…

user1292919
- 193
- 8
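Thresholding the network output at 0.5, as in the excerpt above, makes the action deterministic, which removes exploration and leaves the score-function gradient undefined. A common alternative, sketched here (the helper name and shapes are illustrative, not the asker's code), is to sample from the policy's Bernoulli distribution and build the loss from log-probabilities:

```python
import tensorflow as tf

def sample_action_and_log_prob(probs):
    """probs: (batch, 1) tensor of P(action=1) from a sigmoid output."""
    u = tf.random.uniform(tf.shape(probs))
    action = tf.cast(u < probs, tf.float32)  # stochastic: keeps exploring
    # log pi(a): log p for action 1, log(1-p) for action 0
    log_prob = action * tf.math.log(probs + 1e-8) \
        + (1.0 - action) * tf.math.log(1.0 - probs + 1e-8)
    return action, log_prob

probs = tf.fill((4, 1), 0.7)
action, log_prob = sample_action_and_log_prob(probs)
```

The policy gradient loss would then weight `log_prob` by the rewards, so the gradient flows through the probabilities rather than through a hard threshold.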
0
votes
0 answers
When to update weights in RL model
I am building a chatbot model using policy gradient reinforcement learning. The agent is a Seq2Seq LSTM-based model. I am using cross-entropy loss. Do I need to update the weights of the model after every input while training the RL model?
I am…
0
votes
0 answers
MADDPG does not learn anything
I have a continuous problem and I should solve it with multi-agent deep deterministic policy gradient (MADDPG). My environment has 7 states and 3 actions. The range of two of the actions is [0, 1] and the range of one of the actions is between…

at-dgh
- 1
- 1
0
votes
1 answer
Parallel environments in Pong keep ending up in the same state despite random actions being taken
Hi, I am trying to use SubprocVecEnv to run 8 parallel Pong environment instances. I tried testing the state transitions using random actions, but after 15 steps (with random left or right actions) the states of all the environments are the same.…

Swami
- 25
- 4
0
votes
1 answer
Python policy gradient reinforcement learning with continuous action space is not working
I am trying to train an agent to navigate to a target in my custom environment.
The agent learns with a neural net (2 hidden Dense layers, one dropout layer and one output layer of dimension 4). As input nodes, the agent uses a sensor which measures…

Viktoria
- 1
0
votes
1 answer
Action masking for continuous action space in reinforcement learning
Is there a way to model action masking for continuous action spaces? I want to model economic problems with reinforcement learning. These problems often have continuous action and state spaces. In addition, the state often influences what actions…

matthias
- 48
- 5
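For continuous action spaces there is no direct analogue of discrete action masking, but one common workaround (a sketch under the assumption that the feasible set is an interval depending on the state, as in budget-constrained economic problems) is to let the policy output an unconstrained action and squash it into the state-dependent bounds:

```python
import numpy as np

def squash_to_bounds(raw_action, low, high):
    """Map raw_action in (-inf, inf) to [low, high] elementwise via tanh."""
    unit = np.tanh(raw_action)               # now in (-1, 1)
    return low + 0.5 * (unit + 1.0) * (high - low)

# Example: the state dictates that spending must lie in [0, budget]
budget = 3.0
a = squash_to_bounds(np.array([-10.0, 0.0, 10.0]), 0.0, budget)
```

Every emitted action is then feasible by construction, so no mask is needed; the trade-off is that the squashing changes the action distribution (e.g. its log-probability needs a tanh correction if used in a likelihood-based loss).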
0
votes
0 answers
One back-propagation pass in keras
I would like to train a neural network based on the policy gradient method. The training involves finding the gradient of a user-defined loss (one back-propagation pass). I know the gradient is automatically handled when compiling, as…

mohamed
- 29
- 5
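In TF 2.x Keras, a single back-propagation pass for a user-defined loss is usually done with `tf.GradientTape` rather than `compile`/`fit`. A minimal sketch with a REINFORCE-style loss (the model shape, batch data, and loss are illustrative assumptions):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(2, activation="softmax"),
])
optimizer = tf.keras.optimizers.Adam(1e-3)

states = tf.random.normal((16, 4))
actions = tf.random.uniform((16,), maxval=2, dtype=tf.int32)
returns = tf.random.normal((16,))

with tf.GradientTape() as tape:
    probs = model(states)                               # (16, 2)
    idx = tf.stack([tf.range(16), actions], axis=1)
    log_probs = tf.math.log(tf.gather_nd(probs, idx) + 1e-8)
    loss = -tf.reduce_mean(returns * log_probs)         # user-defined PG loss

grads = tape.gradient(loss, model.trainable_variables)  # one backward pass
optimizer.apply_gradients(zip(grads, model.trainable_variables))
```

`compile` only wires a loss into `fit`; the tape gives direct control over exactly one forward/backward pass per update, which is what policy gradient training typically needs.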