Questions tagged [actor-critics]
13 questions
1
vote
0 answers
How can I design the architecture of a reinforcement learning actor that outputs the mean and variance for different variables?
I'm going to implement the PPO reinforcement learning algorithm on a problem that requires a decision composed of 4 continuous variables: [a1,a2,a3,a4]. For this, I want to create an actor that can provide the mean and variance of a Gaussian…
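One common way to set this up is a single actor with two output heads, one for the 4 means and one for the 4 log standard deviations, treated as a diagonal Gaussian. The following is a minimal sketch in plain Python under that assumption. The feature vector, weight matrices, clamp range, and function names are all hypothetical, and a real PPO actor would use a deep-learning framework rather than hand-rolled lists.

```python
import math
import random

ACTION_DIM = 4  # [a1, a2, a3, a4] from the question


def gaussian_head(features, w_mean, w_log_std):
    """Map a feature vector to per-action means and standard deviations."""
    means = [sum(w * f for w, f in zip(row, features)) for row in w_mean]
    log_stds = [sum(w * f for w, f in zip(row, features)) for row in w_log_std]
    # Predicting log-std and exponentiating keeps every std strictly positive.
    # The clamp range [-5, 2] is an assumption, chosen only for numerical safety.
    stds = [math.exp(max(-5.0, min(2.0, ls))) for ls in log_stds]
    return means, stds


def sample_action(means, stds, rng):
    """Draw one action per dimension from independent Gaussians."""
    return [rng.gauss(m, s) for m, s in zip(means, stds)]


def log_prob(action, means, stds):
    """Log-density of a diagonal Gaussian: sum of per-dimension log-densities."""
    total = 0.0
    for a, m, s in zip(action, means, stds):
        total += -0.5 * ((a - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2 * math.pi)
    return total


rng = random.Random(0)
features = [0.5, -1.0, 0.25]  # hypothetical output of a shared hidden layer
w_mean = [[rng.uniform(-0.1, 0.1) for _ in features] for _ in range(ACTION_DIM)]
w_log_std = [[rng.uniform(-0.1, 0.1) for _ in features] for _ in range(ACTION_DIM)]

means, stds = gaussian_head(features, w_mean, w_log_std)
action = sample_action(means, stds, rng)
logp = log_prob(action, means, stds)  # the quantity PPO stores as the "old" log-prob
```

Summing the per-dimension log-densities gives one scalar log-probability per action vector, which is what the PPO ratio needs.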
0
votes
0 answers
How to pass state to the actor network?
Let's say the state I'm expecting to pass to the actor network from a custom env is just [0. 0. 0. 0. 0.].
But I am getting this:
[0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0.]
[0. 0.…

5we21n
0
votes
0 answers
Always getting the same action from an A2C model in stable_baselines3
I'm quite new to RL and have been trying to train an A2C model from stable_baselines3 to derive an integer sequence based on 3 other input sequences of floats. I have a custom gym environment that computes the agent rewards based on the actions…

Jesuspc
0
votes
0 answers
Custom Policy stable-baselines3
I'm trying to create a custom Policy for A2C with stable-baselines3, but I'm stuck. I'm using a MultiBinary observation space (80x80 grid) and continuous actions.
self.action_space = Box(
low=-1.0, high=1.0, shape=(4,),…

Claudiu Filip
0
votes
1 answer
Problems using the PPO RL algorithm in LunarLander-v2
In the PPO algorithm, a ratio needs to be calculated as ratios = torch.exp(new_probs-old_probs), which is the probability of the action under the current policy divided by the probability of the action under the previous policy.
But in my…
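Note that exp(new - old) equals the probability ratio only when both inputs are log-probabilities, since exp(log p_new - log p_old) = p_new / p_old. The sketch below illustrates this in plain Python along with PPO's clipped surrogate term. The function names and the clip value of 0.2 are assumptions for illustration, not taken from the question's code.

```python
import math


def ppo_ratio(new_log_prob, old_log_prob):
    """Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probabilities."""
    return math.exp(new_log_prob - old_log_prob)


def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """PPO clipped objective for a single sample (to be maximized)."""
    clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped * advantage)


# Hypothetical numbers: the action had probability 0.25 under the old policy
# and 0.50 under the current one.
old_log_prob = math.log(0.25)
new_log_prob = math.log(0.50)

ratio = ppo_ratio(new_log_prob, old_log_prob)        # 0.50 / 0.25, up to rounding
objective = clipped_surrogate(ratio, advantage=1.0)  # clipped at 1 + clip_eps
```

If raw probabilities were passed in instead of log-probabilities, the result would be exp(p_new - p_old), which is not the ratio PPO expects.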

cxzhou
0
votes
0 answers
Problem with the gradient of the actor using linear function approximation in RL
I have a problem updating theta (the weight vector for the actor in an actor-critic algorithm). I know that the gradient of ln(pi(a|s,theta)) = x(s,a) - \sum_b(pi(b|s,theta)*x(s,b)), where the index b represents each of the possible actions. The result of…
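The gradient identity quoted above holds for a softmax policy over linear action preferences theta^T x(s,b). As a sanity check, the sketch below compares the analytic expression with a central finite-difference estimate in plain Python. The feature vectors, theta values, and function names are hypothetical.

```python
import math


def softmax_policy(theta, features):
    """features[b] is x(s,b); returns pi(b|s,theta) for every action b."""
    prefs = [sum(t * x for t, x in zip(theta, xb)) for xb in features]
    m = max(prefs)  # subtract the max preference for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def grad_log_pi(theta, features, a):
    """Analytic gradient: x(s,a) - sum_b pi(b|s,theta) * x(s,b)."""
    pi = softmax_policy(theta, features)
    n = len(theta)
    expected = [sum(pi[b] * features[b][i] for b in range(len(features))) for i in range(n)]
    return [features[a][i] - expected[i] for i in range(n)]


def numeric_grad(theta, features, a, h=1e-6):
    """Central finite-difference estimate of the same gradient."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        up[i] += h
        down = list(theta)
        down[i] -= h
        lp_up = math.log(softmax_policy(up, features)[a])
        lp_down = math.log(softmax_policy(down, features)[a])
        grad.append((lp_up - lp_down) / (2 * h))
    return grad


theta = [0.1, -0.2]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical x(s,b) for three actions

analytic = grad_log_pi(theta, features, a=0)
numeric = numeric_grad(theta, features, a=0)
```

A finite-difference comparison like this is a cheap way to confirm that the gradient code matches the formula before debugging the rest of the update.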

Mateo
0
votes
0 answers
Actor-critic method for multiple continuous variables
I am using the code below (adapted from https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/rl/ipynb/actor_critic_cartpole.ipynb) to try and calibrate two continuous variables. The variables are run through a dummy…

rbtlm640
0
votes
0 answers
How to combine A2C with BPTT?
I'm having a little difficulty understanding how I can apply backpropagation through time to the A2C method, or any reinforcement learning method for that matter.
As I understand it, BPTT conceptually unrolls a recurrent network and performs a…

Telf
0
votes
0 answers
How to make actor-critic learning stabilize when the actor loss increases faster than the critic loss can go down?
I'm running this in TensorFlow 2.7.2. I found this method for training an actor-critic algorithm on the CartPole task. I wanted to see if and how learning could occur after overfitting it on some data.
So I want to train until the losses do not…

turkishelehant
0
votes
0 answers
MADDPG does not learn anything
I have a continuous problem that I need to solve with multi-agent deep deterministic policy gradient (MADDPG). My environment has 7 states and 3 actions. Two of the actions range over [0,1], and one of the actions ranges between…

at-dgh
0
votes
0 answers
Make a prediction using Actor-Critic shared model from timeseries data using Python
I have been able to train and test a shared RL model that produces an Actor.h5 and a Critic.h5 file, as well as a JSON file containing the parameters for training.
I am now at the stage where I would like to make a prediction on the next best action for the…
-1
votes
0 answers
How can I verify that my implementation of an RL model is correct?
I am implementing an actor-critic reinforcement learning algorithm and I don't know how to verify that it is correctly implemented. I am using TensorFlow and MATLAB for the environment. Feel free to ask me if you need further details.

5we21n
-1
votes
2 answers
Where is the source for the TensorFlow gym environments implementation?
I need to implement a custom TensorFlow gym environment to use with TF-Agents.
Is there code on GitHub for a "standard" gym environment, e.g. CartPole?
Please note this is a TensorFlow-specific question, not an OpenAI one.

Boppity Bop