Questions tagged [actor-critics]

13 questions
1 vote, 0 answers

How can I design the architecture of a reinforcement learning actor that outputs the mean and variance for different variables?

I'm going to implement a reinforcement learning PPO algorithm on a problem that requires a decision composed of 4 continuous variables: [a1,a2,a3,a4]. For this, I want to create an actor that can provide the mean and variance of a Gaussian…
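A common way to build such an actor is a diagonal Gaussian head: one output per action variable for the mean and one for the log standard deviation, exponentiated so the spread is always positive (the variance is then `std ** 2`). The sketch below is a minimal pure-Python illustration under assumptions that are not in the question: a single linear layer, a 5-dimensional state, and independent action dimensions. The class and helper names are hypothetical; in PPO, the log-probability it returns is what feeds the probability ratio.

```python
import math
import random


def linear(x, weights, bias):
    """Return W @ x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class DiagonalGaussianActor:
    """Hypothetical actor head: a mean and a log std for each action variable."""

    def __init__(self, state_dim, action_dim, seed=0):
        rng = random.Random(seed)

        def init():
            # Small random weights, one row per action variable.
            return [[rng.uniform(-0.1, 0.1) for _ in range(state_dim)] for _ in range(action_dim)]

        self.w_mu, self.b_mu = init(), [0.0] * action_dim
        # Predicting log(std) keeps std = exp(log_std) strictly positive.
        self.w_log_std, self.b_log_std = init(), [0.0] * action_dim

    def distribution(self, state):
        mu = linear(state, self.w_mu, self.b_mu)
        std = [math.exp(s) for s in linear(state, self.w_log_std, self.b_log_std)]
        return mu, std

    def sample(self, state, rng):
        mu, std = self.distribution(state)
        return [rng.gauss(m, s) for m, s in zip(mu, std)]

    def log_prob(self, state, action):
        # Independent dimensions: the joint log density is the sum of per-dimension ones.
        mu, std = self.distribution(state)
        return sum(
            -0.5 * ((a - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2 * math.pi)
            for a, m, s in zip(action, mu, std)
        )


actor = DiagonalGaussianActor(state_dim=5, action_dim=4)
state = [0.0] * 5
mu, std = actor.distribution(state)
action = actor.sample(state, random.Random(1))
print(mu, std, action, actor.log_prob(state, action))
```

With a deep network the same idea applies: the last layer simply has 2 x 4 outputs, split into means and log standard deviations.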
0 votes, 0 answers

How to pass the state to the actor network?

Let's say the state I'm expecting to pass to the actor network from a custom env is just [0. 0. 0. 0. 0.], but I am getting this: [0. 0. 0. 0. 0.] [0. 0. 0. 0. 0.] [0. 0. 0. 0. 0.] [0. 0. 0. 0. 0.] [0. 0. 0. 0. 0.] [0. 0. 0. 0. 0.] [0. 0. 0. 0. 0.] [0. 0.…
5we21n
0 votes, 0 answers

Always getting the same action from an A2C model in stable_baselines3

I'm quite new to RL and have been trying to train an A2C model from stable_baselines3 to derive an integer sequence based on 3 other input sequences of floats. I have a custom gym environment that computes the agent rewards based on the actions…
0 votes, 0 answers

Custom policy in stable-baselines3

I'm trying to create a custom Policy for A2C with stable-baselines3, but I'm stuck. I'm using a MultiBinary observation space (80x80 grid) and continuous actions. self.action_space = Box( low=-1.0, high=1.0, shape=(4,),…
0 votes, 1 answer

Problems using the PPO RL algorithm in LunarLander-v2

In the PPO algorithm, a ratio needs to be calculated as ratios = torch.exp(new_probs-old_probs), which is the probability of the action under the current policy divided by the probability of the action under the previous policy. But in my…
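The exponent form only yields that ratio when `new_probs` and `old_probs` hold log-probabilities, since exp(log p_new - log p_old) = p_new / p_old; applying it to raw probabilities gives a different number. A minimal pure-Python sketch of the standard clipped surrogate built on that ratio follows. The function name and the clip value 0.2 are illustrative assumptions, and in practice the loss that gets minimized is the negative of this objective.

```python
import math


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean over the batch of min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # exp(log p_new - log p_old) == p_new / p_old
        ratio = math.exp(new_lp - old_lp)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# The action became more likely (0.5 -> 0.8): with a positive advantage,
# the ratio of 1.6 is clipped to 1.2.
print(ppo_clip_objective([math.log(0.8)], [math.log(0.5)], [1.0]))
```

Taking the minimum makes the objective pessimistic: clipping caps the gain from a large ratio when the advantage is positive, but a large ratio with a negative advantage is still fully penalized.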
0 votes, 0 answers

Problem with the gradient of the actor using linear function approximation in RL

I have a problem updating theta (the weight vector for the actor in an actor-critic algorithm). I know the gradient of ln(pi(a|s,theta)) = x(s,a) - \sum_b pi(b|s,theta)*x(s,b), where the index b represents each of the possible actions. The result of…
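For a softmax policy over linear preferences, pi(b|s,theta) proportional to exp(theta . x(s,b)), the gradient in the question can be computed directly and checked against a finite difference. The sketch below assumes the feature vectors x(s,b) for the current state are passed in as a list indexed by action; all names are hypothetical. The update shown is the common one-step actor step theta <- theta + alpha * delta * grad ln pi(a|s,theta), with delta the critic's TD error (the discount-accumulation factor used in some textbook versions is omitted).

```python
import math


def softmax_policy(theta, features):
    """pi(b|s, theta) for every action b, from linear preferences theta . x(s, b)."""
    prefs = [sum(t * f for t, f in zip(theta, x_b)) for x_b in features]
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def grad_log_pi(theta, features, action):
    """x(s, a) - sum_b pi(b|s, theta) * x(s, b), one entry per weight."""
    pi = softmax_policy(theta, features)
    expected = [sum(pi[b] * features[b][i] for b in range(len(features))) for i in range(len(theta))]
    return [features[action][i] - expected[i] for i in range(len(theta))]


def actor_update(theta, features, action, td_error, alpha):
    """One actor step: theta + alpha * delta * grad ln pi(a|s, theta)."""
    g = grad_log_pi(theta, features, action)
    return [t + alpha * td_error * gi for t, gi in zip(theta, g)]


# Three actions, two features each (hypothetical values).
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(grad_log_pi([0.1, -0.2], feats, 2))
print(actor_update([0.0, 0.0], feats, 2, td_error=1.0, alpha=0.5))
```

The result is a vector with the same length as theta, so each weight is updated separately rather than by a single scalar.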
0 votes, 0 answers

Actor-critic method for multiple continuous variables

I am using the code below (adapted from https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/rl/ipynb/actor_critic_cartpole.ipynb) to try and calibrate two continuous variables. The variables are run through a dummy…
0 votes, 0 answers

How to combine A2C with BPTT?

I'm having a little difficulty understanding how I can apply backpropagation through time to the A2C method, or any reinforcement learning method for that matter. As I understand it, BPTT conceptually unrolls a recurrent network and performs a…
0 votes, 0 answers

How to stabilize actor-critic learning when the actor loss increases faster than the critic loss can decrease?

I'm running this in TensorFlow 2.7.2. I found this method for training an actor-critic algorithm on the CartPole task. I wanted to see if and how learning could occur after overfitting it on some data, so I want to train until the losses do not…
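For reference, the quantities usually tracked in an episodic actor-critic setup like this are the discounted returns, an actor loss of -log pi(a|s) times the advantage (return minus the critic's value estimate, treated as a constant for the actor), and a critic loss on the value error. The sketch below is a minimal pure-Python version under those assumptions; it uses a plain squared error for the critic, which may differ from the method the question refers to, and the function names are hypothetical.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def actor_critic_losses(log_probs, values, rewards, gamma=0.99):
    """Return (actor_loss, critic_loss) summed over one episode."""
    actor_loss = 0.0
    critic_loss = 0.0
    for log_p, v, g in zip(log_probs, values, discounted_returns(rewards, gamma)):
        advantage = g - v  # no gradient should flow into the critic through this term
        actor_loss += -log_p * advantage
        critic_loss += (g - v) ** 2
    return actor_loss, critic_loss


print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))
print(actor_critic_losses([math.log(0.5)], [0.25], [1.0]))
```

Since the actor loss is weighted by advantages that shift as the critic changes, its value is not directly comparable across training steps the way a supervised loss is.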
0 votes, 0 answers

MADDPG does not learn anything

I have a continuous problem that I need to solve with multi-agent deep deterministic policy gradient (MADDPG). My environment has 7 states and 3 actions. Two of the actions range over [0,1], and the range of the remaining action is between…
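One common way to keep a deterministic actor's outputs inside per-action bounds such as [0,1] is to squash each raw output with tanh and rescale it affinely to [low, high]. A minimal sketch, assuming the bounds are known for every action; the third range is cut off in the excerpt, so the (-1, 1) used for it below is only a placeholder, and the function name is hypothetical.

```python
import math


def scale_actions(raw_outputs, lows, highs):
    """Map unbounded actor outputs into [low, high] per action using tanh."""
    scaled = []
    for u, lo, hi in zip(raw_outputs, lows, highs):
        t = math.tanh(u)  # t lies in [-1, 1]
        scaled.append(lo + 0.5 * (t + 1.0) * (hi - lo))
    return scaled


# Two actions in [0, 1]; the third range is a placeholder (-1, 1).
lows, highs = [0.0, 0.0, -1.0], [1.0, 1.0, 1.0]
print(scale_actions([0.0, 3.0, -3.0], lows, highs))
```

If exploration noise is added after this mapping, the noisy action needs clipping back into the same bounds.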
0 votes, 0 answers

Making a prediction with a shared Actor-Critic model from time-series data using Python

I have been able to train and test a shared RL model that produces an Actor.h5 and a Critic.h5 file, as well as a JSON file containing the parameters for training. I am now at the stage where I would like to make a prediction on the next best action for the…
-1 votes, 0 answers

How can I verify that my RL model implementation is correct?

I am implementing an actor-critic reinforcement learning algorithm, and I don't know how to verify that it's implemented correctly. I am using TensorFlow, and MATLAB for the environment. Feel free to ask me if you need further details.
-1 votes, 2 answers

Where is the source for the TensorFlow gym environment implementations?

I need to implement a custom TensorFlow gym environment to use with TF-Agents. Is there code on GitHub for a "standard" gym environment, e.g. CartPole? Please note this is a TensorFlow-specific question, not OpenAI.
Boppity Bop