Highest Voted 'actor-critics' Questions

1

vote

0 answers

How can I design the architecture of a reinforcement learning actor that outputs the mean and variance for different variables?

I'm going to implement a reinforcement learning PPO algorithm on a problem that requires a decision composed of 4 continuous variables: [a1, a2, a3, a4]. For this, I want to create an actor that can provide the mean and variance of a Gaussian…

asked Nov 23 '22 at 20:24

Daniel Rangel Martinez

15
6
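For the question above about an actor that outputs a mean and variance for four continuous actions, one common pattern is a shared hidden layer feeding two heads, one for the mean and one for the log standard deviation. The following is a minimal NumPy sketch of that idea, not the asker's code: the state size, hidden size, random weights, and all function names are hypothetical, and only the four action dimensions come from the question.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes; only ACTION_DIM = 4 comes from the question.
STATE_DIM, HIDDEN, ACTION_DIM = 8, 32, 4

# Randomly initialised weights for a one-hidden-layer actor (illustrative only).
W1 = rng.normal(0.0, 0.1, (STATE_DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
W_mu = rng.normal(0.0, 0.1, (HIDDEN, ACTION_DIM))
b_mu = np.zeros(ACTION_DIM)
W_ls = rng.normal(0.0, 0.1, (HIDDEN, ACTION_DIM))
b_ls = np.zeros(ACTION_DIM)


def actor_forward(state):
    """Return the mean and standard deviation of a diagonal Gaussian policy."""
    h = np.tanh(state @ W1 + b1)
    mean = h @ W_mu + b_mu
    # Predict the log-std and exponentiate so the std is always positive.
    log_std = np.clip(h @ W_ls + b_ls, -5.0, 2.0)
    return mean, np.exp(log_std)


def sample_action(state):
    """Sample an action and return it with its log-probability."""
    mean, std = actor_forward(state)
    action = mean + std * rng.standard_normal(ACTION_DIM)
    # Log-density of independent Gaussians, summed over the action dimensions.
    log_prob = np.sum(
        -0.5 * ((action - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2.0 * np.pi)
    )
    return action, log_prob


action, log_prob = sample_action(np.zeros(STATE_DIM))
```

Predicting the log standard deviation rather than the variance directly is a design choice that keeps the variance positive without a constraint; the summed log-probability is the quantity a PPO update would consume.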

0

votes

0 answers

How to pass state to an actor network?

Let's say the state I'm expecting to pass to the actor network from a custom env is just [0. 0. 0. 0. 0.], but I am getting this: [0. 0. 0. 0. 0.] [0. 0. 0. 0. 0.] [0. 0. 0. 0. 0.] [0. 0. 0. 0. 0.] [0. 0. 0. 0. 0.] [0. 0. 0. 0. 0.] [0. 0. 0. 0. 0.] [0. 0.…

python tensorflow actor-critics

asked Aug 16 '23 at 05:38

5we21n

1

0

votes

0 answers

Getting always the same action on an A2C from stable_baselines3

I'm quite new to RL and have been trying to train an A2C model from stable_baselines3 to derive an integer sequence based on 3 other input sequences of floats. I have a custom gym environment that computes the agent rewards based on the actions…

python reinforcement-learning q-learning stable-baselines actor-critics

asked Aug 01 '23 at 16:59

Jesuspc

1,664
10
25

0

votes

0 answers

Custom Policy stable-baselines3

I'm trying to create a custom Policy for A2C with stable-baselines3, but I'm stuck. I'm using a MultiBinary observation space (80x80 grid) and continuous actions. self.action_space = Box( low=-1.0, high=1.0, shape=(4,),…

python reinforcement-learning policy stable-baselines actor-critics

asked Apr 28 '23 at 11:52

Claudiu Filip

19
4

0

votes

1 answer

Problems using the RL algorithm PPO in LunarLander-v2

In the PPO algorithm, a ratio needs to be calculated as ratios = torch.exp(new_probs-old_probs), which is the probability of the action under the current policy divided by the probability of the action under the previous policy. But in my…

reinforcement-learning openai-gym actor-critics

asked Mar 30 '23 at 11:06

cxzhou

3
4
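For the PPO question above, note that exponentiating a difference only yields a probability ratio when both inputs are log-probabilities. The following pure-Python sketch illustrates this for a single sample, under stated assumptions: the function name is hypothetical, the clipping range of 0.2 is a commonly used value rather than something given in the question, and the probabilities 0.30 and 0.20 are made up for illustration.

```python
import math


def ppo_clipped_objective(new_log_prob, old_log_prob, advantage, clip_eps=0.2):
    """Clipped surrogate objective for a single (state, action) sample."""
    # exp(log pi_new - log pi_old) equals pi_new / pi_old.
    ratio = math.exp(new_log_prob - old_log_prob)
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # PPO maximises the smaller of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


# The inputs must be log-probabilities, so raw probabilities are converted first.
new_log_prob = math.log(0.30)
old_log_prob = math.log(0.20)
objective = ppo_clipped_objective(new_log_prob, old_log_prob, advantage=1.0)
```

If raw probabilities were passed in place of log-probabilities, `exp(new - old)` would no longer equal the ratio of the two policies, which is one possible source of unexpected values in this step.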

0

votes

0 answers

Problem with the gradient of the actor using linear function approximation in RL

I have a problem updating theta (the weight vector for the actor in an actor-critic algorithm). I know that the gradient of ln(pi(a|s,theta)) = x(s,a) - \sum_b(pi(b|s,theta)*x(s,b)), where the index b represents each of the possible actions. The result of…

machine-learning math reinforcement-learning gradient-descent actor-critics

asked Mar 14 '23 at 13:21

Mateo

1
2
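The gradient expression quoted in the question above, x(s,a) - \sum_b pi(b|s,theta)*x(s,b), can be checked numerically for a softmax policy with linear action preferences. The following NumPy sketch assumes that form of policy; the feature matrix, the zero weight vector, and the function names are hypothetical examples, not the asker's setup.

```python
import numpy as np


def softmax_policy(theta, features):
    """Action probabilities; features has one row x(s, b) per action b."""
    prefs = features @ theta
    prefs = prefs - prefs.max()  # subtract the max for numerical stability
    exp_prefs = np.exp(prefs)
    return exp_prefs / exp_prefs.sum()


def grad_log_pi(theta, features, action):
    """Compute x(s, a) - sum_b pi(b | s, theta) * x(s, b)."""
    pi = softmax_policy(theta, features)
    return features[action] - pi @ features


# Hypothetical example: 3 actions, 2 features, all-zero weights.
features = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
theta = np.zeros(2)
grad = grad_log_pi(theta, features, action=0)
```

The result has the same shape as theta, so in a typical one-step actor-critic update it can be scaled by a step size and the TD error and added to theta directly.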

0

votes

0 answers

Actor-critic method for multiple continuous variables

I am using the code below (adapted from https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/rl/ipynb/actor_critic_cartpole.ipynb) to try and calibrate two continuous variables. The variables are run through a dummy…

python tensorflow keras reinforcement-learning actor-critics

asked Mar 08 '23 at 21:31

rbtlm640

79
5

0

votes

0 answers

How to combine A2C with BPTT?

I'm having a little difficulty understanding how I can apply backpropagation through time to the A2C method, or any reinforcement learning method for that matter. As I understand it, BPTT conceptually unrolls a recurrent network and performs a…

reinforcement-learning backpropagation back-propagation-through-time actor-critics

asked Feb 11 '23 at 15:56

Telf

1

0

votes

0 answers

How to make actor-critic learning stabilize when the actor loss increases faster than the critic loss can go down?

I'm running this in TensorFlow 2.7.2. I found this method for training an actor-critic algorithm on the cartpole task. I wanted to see if and how learning could occur after overfitting it on some data. So I want to train until the losses do not…

python reinforcement-learning gradienttape overfitting-underfitting actor-critics

asked Jan 27 '23 at 16:23

turkishelehant

13
2

0

votes

0 answers

MADDPG does not learn anything

I have a continuous problem that I need to solve with multi-agent deep deterministic policy gradient (MADDPG). My environment has 7 states and 3 actions. Two of the actions range over [0,1], and the range of the third action is between…

deep-learning reinforcement-learning q-learning policy-gradient-descent actor-critics

asked Oct 30 '22 at 14:09

at-dgh

1
1

0

votes

0 answers

Make a prediction using Actor-Critic shared model from timeseries data using Python

I have been able to train and test a shared RL model that produces an Actor.h5 and a Critic.h5 file, as well as a JSON file containing the parameters for training. I am now at the stage where I would like to make a prediction on the next best action for the…

python keras prediction actor-critics

asked Oct 14 '22 at 12:43

Ben Wilson

1

-1

votes

0 answers

How can I verify that the implementation of an RL model is correct?

I am implementing an actor-critic reinforcement learning algorithm, and I don't know how to verify that it is correctly implemented. I am using TensorFlow, with MATLAB for the environment. Feel free to ask me if you need further details.

python matlab tensorflow reinforcement-learning actor-critics

asked Aug 22 '23 at 15:16

5we21n

1

-1

votes

2 answers

Where is the source for the TensorFlow gym environment implementations?

I need to implement a custom TensorFlow gym environment to use with TF-Agents. Is there code on GitHub for a "standard" gym environment, e.g. cart pole? Please note this is a TensorFlow-specific question, not an OpenAI one.

tensorflow reinforcement-learning actor-critics

asked Sep 05 '22 at 16:47

Boppity Bop

9,613
13
72
151
