Questions tagged [rllib]

Ray RLlib is an open-source Python library for Reinforcement Learning. Use with applicable framework tags, such as TensorFlow or PyTorch.

105 questions
2
votes
0 answers

Training ray.rllib algorithm with vectorized environments

I am working with ray.rllib and I am stuck on using a static method (line 40) to vectorize my custom environment and train it with PPOTrainer(). I use the existing_envs parameter and hand over a list of gym.envs that I create manually. Is…
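The standard alternative to hand-building the vectorized env list is to register a single-env constructor and let RLlib vectorize it per worker. A minimal sketch, assuming the agents-era RLlib API (ray 1.x); `MyCustomEnv` and `"my_env"` are placeholder names:

```python
from ray.tune.registry import register_env

def env_creator(env_config):
    # MyCustomEnv is a placeholder for your gym.Env subclass
    return MyCustomEnv(env_config)

register_env("my_env", env_creator)

config = {
    "env": "my_env",
    "num_workers": 2,
    # RLlib creates this many copies of the env in each worker and steps
    # them as a batch -- the built-in replacement for manual vectorization.
    "num_envs_per_worker": 4,
}
# trainer = PPOTrainer(config=config)  # from ray.rllib.agents.ppo
```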
2
votes
1 answer

Using graph-mode in Ray RLlib triggers errors when a tf.keras.model.predict() function is called in PPOTFPolicy

I'm using Ray RLlib to train a PPO agent with two modifications to the PPOTFPolicy. I added a mixin class (say "Recal") to the "mixins" parameter of "build_tf_policy()". This way, the PPOTFPolicy subclasses my "Recal" class and has access…
WPXP
  • 21
  • 1
2
votes
1 answer

Ray[RLlib] Custom Action Distribution (TorchDeterministic)

We know that in the case of a Box (continuous action) Action Space, the corresponding Action Distribution is DiagGaussian (probability distribution). However, I want to use TorchDeterministic (Action Distribution that returns the input values…
2
votes
0 answers

The actor died unexpectedly before finishing this task ( Ray1.7.0 , Sagemaker )

I am running Ray RLlib on SageMaker with an 8-core CPU using the sagemaker_rl library, with num_workers set to 7. After a long execution I hit "The actor died unexpectedly before finishing this task". class MyLauncher(SageMakerRayLauncher): def…
2
votes
0 answers

Using RLlib for a custom multi-agent gym environment

I am trying to set up a custom multi-agent environment using RLlib, but whether I use the ones available online or make my own, I run into the same errors, mentioned below. Please help me out. I have installed whatever they…
2
votes
0 answers

Trained well with DQN, but not learning with A2C

I've used Ray RLlib's DQN to train in my custom simulator. It usually produced good results after 15 million steps. After playing around with DQN for a while, I'm now trying to train A2C in the simulator. However, it's not even close to converging…
2
votes
3 answers

Save episode rewards in ray.tune

I am training several agents with PPO algorithms in a multi-agent environment using rllib/ray. I am using the ray.tune() command to train the agents and then loading the training data from ~/ray_results. This data contains the actions chosen by the…
mat123a
  • 21
  • 1
2
votes
1 answer

Error: `callbacks` must be a callable method that returns a subclass of DefaultCallbacks, got

When I run some code (DDPG, Deep Deterministic Policy Gradient), this error occurs: ValueError: `callbacks` must be a callable method that returns a subclass of DefaultCallbacks, got. My code is…
Peter Kim
  • 65
  • 4
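The usual cause of this ValueError is passing an instance (or a plain dict/function) where RLlib expects the DefaultCallbacks subclass itself. A hedged sketch assuming the agents-era API (ray 1.x; the import path moved to `ray.rllib.algorithms.callbacks` in later releases), with made-up metric and env names:

```python
from ray.rllib.agents.callbacks import DefaultCallbacks

class MyCallbacks(DefaultCallbacks):
    def on_episode_end(self, *, worker, base_env, policies, episode, **kwargs):
        # custom_metrics entries show up in tune results and TensorBoard
        episode.custom_metrics["episode_len"] = episode.length

config = {
    "env": "Pendulum-v1",
    # Pass the class itself -- a callable that returns a DefaultCallbacks
    # subclass instance. MyCallbacks() or a bare function triggers the error.
    "callbacks": MyCallbacks,
}
# trainer = DDPGTrainer(config=config)  # from ray.rllib.agents.ddpg
```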
2
votes
0 answers

Ray RLlib: Why is the learn throughput decreasing in DQN Training?

Is it normal for learn throughput to decrease and learn time to increase as a Dueling DDQN agent trains? A 7x increase in learn time after a few hours of training is pretty significant; is this something you would expect to see? My…
Nyxynyx
  • 61,411
  • 155
  • 482
  • 830
2
votes
1 answer

AWS SageMaker RL with ray: ray.tune.error.TuneError: No trainable specified

I have a training script based on the AWS SageMaker RL example rl_network_compression_ray_custom, but changed the env to a basic gym env, Asteroids-v0 (installing dependencies at the main entrypoint of the training script). When I run the fit on the…
2
votes
0 answers

Cannot define custom metrics in ray

I'm using a framework called FLOW RL. It enables me to use rllib and ray for my RL algorithm. I have been trying to plot non-learning data on TensorBoard. Following the ray documentation (link), I have tried to add custom metrics. Therefore, I need to…
2
votes
1 answer

How do we print action distributions in RLlib during training?

I'm trying to print action distributions at the end of each episode to see what my agent is doing. I've attempted to put this in rock_paper_scissors_multiagent.py by including the following method: def on_episode_end(info): episode =…
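The legacy function-style callbacks that rock_paper_scissors_multiagent.py used (old ray 0.x, before the DefaultCallbacks class) can collect per-episode action counts like this. A hedged sketch: the `info` dict layout and the `episode` attributes (`user_data`, `last_action_for`, `episode_id`) follow the old API, and in a true multi-agent setting `last_action_for` needs an agent id:

```python
from collections import Counter

# Legacy RLlib callbacks receive a plain `info` dict with the episode inside.
def on_episode_start(info):
    info["episode"].user_data["actions"] = []

def on_episode_step(info):
    episode = info["episode"]
    # For multi-agent envs, pass an agent id: episode.last_action_for("agent_0")
    episode.user_data["actions"].append(episode.last_action_for())

def on_episode_end(info):
    episode = info["episode"]
    counts = Counter(int(a) for a in episode.user_data["actions"])
    print(f"episode {episode.episode_id} action distribution: {dict(counts)}")

# Wired into the trainer config (old dict-style callbacks):
config = {
    "callbacks": {
        "on_episode_start": on_episode_start,
        "on_episode_step": on_episode_step,
        "on_episode_end": on_episode_end,
    },
}
```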
2
votes
1 answer

Atari score vs reward in rllib DQN implementation

I'm trying to replicate DQN scores for Breakout using RLlib. After 5M steps the average reward is 2.0, while the known score for Breakout with DQN is 100+. I'm wondering if this is because of reward clipping, so the actual reward does not…
Shital Shah
  • 63,284
  • 17
  • 238
  • 185
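A plausible explanation, sketched as a config fragment: RLlib clips Atari rewards to [-1, 1] by default, so the reported episode_reward_mean is a clipped sum, not the game score. The env name and the flag below reflect the standard agents-era behavior; treat the exact default as version-dependent:

```python
# Reported reward vs. game score: with clipping on (the Atari default),
# every Breakout brick counts as +1, so a "reward" of 2.0 is not 2.0 points.
config = {
    "env": "BreakoutNoFrameskip-v4",
    # Set False to log raw game scores; note this also changes the
    # learning signal, so it is usually only done for evaluation.
    "clip_rewards": False,
}
```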
1
vote
0 answers

Implement PPO agent using RAY on custom Gym environment

I've been interested in Reinforcement Learning for a while. I've made a simple PPO agent, using Stable Baselines 3, PyTorch and Gym, to trade Bitcoin using Binance data. This was relatively easy and straightforward to set up, get running, and get a…
1
vote
1 answer

How to end episodes after 200 steps in Ray Tune (tune.run()) using a PPO model with torch

I'm using the following code to import a custom environment and then train on it: from ray.tune.registry import register_env import ray from ray import air, tune from ray.rllib.algorithms.ppo import PPO from gym_env.cube_env import…
Dawid
  • 13
  • 2
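One way to hard-cap episodes at 200 steps, largely independent of RLlib version, is to enforce the limit in the env itself with gym's TimeLimit wrapper (RLlib's own `horizon` setting was deprecated in the 2.x line). A hedged sketch reusing the question's names; `CubeEnv` stands in for the class imported from gym_env.cube_env:

```python
import gym
from ray.tune.registry import register_env

def env_creator(env_config):
    # TimeLimit ends the episode (done=True) after 200 steps
    return gym.wrappers.TimeLimit(CubeEnv(env_config), max_episode_steps=200)

register_env("cube-200", env_creator)
# tune.Tuner("PPO", param_space={"env": "cube-200", ...}) then runs as usual.
```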