Ray RLlib is an open-source Python library for reinforcement learning. Use this tag together with the applicable framework tag, such as TensorFlow or PyTorch.
Questions tagged [rllib]
105 questions
1
vote
0 answers
Correct approach to improve/retrain an offline model
I have a recommendation system that was trained using Behavior Cloning (BC) with offline data generated using a supervised learning model converted to batch format using the approach described here. Currently, the model is exploring using an…
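Not the asker's setup, but a minimal sketch of what an offline retrain loop can look like in newer RLlib; the env name and batch path are placeholders:

```python
# Sketch only: resume/continue training a behavior-cloned policy on fresh
# offline batches (assumes Ray 2.x and JSON sample batches in ./offline_data).
from ray.rllib.algorithms.bc import BCConfig

config = (
    BCConfig()
    .environment("CartPole-v1")             # placeholder env; defines the spaces
    .offline_data(input_="./offline_data")  # directory of recorded batches
)
algo = config.build()
# algo.restore("/path/to/previous_checkpoint")  # start from the old policy
for _ in range(10):
    results = algo.train()
```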

Felipe Leite Antunes
- 101
- 6
1
vote
0 answers
How can I get the episode info dict from the on_sample_end callback?
I need to get the episode info dict from the on_sample_end callback so that I can display some custom metrics every time a rollout has finished. How can I do that?
Ray Version: 0.7.3
Thank you in advance!
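For the pre-1.0 function-based callback API that Ray 0.7.3 uses, a sketch along the lines of that era's custom-metrics example; treat the exact key names as assumptions to verify against that version:

```python
from ray import tune

def on_episode_end(info):
    # Fires once per finished episode; the usual place for per-episode metrics.
    episode = info["episode"]
    last_info = episode.last_info_for()  # info dict of the final step
    if last_info:
        episode.custom_metrics["my_metric"] = last_info.get("my_metric", 0.0)

def on_sample_end(info):
    # Fires once per finished rollout; info holds the worker and SampleBatch.
    print("rollout collected, batch size:", info["samples"].count)

config = {
    "callbacks": {
        "on_episode_end": tune.function(on_episode_end),
        "on_sample_end": tune.function(on_sample_end),
    },
}
```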

Antonio Domínguez
- 51
- 3
1
vote
1 answer
RLlib - obtain TensorFlow or PyTorch model output from checkpoint
I'd like to use the RLlib-trained policy model in different code where I need to track which action is generated for specific input states. Using a standard TensorFlow or PyTorch (preferred) network model would provide that flexibility, but I can't…
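One common route (a sketch assuming a PPO checkpoint and the PyTorch framework; the config must mirror the one used during training):

```python
import gym
import ray
from ray.rllib.agents.ppo import PPOTrainer

ray.init()
trainer = PPOTrainer(config={"framework": "torch"}, env="CartPole-v1")
trainer.restore("/path/to/checkpoint/checkpoint-100")  # placeholder path

# Track which action the policy picks for a specific input state:
obs = gym.make("CartPole-v1").reset()
action = trainer.compute_action(obs)

# The raw PyTorch module, usable outside RLlib (e.g. torch.save, ONNX export):
torch_model = trainer.get_policy().model  # a torch.nn.Module
```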

lakehopper
- 11
- 2
1
vote
1 answer
Creating custom MA environment
I am looking for some guidance on building a multi-agent dummy example. I've been trying to work through the RLlib documentation, but I don't think I have understood the approach to creating my own multi-agent environment.
I'd like to have several…
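A minimal dummy sketch of RLlib's MultiAgentEnv contract (gym-era 4-tuple API): every method exchanges dicts keyed by agent id, and the dones dict needs the special "__all__" key:

```python
import gym
from ray.rllib.env.multi_agent_env import MultiAgentEnv

class MyDummyEnv(MultiAgentEnv):
    def __init__(self, num_agents=2):
        self.agents = [f"agent_{i}" for i in range(num_agents)]
        self.observation_space = gym.spaces.Box(-1.0, 1.0, shape=(4,))
        self.action_space = gym.spaces.Discrete(2)
        self.t = 0

    def reset(self):
        self.t = 0
        return {a: self.observation_space.sample() for a in self.agents}

    def step(self, action_dict):
        self.t += 1
        done = self.t >= 10
        obs = {a: self.observation_space.sample() for a in action_dict}
        rew = {a: 1.0 for a in action_dict}
        dones = {a: done for a in action_dict}
        dones["__all__"] = done  # the episode ends only when this is True
        infos = {a: {} for a in action_dict}
        return obs, rew, dones, infos
```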

bastiano14
- 11
- 2
1
vote
2 answers
Correct way to pass in custom model parameters for an RLlib model?
I have a basic custom model that is essentially just a copy-paste of the default RLLib fully connected model (https://github.com/ray-project/ray/blob/master/rllib/models/tf/fcnet.py) and I'm passing in custom model parameters through a config file…
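The usual wiring looks like the sketch below; note that the config key is version-dependent (newer RLlib reads "custom_model_config", older releases used "custom_options"):

```python
from ray.rllib.models import ModelCatalog
from ray.rllib.models.tf.fcnet import FullyConnectedNetwork

class MyFCNet(FullyConnectedNetwork):
    def __init__(self, obs_space, action_space, num_outputs, model_config, name):
        # Handle both key names, depending on the installed RLlib version.
        custom = (model_config.get("custom_model_config")
                  or model_config.get("custom_options") or {})
        self.my_param = custom.get("my_param", 42)
        super().__init__(obs_space, action_space, num_outputs, model_config, name)

ModelCatalog.register_custom_model("my_fcnet", MyFCNet)

config = {
    "model": {
        "custom_model": "my_fcnet",
        "custom_model_config": {"my_param": 7},  # "custom_options" on old versions
    },
}
```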

Yuerno
- 751
- 1
- 8
- 27
1
vote
1 answer
How to set up rllib multi-agent PPO?
I have a very simple multi-agent environment set up for use with ray.rllib, and I'm trying to run a simple baseline test of a PPO vs. Random Policy training scenario as follows:
register_env("my_env", lambda _: MyEnv(num_agents=2))
mock =…
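A sketch of one learned PPO policy against a frozen (untrained) baseline policy, assuming the asker's MyEnv and its per-agent spaces; newer versions pass extra arguments to policy_mapping_fn:

```python
import gym
from ray import tune
from ray.tune.registry import register_env

register_env("my_env", lambda _: MyEnv(num_agents=2))  # MyEnv as in the question

obs_space = gym.spaces.Box(-1.0, 1.0, shape=(4,))  # assumed per-agent spaces
act_space = gym.spaces.Discrete(2)

config = {
    "env": "my_env",
    "multiagent": {
        "policies": {
            "learned": (None, obs_space, act_space, {}),
            "baseline": (None, obs_space, act_space, {}),
        },
        "policy_mapping_fn": lambda agent_id: (
            "learned" if agent_id == "agent_0" else "baseline"
        ),
        # Only optimize "learned"; "baseline" keeps its random init weights.
        "policies_to_train": ["learned"],
    },
}
tune.run("PPO", config=config, stop={"training_iteration": 20})
```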

deepmindz
- 598
- 1
- 6
- 14
1
vote
0 answers
RLlib changing the observation shape by adding [None] to the shape tuple
RLlib (version 0.7.3) is given an observation space of Box(10, 3), which I wanted to use with an FCN agent. But the library seems to add another dimension to it.
Because of this addition, RLlib tries to use a vision network for the…
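The leading None is just the batch dimension TensorFlow reserves, so that part is expected; the vision-net choice comes from the rank-2 observation. One common workaround is to flatten the space before RLlib sees it (a sketch):

```python
import gym
import numpy as np

class FlattenObs(gym.ObservationWrapper):
    """Turns a Box(10, 3) observation into a flat Box(30,) for the FCN."""

    def __init__(self, env):
        super().__init__(env)
        low = env.observation_space.low.reshape(-1)
        high = env.observation_space.high.reshape(-1)
        self.observation_space = gym.spaces.Box(low, high, dtype=np.float32)

    def observation(self, obs):
        return np.asarray(obs, dtype=np.float32).reshape(-1)
```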

Parth Jaggi
- 43
- 6
1
vote
1 answer
Is it possible to specify "episodes_this_iter" with the ray Tune search algorithm?
I'm new to programming/Ray and have a simple question about which parameters can be specified when using Ray Tune. In particular, the Ray Tune documentation says that all of the auto-filled fields (steps_this_iter, episodes_this_iter, etc.) can be…
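The auto-filled fields are result outputs rather than tunable parameters, but the cumulative ones can serve as stopping criteria, and config keys steer how much data one iteration consumes. A sketch:

```python
from ray import tune

tune.run(
    "PPO",
    config={
        "env": "CartPole-v1",      # placeholder
        "train_batch_size": 4000,  # roughly sets the data per train iteration
    },
    stop={"episodes_total": 1000},  # cumulative auto-filled counter
)
```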

sbrand
- 11
- 1
1
vote
1 answer
'Observation outside expected value range' error when running example 'traffic_light_grid.py'
I'm trying to learn how to use Flow. When I was trying to run 'flow/examples/rllib/traffic_light_grid.py', it kept returning 'Observation outside expected value range' errors until it reached the maximum error number. I have no idea why only this…
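Not Flow-specific, but a debugging wrapper sketch that pinpoints which observation entries fall outside the declared Box before RLlib complains:

```python
import gym
import numpy as np

class CheckObs(gym.ObservationWrapper):
    def observation(self, obs):
        obs = np.asarray(obs)
        space = self.observation_space
        bad = (obs < space.low) | (obs > space.high)
        if np.any(bad):
            print("out-of-range indices:", np.where(bad), "values:", obs[bad])
        return np.clip(obs, space.low, space.high)  # clip as a stopgap
```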

Ming
- 11
- 1
1
vote
0 answers
How to configure batches for an LSTM with MARWIL in RLlib
I'm trying to train an LSTM policy using MARWIL in RLlib. I could not find any examples of how to set up the batches for this problem. I can train a MARWIL model just fine if it does not have an LSTM component by following the instructions…
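A sketch of the config side only, under the assumption that the offline JSON batches were written per episode so each row still carries its eps_id for sequence chopping; whether a given RLlib version's MARWIL loss handles recurrent sequences needs checking:

```python
config = {
    "input": "/path/to/offline_batches",  # placeholder directory
    "model": {
        "use_lstm": True,
        "max_seq_len": 20,     # batches get chopped into sequences this long
        "lstm_cell_size": 64,
    },
    # Off-policy estimation over the offline data instead of simulation:
    "input_evaluation": ["is", "wis"],
}
```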

Daniel Breen
- 71
- 3
0
votes
0 answers
Error while testing custom_keras_model.py of Ray RLlib
I'm encountering a persistent issue while working with the Tune library in Ray RLlib with Python on a Windows system. I'm attempting to run a test script from the library titled custom_keras_model.py
(raylet) [2023-08-22 15:01:41,041 E 10616 15108]…

Nadir Bakyac
- 1
- 1
0
votes
1 answer
Difficulty Implementing DQN for Gym's Taxi-v3 Problem
I've been working on solving the Gym Taxi-v3 problem using reinforcement learning algorithms. Initially, I applied tabular Q-learning, and after 10,000 training iterations the algorithm achieved a mean reward of 8.x with a 100% success rate, which was…
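For comparison, a sketch of a plain RLlib DQN setup for Taxi-v3; the hyperparameters are illustrative guesses, not tuned values (RLlib's preprocessor one-hot encodes the Discrete(500) observation automatically):

```python
import ray
from ray.rllib.algorithms.dqn import DQNConfig

ray.init()
config = (
    DQNConfig()
    .environment("Taxi-v3")
    .training(gamma=0.99, lr=1e-4, train_batch_size=64)
    .exploration(exploration_config={
        "type": "EpsilonGreedy",
        "initial_epsilon": 1.0,
        "final_epsilon": 0.05,
        "epsilon_timesteps": 100_000,  # slow decay; Taxi needs much exploration
    })
)
algo = config.build()
for _ in range(200):
    result = algo.train()
```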

Aaron
- 11
- 3
0
votes
0 answers
RLlib EpisodeV2 missing method to query the latest info dict returned from step?
The Episode class provided the method last_info_for to pull the info dict returned to the agent at the latest step. EpisodeV2 doesn't have such a method, and consequently raises errors like 'EpisodeV2' object has no attribute 'last_info_for'.
Is…
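One workaround sketch that sidesteps the EpisodeV2 API entirely: record the latest info dict inside the env, where callbacks can still reach it (gym-style 4-tuple step shown; adapt for gymnasium's 5-tuple):

```python
import gym

class RememberLastInfo(gym.Wrapper):
    """Keeps the most recent info dict around as an attribute."""

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.last_info = info
        return obs, reward, done, info
```

From a callback, the wrapped env (and thus last_info) should be reachable through base_env.get_sub_environments() in recent versions.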

Victor M
- 603
- 4
- 22
0
votes
0 answers
Setting initial iterations in Ray Tune's implementation of BOHB
I am trying to use Ray Tune's implementation of BOHB to hyperparameter-tune a PPO model. If I set the number of iterations to e.g. 100 it works fine; however, it already samples new hyperparameter values after only one iteration of a sample.…
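A sketch of the usual BOHB wiring (Ray 2.x import paths; older versions import from ray.tune.suggest.bohb; requires `pip install hpbandster ConfigSpace`). Trials being cut after one iteration usually traces to the scheduler's bracket sizing, i.e. max_t and reduction_factor, rather than the searcher itself:

```python
from ray import tune
from ray.tune.search.bohb import TuneBOHB
from ray.tune.schedulers import HyperBandForBOHB

scheduler = HyperBandForBOHB(
    time_attr="training_iteration",
    max_t=100,           # longest a single trial may run
    reduction_factor=4,  # keep ~1/4 of the trials at every rung
)

tune.run(
    "PPO",
    config={"env": "CartPole-v1", "lr": tune.loguniform(1e-5, 1e-3)},
    search_alg=TuneBOHB(),
    scheduler=scheduler,
    metric="episode_reward_mean",
    mode="max",
    num_samples=16,
)
```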

Jakob Sejten
- 31
- 2
0
votes
0 answers
High runtimes when executing sample codes in the 'examples' folder
I am new to Flow and Ray[rllib], and I would like to ask you to share your runtime experience with the example codes provided in the 'examples' folder of Flow, along with your system properties.
For instance, when I run…

FahimSh87
- 1
- 1