Questions tagged [rllib]

Ray RLlib is an open-source Python library for Reinforcement Learning. Use with applicable framework tags, such as TensorFlow or PyTorch.

105 questions
1
vote
0 answers

Correct approach to improve/retrain an offline model

I have a recommendation system that was trained with Behavior Cloning (BC) on offline data, generated by a supervised learning model and converted to batch format using the approach described here. Currently, the model is exploring using an…
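A minimal sketch of the BC-on-offline-data setup this question describes, assuming a recent Ray version with the BCConfig API; the spaces and the "/tmp/offline_batches" path are placeholders, not the asker's actual values:

    import gymnasium as gym  # plain `gym` on older Ray versions
    from ray.rllib.algorithms.bc import BCConfig

    config = (
        BCConfig()
        # Offline training has no live env, so the spaces are declared explicitly.
        .environment(
            observation_space=gym.spaces.Box(-1.0, 1.0, (4,)),
            action_space=gym.spaces.Discrete(2),
        )
        # Placeholder path to JSON SampleBatch files written by an output writer.
        .offline_data(input_="/tmp/offline_batches")
    )

    algo = config.build()
    for _ in range(5):
        result = algo.train()

Retraining on newer batches then amounts to pointing input_ at the fresh data and restoring the previous checkpoint before calling train() again.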
1
vote
0 answers

How can I get the episode info dict from the on_sample_end callback?

I need to get the episode info dict from the on_sample_end callback so that I can display some custom metrics every time a rollout has finished. How can I do that? Ray Version: 0.7.3 Thank you in advance!
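A sketch for the function-based callbacks API that Ray 0.7.x used; the "samples"/"infos" keys follow that era's docs and may differ in later versions:

    from ray import tune

    def on_sample_end(info):
        samples = info["samples"]             # SampleBatch from the finished rollout
        if "infos" in samples.data:
            last_info = samples["infos"][-1]  # info dict of the batch's last step
            print("last step info:", last_info)

    config = {
        # Wrap with tune.function() when launching through tune.run().
        "callbacks": {"on_sample_end": tune.function(on_sample_end)},
    }

For per-episode custom metrics specifically, the on_episode_end callback (which receives the episode object directly) may be the better fit.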
1
vote
1 answer

rllib - obtain TensorFlow or PyTorch model output from checkpoint

I'd like to use the rllib trained policy model in a different code where I need to track which action is generated for specific input states. Using a standard TensorFlow or PyTorch (preferred) network model would provide that flexibility but I can't…
lakehopper
  • 11
  • 2
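A hedged sketch for this question, assuming the older trainer-style API and a PyTorch PPO checkpoint; the checkpoint path and the CartPole env are placeholders:

    import gym
    import ray
    from ray.rllib.agents.ppo import PPOTrainer

    ray.init()
    trainer = PPOTrainer(env="CartPole-v0", config={"framework": "torch"})
    trainer.restore("/path/to/checkpoint/checkpoint-100")  # placeholder

    policy = trainer.get_policy()
    torch_model = policy.model            # the underlying torch.nn.Module

    obs = gym.make("CartPole-v0").reset()
    action = trainer.compute_action(obs)  # compute_single_action() in newer Ray

Once policy.model is in hand, it can be called like any other torch module to trace which action each input state produces.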
1
vote
1 answer

Creating custom MA environment

I am looking for some guidance on building a multi-agent dummy example. I've been trying to work through the RLlib documentation, but I think I haven't understood the approach for creating my own multi-agent environment. I'd like to have several…
bastiano14
  • 11
  • 2
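A minimal random-data sketch of the older (gym-style) MultiAgentEnv API, for orientation only: every method consumes and returns dicts keyed by agent id.

    import gym
    from ray.rllib.env.multi_agent_env import MultiAgentEnv

    class MyMultiAgentEnv(MultiAgentEnv):
        def __init__(self, num_agents=2):
            self.agents = [f"agent_{i}" for i in range(num_agents)]
            self.observation_space = gym.spaces.Box(-1.0, 1.0, (4,))
            self.action_space = gym.spaces.Discrete(2)

        def reset(self):
            return {a: self.observation_space.sample() for a in self.agents}

        def step(self, action_dict):
            obs = {a: self.observation_space.sample() for a in action_dict}
            rewards = {a: 1.0 for a in action_dict}
            dones = {a: False for a in action_dict}
            dones["__all__"] = False   # RLlib's episode-level termination flag
            infos = {a: {} for a in action_dict}
            return obs, rewards, dones, infos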
1
vote
2 answers

Correct way to pass in custom model parameters for an RLlib model?

I have a basic custom model that is essentially just a copy-paste of the default RLLib fully connected model (https://github.com/ray-project/ray/blob/master/rllib/models/tf/fcnet.py) and I'm passing in custom model parameters through a config file…
Yuerno
  • 751
  • 1
  • 8
  • 27
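A hedged sketch of the usual pattern: extra parameters go under "custom_model_config" ("custom_options" in older Ray) and are read back inside the model; the "fcnet_hiddens" key here is a hypothetical example parameter.

    from ray.rllib.models import ModelCatalog
    from ray.rllib.models.tf.tf_modelv2 import TFModelV2

    class MyFCNet(TFModelV2):
        def __init__(self, obs_space, action_space, num_outputs,
                     model_config, name):
            super().__init__(obs_space, action_space, num_outputs,
                             model_config, name)
            # Custom parameters arrive inside the model config dict.
            opts = model_config.get("custom_model_config", {})
            self.hiddens = opts.get("fcnet_hiddens", [256, 256])
            # ... build the layers exactly as the default fcnet does ...

    ModelCatalog.register_custom_model("my_fcnet", MyFCNet)

    config = {
        "model": {
            "custom_model": "my_fcnet",
            "custom_model_config": {"fcnet_hiddens": [64, 64]},
        },
    }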
1
vote
1 answer

How to set up rllib multi-agent PPO?

I have a very simple multi-agent environment set up for use with ray.rllib, and I'm trying to run a simple baseline test of a PPO vs. Random Policy training scenario as follows: register_env("my_env", lambda _: MyEnv(num_agents=2)) mock =…
deepmindz
  • 598
  • 1
  • 6
  • 14
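A hedged sketch of a two-policy PPO-vs-random setup in the older config API; MyEnv stands in for the asker's environment (not shown), and the agent ids 0 and 1 are assumptions:

    from ray import tune
    from ray.rllib.examples.policy.random_policy import RandomPolicy
    from ray.tune.registry import register_env

    register_env("my_env", lambda _: MyEnv(num_agents=2))
    probe = MyEnv(num_agents=2)   # only used to read off the spaces

    config = {
        "env": "my_env",
        "multiagent": {
            "policies": {
                "ppo": (None, probe.observation_space, probe.action_space, {}),
                "rand": (RandomPolicy, probe.observation_space,
                         probe.action_space, {}),
            },
            "policy_mapping_fn": lambda agent_id: (
                "ppo" if agent_id == 0 else "rand"),
            "policies_to_train": ["ppo"],   # leave the random policy untrained
        },
    }
    tune.run("PPO", config=config)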
1
vote
0 answers

RLlib changes the observation shape, adding [None] to the shape tuple

RLlib (version 0.7.3) is given an observation space of Box(10, 3), which I wanted to use with an FCN agent, but the library seems to add another dimension to it. Because of this addition, RLlib tries to use a vision network for the…
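The leading None is just the batch dimension that RLlib (and TensorFlow) prepend; the vision-net choice comes from the Box being rank 2. A hedged workaround sketch: flatten the observation to rank 1 so the model catalog picks the fully connected net instead.

    import gym
    import numpy as np

    class FlattenObs(gym.ObservationWrapper):
        def __init__(self, env):
            super().__init__(env)
            n = int(np.prod(env.observation_space.shape))   # 10 * 3 = 30
            self.observation_space = gym.spaces.Box(
                -np.inf, np.inf, (n,), dtype=np.float32)

        def observation(self, obs):
            return np.asarray(obs, dtype=np.float32).ravel()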
1
vote
1 answer

Is it possible to specify "episodes_this_iter" with the ray Tune search algorithm?

I'm new to programming/Ray and have a simple question about which parameters can be specified when using Ray Tune. In particular, the Ray Tune documentation says that all of the auto-filled fields (steps_this_iter, episodes_this_iter, etc.) can be…
sbrand
  • 11
  • 1
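A hedged sketch of the distinction at play here: the auto-filled fields are per-iteration results, not tunable parameters, but they (and their cumulative counterparts) can serve as stopping criteria in tune.run:

    from ray import tune

    tune.run(
        "PPO",
        config={"env": "CartPole-v0"},
        # Auto-filled result fields can appear in the stop dict.
        stop={"episodes_total": 1000},   # stop after 1000 episodes overall
    )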
1
vote
1 answer

'Observation outside expected value range' error when running example 'traffic_light_grid.py'

I'm trying to learn how to use Flow. When I tried to run 'flow/example/rllib/traffic_light_grid.py', it kept returning 'Observation outside expected value range' errors until it reached the maximum error count. I have no idea why only this…
Ming
  • 11
  • 1
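A generic debugging sketch (not Flow-specific) for this class of error: locate which entries of an observation violate the declared Box bounds before RLlib raises.

    import numpy as np

    def check_obs(space, obs):
        obs = np.asarray(obs)
        bad = (obs < space.low) | (obs > space.high)
        if bad.any():
            print("out-of-range indices:", np.where(bad)[0],
                  "values:", obs[bad])

    # e.g. check_obs(env.observation_space, env.reset())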
1
vote
0 answers

How to configure batches for LSTM with MARWIL in RLlib

I'm trying to train an LSTM policy using MARWIL in RLlib. I could not find any examples of how to set up the batches for this problem. I can train a MARWIL model just fine when it does not have an LSTM component, by following the instructions…
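A hedged sketch of the config side only: turning on RLlib's LSTM wrapper for MARWIL over offline data. The open question is the batch format itself; with use_lstm the stored batches must carry time-sequence information (seq_lens and initial-state columns), which plain per-step batches lack.

    config = {
        "input": "/path/to/offline_batches",   # placeholder
        "model": {
            "use_lstm": True,
            "max_seq_len": 20,   # upper bound on truncated-BPTT sequences
        },
    }
    # tune.run("MARWIL", config=dict(config, env="my_env"))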
0
votes
0 answers

Error while testing custom_keras_model.py of Ray RLlib

I'm encountering a persistent issue while working with the Tune library in Ray RLlib with Python on a Windows system. I'm attempting to run a test script from the library titled custom_keras_model.py. (raylet) [2023-08-22 15:01:41,041 E 10616 15108]…
0
votes
1 answer

Difficulty Implementing DQN for Gym's Taxi-v3 Problem

I've been working on solving the Gym Taxi-v3 problem using reinforcement learning algorithms. Initially, I applied tabular Q-learning, and after 10,000 training iterations the algorithm achieved a mean reward of 8.x with a 100% success rate, which was…
Aaron
  • 11
  • 3
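A hedged baseline sketch: RLlib's DQN on Taxi-v3 with mostly default settings. The Discrete(500) observation is one-hot encoded by RLlib's preprocessor automatically, so no manual feature engineering is needed; the stop value just mirrors the tabular Q-learning score the asker reports.

    import ray
    from ray import tune

    ray.init()
    tune.run(
        "DQN",
        config={"env": "Taxi-v3", "framework": "torch"},
        stop={"episode_reward_mean": 8.0},
    )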
0
votes
0 answers

RLlib EpisodeV2 missing method to query latest info dict returned from step?

The Episode class provided the method last_info_for to pull the info dict returned to the agent at the latest step. EpisodeV2 doesn't have such a method, and calls subsequently fail with errors like "'EpisodeV2' object has no attribute 'last_info_for'". Is…
Victor M
  • 603
  • 4
  • 22
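A hedged workaround that sidesteps the Episode/EpisodeV2 API altogether: cache the latest info dict in an env wrapper that your own code controls.

    import gym

    class LastInfoWrapper(gym.Wrapper):
        def __init__(self, env):
            super().__init__(env)
            self.last_info = None

        def step(self, action):
            obs, rew, done, info = self.env.step(action)
            self.last_info = info    # queryable at any time after a step
            return obs, rew, done, info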
0
votes
0 answers

Setting initial iterations in Ray Tune's implementation of BOHB

I am trying to use Ray Tune's implementation of BOHB to hyperparameter-tune a PPO model. If I set the number of iterations to e.g. 100 it works fine; however, it already samples new hyperparameter values after only one training iteration of a sample.…
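A hedged sketch of the relevant knobs: the BOHB searcher must be paired with the HyperBandForBOHB scheduler, and how long a trial runs before it can be stopped and resampled is governed by the bracket schedule (max_t, reduction_factor), not by the searcher itself; the values below are illustrative.

    from ray import tune
    from ray.tune.schedulers import HyperBandForBOHB
    from ray.tune.suggest.bohb import TuneBOHB  # ray.tune.search.bohb in newer Ray

    scheduler = HyperBandForBOHB(
        time_attr="training_iteration",
        max_t=100,            # upper bound on iterations per trial
        reduction_factor=4,   # larger factor -> longer first rung
    )
    tune.run(
        "PPO",
        config={"env": "CartPole-v0", "lr": tune.loguniform(1e-5, 1e-2)},
        search_alg=TuneBOHB(),
        scheduler=scheduler,
        metric="episode_reward_mean",
        mode="max",
        num_samples=20,
    )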
0
votes
0 answers

High runtimes when executing the sample codes in the 'examples' folder

I am new to Flow and Ray[rllib], and I would like to ask you to share your runtime experience with the example codes provided in Flow's 'examples' folder, along with your system specs. For instance, when I run…
FahimSh87
  • 1
  • 1