
I am new to RLlib and am trying to write a small program that takes a configuration file and trains an agent. The configuration file is a fine-tuned example for the CartPole-v1 environment, and I saved it as cartpole-ppo.yaml.

I am aware of the RLlib CLI, but I want to write a Python script using the Python API that takes the configuration file as input and trains an agent. I tried multiple approaches, but I couldn't get any of them to work.

The configuration file is the fine-tuned example:

cartpole-ppo:
    env: CartPole-v1
    run: PPO
    stop:
        episode_reward_mean: 150
        timesteps_total: 100000
    config:
        # Works for both torch and tf.
        framework: torch
        gamma: 0.99
        lr: 0.0003
        num_workers: 1
        observation_filter: MeanStdFilter
        num_sgd_iter: 6
        vf_loss_coeff: 0.01
        model:
            fcnet_hiddens: [32]
            fcnet_activation: linear
            vf_share_layers: true
        enable_connectors: true

I saved it as cartpole-ppo.yaml and am now trying to write a main.py that takes this configuration file and runs as expected.

import ray
import yaml
import gymnasium as gym
.......................

def train(config_file):
    with open(config_file, "r") as f:
        config = yaml.safe_load(f)
.................

    return analysis

if __name__ == "__main__":
    ray.init()
    config_file = 'cartpole-ppo.yaml'
    train(config_file)

I want to fill in the gaps. I have tried many approaches, but to no avail. Please suggest a way to achieve this.


1 Answer


Example .yaml file:

########################
# Ray Algorithm config
########################
resources:
  num_gpus: 1
  num_gpus_per_worker: 0
  num_cpus_per_worker: 1
  num_learner_workers: 1
  num_cpus_per_learner_worker: 1
  num_gpus_per_learner_worker: 0
  _fake_gpus: False
environment:
  env: 'CartPole-v1'
rollouts:
  num_envs_per_worker: 1
  num_rollout_workers: 16
evaluation:
  evaluation_interval: 1                  # After how many training iterations the evaluation is done
  evaluation_duration_unit: "episodes"    # How the length of an evaluation is measured
  evaluation_duration: 5                  # Duration for which to run the evaluation
  evaluation_num_workers: 0               # If 0, evaluation is done on the local worker
  evaluation_config:
    explore: False
exploration:
  explore: True
fault_tolerance:
  restart_failed_sub_environments: true
rl_module:
  _enable_rl_module_api: False


training:
  _enable_learner_api: False
  # Training params
  train_batch_size: &train_bs 32
  sgd_minibatch_size: 32
  lambda_: 0.1
  gamma: 0.95
  lr: 0.0003
  vf_clip_param: 10.0
  model:
    fcnet_activation: "relu"

In your Python script:

import yaml
from ray.rllib.algorithms.ppo import PPO, PPOConfig

with open(..., "r") as f:        # path to the .yaml file above
    train_config = yaml.safe_load(f)

algo_config = (
    PPOConfig()
    .resources(**train_config["resources"])
    .environment(**train_config["environment"])
    .rollouts(**train_config["rollouts"])
    .training(**train_config["training"])
    .exploration(**train_config["exploration"])
    .evaluation(**train_config["evaluation"])
    .fault_tolerance(**train_config["fault_tolerance"])
    .rl_module(**train_config["rl_module"])
    .framework(framework="torch")
    .debugging(seed=42)
)
# Further sections (e.g. reporting, multi_agent) can be chained the same
# way, as long as the matching key exists in the .yaml file.
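Because the top-level keys in the .yaml above are chosen to match the AlgorithmConfig setter names, you could also apply all sections generically instead of spelling each call out. This is a minimal sketch that relies on that naming convention (an assumption of the example file, not an RLlib feature):

algo_config = PPOConfig().framework(framework="torch").debugging(seed=42)
# Call the AlgorithmConfig method named after each .yaml section,
# e.g. "rollouts" -> algo_config.rollouts(**params).
for section, params in train_config.items():
    getattr(algo_config, section)(**params)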

Then:

from tqdm import tqdm

num_train_iters = 100                        # e.g. 100 training iterations
trainer = PPO(config=algo_config)            # or: trainer = algo_config.build()
for step_idx in tqdm(range(num_train_iters)):
    result = trainer.train()
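Alternatively, the question's original cartpole-ppo.yaml already follows the layout of RLlib's tuned-example files (experiment name, env, run, stop, config), so it can be handed to Ray Tune more or less as-is. The following is a minimal sketch that mirrors the question's stub; it assumes a Ray 2.x version in which tune.run and the old-stack config keys (num_workers, enable_connectors) are still supported:

import ray
import yaml
from ray import tune

def train(config_file):
    with open(config_file, "r") as f:
        experiments = yaml.safe_load(f)
    # The file holds a single experiment keyed by its name.
    exp = experiments["cartpole-ppo"]
    config = exp["config"]
    config["env"] = exp["env"]          # Tune expects the env inside the config dict
    analysis = tune.run(
        exp["run"],                     # "PPO"
        config=config,
        stop=exp["stop"],               # episode_reward_mean / timesteps_total
    )
    return analysis

if __name__ == "__main__":
    ray.init()
    train("cartpole-ppo.yaml")

tune.run returns an ExperimentAnalysis object, which matches the analysis variable in the question's skeleton and stops training once either stop criterion from the .yaml is reached.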