Correct use of a2c.A2CTrainer in RLlib with gym and PettingZoo

Question

I'm building a speaker-listener training environment with RLlib, following this article and using PettingZoo and SuperSuit.

When I run my code, I get the following error:

NotImplementedError: Cannot convert a symbolic Tensor (default_policy/cond/strided_slice:0) to a numpy array

As I lack experience with these packages, I can't tell whether the problem is in my own code or in how I'm using the packages, which are supposedly enough to work with RLlib. The full code is attached at the end; here's the problematic line:

agent = a2c.A2CTrainer(env="simple_speaker_listener", config=config)

I believe I'm close to making it work. Here's the full code:

import numpy as np
import supersuit
from copy import deepcopy
from ray.rllib.env import PettingZooEnv
import ray.rllib.agents.a3c.a2c as a2c
import ray
from ray.tune.registry import register_env
from ray.rllib.env import BaseEnv
from pettingzoo.mpe import simple_speaker_listener_v3

alg_name = "PPO"
config = deepcopy(a2c.A2C_DEFAULT_CONFIG)
config["env_config"] = None
config["rollout_fragment_length"] = 20
config["num_workers"] = 5
config["num_envs_per_worker"] = 1
config["lr_schedule"] = [[0, 0.007], [20000000, 0.0000000001]]
config["clip_rewards"] = True
s = "{:3d} reward {:6.2f}/{:6.2f}/{:6.2f} len {:6.2f}"
multiagent_dict = dict()
multiagent_policies = dict()
env = simple_speaker_listener_v3.env()
agents_name = deepcopy(env.possible_agents)
config = {
          "num_gpus": 0,
          "num_workers": 1,
          }
env = simple_speaker_listener_v3.env()
mod_env = supersuit.aec_wrappers.pad_action_space(env)
mod_env = supersuit.aec_wrappers.pad_observations(mod_env)
mod_env = PettingZooEnv(mod_env)
register_env("simple_speaker_listener", lambda stam: mod_env)

ray.init(num_gpus=0, local_mode=True)
agent = a2c.A2CTrainer(env="simple_speaker_listener", config=config)

for it in range(5):
    result = agent.train()
    print(s.format(
        it + 1,
        result["episode_reward_min"],
        result["episode_reward_mean"],
        result["episode_reward_max"],
        result["episode_len_mean"]
    ))
    mod_env.reset()
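
One thing that is visible in the listing above, separately from the error itself: `config` is first copied from `a2c.A2C_DEFAULT_CONFIG` and customized, and is then rebound to a brand-new two-key dict, so the earlier customizations are not in the dict that gets passed to the trainer. Below is a minimal sketch of the difference between rebinding and merging. `DEFAULT_CONFIG` and both helper functions are hypothetical stand-ins for illustration, not the real RLlib defaults, and this is not claimed to be the cause of the `NotImplementedError`.

```python
from copy import deepcopy

# Hypothetical stand-in for a2c.A2C_DEFAULT_CONFIG; values are illustrative only.
DEFAULT_CONFIG = {
    "num_workers": 2,
    "num_gpus": 0,
    "rollout_fragment_length": 10,
    "clip_rewards": None,
}


def build_config_overwritten():
    """Mirrors the pattern in the question: customize, then rebind the name."""
    config = deepcopy(DEFAULT_CONFIG)
    config["rollout_fragment_length"] = 20
    config["clip_rewards"] = True
    # Rebinding `config` discards every key set above.
    config = {"num_gpus": 0, "num_workers": 1}
    return config


def build_config_merged():
    """Same settings, but merged into the customized copy instead."""
    config = deepcopy(DEFAULT_CONFIG)
    config["rollout_fragment_length"] = 20
    config["clip_rewards"] = True
    # update() changes only the listed keys and keeps the rest.
    config.update({"num_gpus": 0, "num_workers": 1})
    return config


overwritten = build_config_overwritten()
merged = build_config_merged()

print(sorted(overwritten))  # only the two keys from the second dict remain
print(merged["rollout_fragment_length"], merged["clip_rewards"], merged["num_workers"])
```

In the question's code the equivalent change would be to call `config.update(...)` (or assign the two keys individually) rather than assigning a new dict to `config`.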

Did you get it to work? I am also trying to get it to work, but am unable to. @user13399343 - Gledi, May 18 '21 at 16:43
Here's what worked for me in the end: github.com/guyna25/speakerListnerRLlibEnv/blob/master/testing.py I also recommend reading the documentation in the PettingZoo code itself, as it's more up to date than other sources. - user13399343, May 25 '21 at 10:00

0 Answers