I am looking for some guidance on building a multi-agent dummy example. I've been trying to work through the RLlib documentation, but I don't think I've understood the approach for creating my own multi-agent environment.
I'd like several agents to start from different, random initial positions x. The dynamics of each agent should be governed by a differential equation like:

dx/dt = x + action + noise
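For simulation I'm assuming a simple forward-Euler discretization (the step size dt and the noise scale below are my own placeholder choices, not from the docs):

import numpy as np

def euler_step(x, action, dt=0.05, noise_std=0.1):
    # One Euler step of dx/dt = x + action + noise.
    # dt and noise_std are illustrative values I picked for this sketch.
    return x + dt * (x + action + np.random.normal(0.0, noise_std))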
The goal is for the agents to learn actions that will ultimately cause the x values of all agents to converge to one value.
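My current idea for encouraging this (my own guess, not something from the docs) is a shared reward that penalizes the spread of the agents' x values, e.g. the negative variance:

import numpy as np

def shared_reward(positions):
    # positions: dict mapping agent id -> current x value
    # 0 when all agents agree; increasingly negative as they spread out
    xs = np.array(list(positions.values()))
    return -float(np.var(xs))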
Can I use the code stub provided in multi_agent_env.py to implement my own MA environment?
For instance, I'd create my own file MADummyEnv.py with:
import numpy as np
import ray
from ray.rllib.env.multi_agent_env import MultiAgentEnv

class MADummyEnv(MultiAgentEnv):
    ...  # __init__, reset, and step implemented here

ray.init()
env = MADummyEnv()
obs = env.reset()
print(obs)
# action_list would be defined elsewhere
new_obs, rewards, dones, infos = env.step({"agent1": np.random.choice(action_list)})
and then implement the __init__, step, and reset methods inside the MADummyEnv class. Is this correct? Roughly what I have in mind for those methods is the minimal sketch below (the spaces, config keys, and the variance reward are my own assumptions, and it follows the older four-tuple step API used in the snippet above):
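import numpy as np
from gym.spaces import Box
from ray.rllib.env.multi_agent_env import MultiAgentEnv

class MADummyEnv(MultiAgentEnv):
    def __init__(self, config=None):
        config = config or {}
        self.num_agents = config.get("num_agents", 3)
        self.agent_ids = [f"agent{i}" for i in range(self.num_agents)]
        self.dt = config.get("dt", 0.05)
        self.horizon = config.get("horizon", 200)
        # 1-D observation (own x) and 1-D bounded action per agent -- my choice
        self.observation_space = Box(-np.inf, np.inf, shape=(1,), dtype=np.float32)
        self.action_space = Box(-1.0, 1.0, shape=(1,), dtype=np.float32)

    def reset(self):
        self.t = 0
        # different random initial position per agent
        self.x = {aid: np.random.uniform(-1.0, 1.0) for aid in self.agent_ids}
        return {aid: np.array([self.x[aid]], dtype=np.float32) for aid in self.agent_ids}

    def step(self, action_dict):
        self.t += 1
        for aid, action in action_dict.items():
            # Euler step of dx/dt = x + action + noise
            self.x[aid] += self.dt * (self.x[aid] + float(action[0]) + np.random.normal(0.0, 0.1))
        # shared reward: negative variance, maximal (0) when all x values agree
        reward = -float(np.var(list(self.x.values())))
        obs = {aid: np.array([self.x[aid]], dtype=np.float32) for aid in self.agent_ids}
        rewards = {aid: reward for aid in self.agent_ids}
        done = self.t >= self.horizon
        dones = {aid: done for aid in self.agent_ids}
        dones["__all__"] = done
        infos = {aid: {} for aid in self.agent_ids}
        return obs, rewards, dones, infos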
Perhaps someone can point me in the right direction if I have the right idea, or ideally provide a reference implementation of a custom multi-agent environment.