This is my first custom RL project with Stable-Baselines3, so feel free to ask for more information. It's a Snake game.
I want to create 4 environments that train the model simultaneously and see all 4 outputs, ideally in a 2x2 grid in one pygame window.
This is the code of the game:
    class SnakeGame():
        def __init__(self) -> None:
            pygame.init()
            self.width = 720
            self.height = 480
            self.window = pygame.display.set_mode((self.width, self.height))
            # ... init additional game stuff

        def main(self, direction = None, render=True):
            # ... do game stuff
            if render: # renders if a real human plays the game
                self.render()
            return reward, False

        def render(self, mode='human'):
            if mode == 'human':
                self.window.fill(self.BLACK)
                # ... render objects
                pygame.display.update()
This is the code of my custom env:
    class SnakeENV(Env):
        # class attribute, so that it is actually visible as env.metadata
        metadata = {'render.modes': ['human',]}

        def __init__(self) -> None:
            super().__init__()
            self.game = SnakeGame()
            self.action_space = Discrete(4)
            self.observation_space = ...

        def step(self, action):
            direction = self.action_to_direction(action)
            reward, done = self.game.main(direction, render=False)
            info = {}
            return self.state, reward, done, info

        def render(self, mode='human'):
            self.game.render(mode=mode)
I create and train the model like this:
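    from stable_baselines3 import PPO
    from stable_baselines3.common.env_util import make_vec_env

    # by default, make_vec_env wraps the 4 SnakeENV instances in a DummyVecEnv
    env = make_vec_env(SnakeENV, n_envs=4)
    model = PPO("MultiInputPolicy", env, verbose=1, tensorboard_log=log_path)
    model.learn(total_timesteps=100000)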
It seems like all 4 environments get rendered in the same window, one after another:
- env_no 1; frame 1
- env_no 2; frame 1
- env_no 3; frame 1
- env_no 4; frame 1
- env_no 1; frame 2
- env_no 2; frame 2
- ...
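A toy model of that ordering, as a minimal sketch. It assumes the usual situation where a process has a single pygame display surface and the default DummyVecEnv steps its sub-environments one after another in a loop. `SharedWindow` and `ToyGame` are hypothetical stand-ins, not part of pygame or Stable-Baselines3, and no real rendering happens here.

```python
class SharedWindow:
    """Hypothetical stand-in for the one display surface of a process."""

    def __init__(self):
        self.frames = []  # everything that was drawn, in order

    def show(self, env_no, frame):
        self.frames.append((env_no, frame))


class ToyGame:
    """Hypothetical stand-in for SnakeGame: every instance draws to the same window."""

    def __init__(self, env_no, window):
        self.env_no = env_no
        self.window = window
        self.frame = 0

    def step_and_render(self):
        self.frame += 1
        self.window.show(self.env_no, self.frame)


window = SharedWindow()
games = [ToyGame(env_no, window) for env_no in range(1, 5)]

# A sequential vectorized env steps each sub-environment in turn.
for _ in range(2):
    for game in games:
        game.step_and_render()

print(window.frames[:6])
# [(1, 1), (2, 1), (3, 1), (4, 1), (1, 2), (2, 2)]
```

If this model matches what happens in the real code, the four games are not competing for four windows. They are overwriting each other in the only window there is.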
I tried creating multiple SnakeGame instances inside one SnakeENV and assigning each one to a cell of a 2x2 grid in a new window. This could be the right path, but I failed to implement it, because my SnakeENV render and step methods then had to deal with multiple environments at once. I really want to know if there is an easier solution that I have been missing: one that creates 4 SnakeENV instances rather than 4 SnakeGame instances inside one SnakeENV.
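To make the 2x2 grid idea concrete, here is a minimal sketch of just the layout arithmetic, which is independent of how the environments are created. The helper names `grid_origins` and `window_size` are made up for illustration. The assumption is that each game would draw onto its own off-screen `pygame.Surface` of 720x480 instead of calling `pygame.display.set_mode` itself, and that one owner of the real window would `blit` each surface at the computed offset.

```python
from typing import List, Tuple


def grid_origins(n_envs: int, cols: int, tile_w: int, tile_h: int) -> List[Tuple[int, int]]:
    """Top-left pixel of each tile, filling the grid row by row."""
    return [((i % cols) * tile_w, (i // cols) * tile_h) for i in range(n_envs)]


def window_size(n_envs: int, cols: int, tile_w: int, tile_h: int) -> Tuple[int, int]:
    """Size of the one shared window that holds all tiles."""
    rows = -(-n_envs // cols)  # ceiling division
    return cols * tile_w, rows * tile_h


# 4 games of 720x480 each, arranged in 2 columns
origins = grid_origins(4, 2, 720, 480)
print(origins)                       # [(0, 0), (720, 0), (0, 480), (720, 480)]
print(window_size(4, 2, 720, 480))   # (1440, 960)

# With pygame, the owner of the window would then do something like
#     window.blit(surface_of_env_i, origins[i])
# for each environment, followed by a single pygame.display.update().
```

Whether the four surfaces come from 4 SnakeENV instances or from 4 SnakeGame instances inside one SnakeENV, the offsets stay the same. The open question is only who owns the window and collects the surfaces.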