
I'm trying to solve the game of Yahtzee once and for all using reinforcement learning. Sadly, when I check my gym environment's conformity with Stable Baselines, the checker criticizes the shape of my observation space. So I've put print statements in the constructor that tell me the shape of my observation space as soon as I create an object.


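import gym
import numpy as np


class YatzeeEnv(gym.Env):
    # 19 integer slots describing the current game state
    game_state = np.zeros(19, np.int32)

    def __init__(self):
        self.action_space = gym.spaces.Discrete(19)
        self.observation_space = gym.spaces.MultiDiscrete(19)

        # game_state_adresses, reroll() and reroll_state are defined
        # elsewhere in the class (omitted here)
        for x in self.game_state_adresses:
            self.game_state[x] = -1
        self.reroll()
        self.game_state[self.reroll_state] = 0
        print("np array shape: " + str(self.game_state.shape))
        print("Observation space shape: " + str(self.observation_space.shape))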

a = YatzeeEnv()

Sadly, the output of this is:

np array shape: (19,)
Observation space shape: ()

Why is this? I thought `gym.spaces.MultiDiscrete(19)` defines the observation space as an int array with 19 values.

desertnaut
noMatt

1 Answer


From the `MultiDiscrete` docstring:

    This represents the cartesian product of arbitrary :class:`Discrete` spaces.
    It is useful to represent game controllers or keyboards where each key can be represented as a discrete action space.
    Note:
        Some environment wrappers assume a value of 0 always represents the NOOP action.
    e.g. Nintendo Game Controller - Can be conceptualized as 3 discrete action spaces:
    1. Arrow Keys: Discrete 5  - NOOP[0], UP[1], RIGHT[2], DOWN[3], LEFT[4]  - params: min: 0, max: 4
    2. Button A:   Discrete 2  - NOOP[0], Pressed[1] - params: min: 0, max: 1
    3. Button B:   Discrete 2  - NOOP[0], Pressed[1] - params: min: 0, max: 1
    It can be initialized as ``MultiDiscrete([ 5, 2, 2 ])`` such that a sample might be ``array([3, 1, 0])``.
    Although this feature is rarely used, :class:`MultiDiscrete` spaces may also have several axes
    if ``nvec`` has several axes:
    Example::
        >> d = MultiDiscrete(np.array([[1, 2], [3, 4]]))
        >> d.sample()
        array([[0, 0],
               [2, 3]])
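The controller example from the docstring can be checked directly. A minimal sketch, assuming a gym version where `gym.spaces.MultiDiscrete` takes `nvec` as its first argument (true for the gym releases current at the time of this question); it shows that the space has one entry per element of `nvec`:

```python
import gym
import numpy as np

# The docstring's controller: arrow keys (5 options),
# button A (2 options), button B (2 options)
controller = gym.spaces.MultiDiscrete([5, 2, 2])

print(controller.shape)  # (3,): one entry per sub-space
print(controller.nvec)   # [5 2 2]

sample = controller.sample()
# Each entry i of a sample lies in the range 0 .. nvec[i] - 1
assert sample.shape == (3,)
assert np.all(sample >= 0) and np.all(sample < controller.nvec)
```

So the length of `nvec` decides how many values an observation has, and each entry of `nvec` decides how many values that slot can take.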

If you only have a single discrete space, you don't have to use `MultiDiscrete`. Otherwise, pass `nvec` as a list, e.g. `MultiDiscrete([19])`.
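Both printed shapes follow from how `nvec` is converted. As far as I know, gym's `MultiDiscrete` stores `nvec` as a NumPy array and uses that array's shape as the space's shape, so the behaviour can be reproduced with NumPy alone. A minimal sketch under that assumption (`space_shape` is a hypothetical helper, not a gym function):

```python
import numpy as np


def space_shape(nvec):
    """Hypothetical helper: mimics how MultiDiscrete derives its shape from nvec."""
    return np.array(nvec, dtype=np.int64).shape


print(space_shape(19))         # (): a 0-d array, i.e. a single scalar variable
print(space_shape([19]))       # (1,): one variable with 19 possible values
print(space_shape([19] * 19))  # (19,): 19 variables, each with 19 possible values
```

In other words, the integer 19 does not mean "19 values"; an observation with 19 entries needs an `nvec` with 19 entries.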

Bhupen
  • Thank you very much for your answer :) Sadly, applying this:

        def __init__(self):
            self.action_space = gym.spaces.Discrete(19)
            self.observation_space = gym.spaces.MultiDiscrete([19])
            for x in self.game_state_adresses:
                self.game_state[x] = -1
            self.reroll()
            self.game_state[self.reroll_state] = 0
            print("np array shape: " + str(self.game_state.shape))
            print("Observation space shape: " + str(self.observation_space.shape))

    leads to the following output:

        np array shape: (19,)
        Observation space shape: (1,)

    – noMatt Oct 08 '22 at 17:07
  • My imports:

        import gym
        import numpy as np
        import pandas as pd
        from stable_baselines3.common.env_checker import check_env

    – noMatt Oct 08 '22 at 18:08
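The `(1,)` in the comment is what a one-element `nvec` gives. To match the `(19,)` game state, `nvec` needs one entry per slot. A sketch under stated assumptions: `N_VALUES = 51` is a hypothetical per-slot bound, not taken from the question, and the real bound should come from the game rules. In gym's implementation, `contains` also requires every value to lie in `0 .. nvec[i] - 1`, so the `-1` markers the constructor writes into `game_state` fall outside the space. Shifting by +1 is shown as one possible workaround, not the only one:

```python
import gym
import numpy as np

# Hypothetical bound: every slot takes a value in 0 .. N_VALUES - 1
# (after the +1 shift below, so it must cover the shifted range)
N_VALUES = 51

# One nvec entry per slot of the 19-element game state
observation_space = gym.spaces.MultiDiscrete([N_VALUES] * 19)
print(observation_space.shape)  # (19,)

game_state = np.zeros(19, dtype=np.int64)
game_state[:5] = -1  # e.g. "not yet filled" markers, as in the question

# -1 lies outside 0 .. N_VALUES - 1, so the raw state is rejected
print(observation_space.contains(game_state))      # False
# Shifting every value by +1 (so -1 becomes 0) keeps it inside the space
print(observation_space.contains(game_state + 1))  # True
```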