I'm building an AI player for a simple game, using the stable-baselines3 and gym libraries. I'm struggling to define an observation space that represents the current game state. As described here:
I'm using a spaces Dict to model the game objects. I have a 'card' with 4 properties (level, cost, suit, and points) and a 'bank' representing the quantities of 6 different gems.
The problem now is how to model COLLECTIONS of objects. Taking the 'affordable' cards as an example, I can represent the 4 properties of a single card as key-value pairs, but there are between 0 and 12 individual card objects that need to be represented so that the agent can choose among them. Even if I were willing to have an individual key-value pair for every property of every card, the number of cards is not known in advance. At the same time, stable-baselines3 doesn't support nested Dict/Tuple spaces. I'm at a bit of a loss here.
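To make the problem concrete, here is a minimal sketch of one workaround I can imagine: pad every collection to the maximum of 12 slots and add a mask that marks which slots hold a real card. The names encode_cards, MAX_CARDS, NUM_GEMS and the "mask" key are hypothetical, and the cost length of 5 is an assumption taken from shape=(5,) in my Box spaces. It uses plain Python lists only, so it says nothing about whether the agent would actually learn well from this layout.

```python
from typing import Dict, List

MAX_CARDS = 12  # upper bound on cards in one collection (from the description above)
NUM_GEMS = 5    # assumed length of a card's cost vector, matching shape=(5,)


def encode_cards(cards: List[Dict]) -> Dict[str, list]:
    """Pad a variable-length list of cards to MAX_CARDS fixed slots.

    Empty slots are filled with zeros; "mask" is 1 for real cards and 0 for padding.
    """
    if len(cards) > MAX_CARDS:
        raise ValueError(f"expected at most {MAX_CARDS} cards, got {len(cards)}")

    padding = MAX_CARDS - len(cards)
    return {
        "level": [c["level"] for c in cards] + [0] * padding,
        "cost": [list(c["cost"]) for c in cards]
        + [[0] * NUM_GEMS for _ in range(padding)],
        "suit": [c["suit"] for c in cards] + [0] * padding,
        "points": [c["points"] for c in cards] + [0] * padding,
        # Needed because 0 is also a legal value for "points".
        "mask": [1] * len(cards) + [0] * padding,
    }


# Example: two affordable cards, ten padded slots.
affordable = encode_cards([
    {"level": 1, "cost": [1, 0, 2, 0, 0], "suit": 3, "points": 0},
    {"level": 2, "cost": [0, 3, 0, 2, 1], "suit": 5, "points": 2},
])
```

If this direction is reasonable, each key would presumably map to a flat fixed-shape space (for example a Box of shape (12,) or (12, 5), and a MultiBinary(12) for the mask), which would avoid nesting. I'm not sure whether that is the recommended approach, though.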
Any suggestions?
FYI, here is the observation space I have so far:
    # Inside the environment's __init__ (assumes `import gym` and `import numpy as np`).
    # The second positional argument of Discrete is the seed, not the offset,
    # so the intended starting value is passed explicitly as start=.
    self.observation_space = gym.spaces.Dict(
        spaces={
            AFFORDABLE_LEVEL: gym.spaces.Discrete(3, start=1),
            AFFORDABLE_COST: gym.spaces.Box(low=0, high=7, shape=(5,), dtype=np.int32),
            AFFORDABLE_SUIT: gym.spaces.Discrete(6, start=1),
            AFFORDABLE_POINTS: gym.spaces.Discrete(5, start=0),
            AVAILABLE_LEVEL: gym.spaces.Discrete(3, start=1),
            AVAILABLE_COST: gym.spaces.Box(low=0, high=7, shape=(5,), dtype=np.int32),
            AVAILABLE_SUIT: gym.spaces.Discrete(6, start=1),
            AVAILABLE_POINTS: gym.spaces.Discrete(5, start=0),
            OWNED_LEVEL: gym.spaces.Discrete(3, start=1),
            OWNED_COST: gym.spaces.Box(low=0, high=7, shape=(5,), dtype=np.int32),
            OWNED_SUIT: gym.spaces.Discrete(6, start=1),
            OWNED_POINTS: gym.spaces.Discrete(5, start=0),
            RESERVED_LEVEL: gym.spaces.Discrete(3, start=1),
            RESERVED_COST: gym.spaces.Box(low=0, high=7, shape=(5,), dtype=np.int32),
            RESERVED_SUIT: gym.spaces.Discrete(6, start=1),
            RESERVED_POINTS: gym.spaces.Discrete(5, start=0),
            PLAYER_GEMS: gym.spaces.Box(low=0, high=7, shape=(5,), dtype=np.int32),
            BANK_GEMS: gym.spaces.Box(low=0, high=7, shape=(5,), dtype=np.int32),
        }
    )