In the TensorFlow documentation for TF-Agents environments, there is an example environment for a simple (blackjack-inspired) card game.
The init looks like the following:
class CardGameEnv(py_environment.PyEnvironment):

    def __init__(self):
        self._action_spec = array_spec.BoundedArraySpec(
            shape=(), dtype=np.int32, minimum=0, maximum=1, name='action')
        self._observation_spec = array_spec.BoundedArraySpec(
            shape=(1,), dtype=np.int32, minimum=0, name='observation')
        self._state = 0
        self._episode_ended = False
The action spec allows only 0 (do not ask for a card) or 1 (ask for a card), so it makes sense that its shape is shape=(): a single integer is all that is needed.
However, I don't quite understand why the observation spec has shape=(1,), given that it only represents the sum of the cards in the current round (so also a single integer).
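For concreteness, this is how I understand the two shapes to differ in plain NumPy. It is only a minimal sketch: it uses NumPy alone rather than TF-Agents, and the variable names and the value 17 are my own illustration, not taken from the documentation.

```python
import numpy as np

# shape=() is a 0-dimensional array: a bare scalar, such as an action of 0 or 1.
action = np.array(1, dtype=np.int32)

# shape=(1,) is a 1-dimensional array holding exactly one element,
# e.g. the running sum of the cards wrapped in a length-1 vector.
observation = np.array([17], dtype=np.int32)  # 17 is an arbitrary example sum

print(action.shape, action.ndim)            # () 0
print(observation.shape, observation.ndim)  # (1,) 1
```

So both hold a single integer, but one is a scalar and the other is a vector of length one.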
What explains the difference in shapes?