Instead of writing a PyEnvironment and batching it (using BatchedPyEnvironment), I would like to write a PyEnvironment that is directly in a batched format.
This means my observation should have shape (batch_size, ...), and my actions, discounts and rewards should each have length batch_size. This would speed up parallelization for my use case, because I can vectorize the state update across the whole batch.
Is this possible in tf_agents, and can agents work with such an environment (after converting it into a TFEnvironment)? If so, are there any examples of how this can be achieved?
I tried defining the action_spec and observation_spec accordingly, but I don't know how to adjust the discount and reward shapes, or whether batching the individual elements within a time step makes sense at all. My specs currently look like this:
    self._action_spec = array_spec.BoundedArraySpec(
        shape=(self._batch_size,), dtype=np.int32,
        minimum=0, maximum=1, name='action')
    self._observation_spec = array_spec.BoundedArraySpec(
        shape=(self._batch_size, 1), dtype=np.int32,
        minimum=0, name='observation')
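To make the shapes I am after concrete, here is a minimal NumPy-only sketch of the kind of vectorized step I have in mind. It deliberately does not use the tf_agents API: the class name `BatchedToyState`, the `max_steps` parameter and the reward rule are all made up for illustration. It only shows one observation of shape (batch_size, 1) and per-element rewards, discounts and done flags of length batch_size.

```python
import numpy as np


class BatchedToyState:
    """Hypothetical toy dynamics; NOT a tf_agents PyEnvironment."""

    def __init__(self, batch_size, max_steps=3):
        self.batch_size = batch_size
        self.max_steps = max_steps  # assumed episode length, for illustration
        self.reset()

    def reset(self):
        # One state row per batch element: shape (batch_size, 1).
        self._state = np.zeros((self.batch_size, 1), dtype=np.int32)
        # Per-element step counter: shape (batch_size,).
        self._steps = np.zeros(self.batch_size, dtype=np.int32)
        return self._state.copy()

    def step(self, action):
        action = np.asarray(action, dtype=np.int32)
        if action.shape != (self.batch_size,):
            raise ValueError(
                f"expected actions of shape ({self.batch_size},), "
                f"got {action.shape}")
        # Vectorized update: all batch elements advance in one operation.
        self._state[:, 0] += action
        self._steps += 1
        done = self._steps >= self.max_steps
        # Made-up reward rule: reward equals the chosen action (0 or 1).
        reward = action.astype(np.float32)
        # Discount is 0 for elements whose episode just ended, 1 otherwise.
        discount = np.where(done, 0.0, 1.0).astype(np.float32)
        return self._state.copy(), reward, discount, done


env = BatchedToyState(batch_size=4)
obs = env.reset()
obs, reward, discount, done = env.step(np.array([0, 1, 1, 0]))
print(obs.shape, reward.shape, discount.shape, done.shape)
```

What I am unsure about is how arrays like these should be packed into the time steps that tf_agents expects, given that my specs above already include the batch dimension.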
Thank you!