Instead of writing a PyEnvironment and batching it (using BatchedPyEnvironment), I would like to write a PyEnvironment that is directly in a batched format.

This means my observation should be of shape (batch_size, ...), and my actions, discounts and rewards should each be of length batch_size. This would speed up parallelization for my use case, as I can vectorize the state update across the whole batch.
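To make the shapes concrete, here is a minimal NumPy-only sketch of the kind of vectorized step I have in mind. It deliberately does not use any tf_agents API; the class name `BatchedToyDynamics`, the `max_state` parameter, and the toy dynamics and reward are all made up for illustration.

```python
import numpy as np


class BatchedToyDynamics:
    """Hypothetical stand-in for my state update (not a tf_agents class)."""

    def __init__(self, batch_size, max_state=10):
        self._batch_size = batch_size
        self._max_state = max_state  # assumed terminal value, for illustration only
        self._state = np.zeros((batch_size, 1), dtype=np.int32)

    def reset(self):
        # Observation for the whole batch: shape (batch_size, 1).
        self._state = np.zeros((self._batch_size, 1), dtype=np.int32)
        return self._state.copy()

    def step(self, action):
        # One action per batch member: shape (batch_size,), values in {0, 1}.
        action = np.asarray(action, dtype=np.int32)
        assert action.shape == (self._batch_size,)

        # All batch members advance in a single vectorized operation.
        new_state = np.minimum(self._state + action[:, None], self._max_state)
        self._state = new_state.astype(np.int32)

        # Per-member reward and discount, each of length batch_size.
        done = self._state[:, 0] >= self._max_state
        reward = action.astype(np.float32)
        discount = np.where(done, 0.0, 1.0).astype(np.float32)
        return self._state.copy(), reward, discount


dyn = BatchedToyDynamics(batch_size=4)
obs = dyn.reset()
obs, reward, discount = dyn.step(np.array([0, 1, 1, 0]))
print(obs.shape, reward.shape, discount.shape)
```

The open question is how to expose arrays like `obs`, `reward` and `discount` through a PyEnvironment so that each time step already carries the batch dimension.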

Is this possible in tf_agents, and can agents work with such an environment (after converting it into a TFEnvironment)? If so, are there any examples of how this can be achieved?

I tried defining the action_spec and observation_spec accordingly, but I don't know how to adjust the discount and reward shapes, or whether batching the individual elements within a time step makes sense at all.

    self._action_spec = array_spec.BoundedArraySpec(
        shape=(self._batch_size,), dtype=np.int32, minimum=0, maximum=1, name='action')

    self._observation_spec = array_spec.BoundedArraySpec(
        shape=(self._batch_size, 1), dtype=np.int32, minimum=0, name='observation')

Thank you!

Henrik