The Stable Baselines3 PPO implementation with the MLP policy uses two hidden layers of 64 units each. When setting up my Gym environment, I defined the action space with the range [-50, 50]. However, the PPO MLP policy implementation does not appear to bound the model output to that range.

How does one scale the model output to the range of the action space, particularly in Stable Baselines3?
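To make the question concrete, here is a minimal sketch of the kind of rescaling I have in mind. It assumes the raw policy output is meant to lie in [-1, 1] and maps it linearly onto [-50, 50], clipping anything outside that interval first. The function name `rescale_action` and its parameters are hypothetical and are not part of Stable Baselines3.

```python
from typing import List, Sequence


def rescale_action(
    raw: Sequence[float], low: float = -50.0, high: float = 50.0
) -> List[float]:
    """Map raw policy outputs, assumed to target [-1, 1], onto [low, high].

    Hypothetical helper for illustration; not a Stable Baselines3 function.
    """
    scaled = []
    for value in raw:
        # The raw output is unbounded, so clip it to [-1, 1] first.
        clipped = max(-1.0, min(1.0, value))
        # Linear map: -1 -> low, +1 -> high.
        scaled.append(low + 0.5 * (clipped + 1.0) * (high - low))
    return scaled


print(rescale_action([-1.0, 0.0, 0.5, 3.0]))
```

Presumably something like this would live inside the environment's `step` method or in an action wrapper, with the action space itself declared as [-1, 1], but I am not sure whether that is the intended approach.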

Manav Mishra
