
In the SB3 PPO algorithm, what does n_steps refer to? Is it the number of steps to run the environment? If so, what happens if the environment terminates before reaching n_steps?

And how does it relate to batch_size?

I am running 12 environments using SubprocVecEnv, but I am not sure how n_steps and batch_size affect the model during training. A minimal sketch of my setup is below.
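
Here is roughly what I am doing (the environment id and the hyperparameter values are placeholders, not my real ones):

    from stable_baselines3 import PPO
    from stable_baselines3.common.env_util import make_vec_env
    from stable_baselines3.common.vec_env import SubprocVecEnv

    if __name__ == "__main__":
        # 12 parallel copies of the environment, each running in its own process
        env = make_vec_env("CartPole-v1", n_envs=12, vec_env_cls=SubprocVecEnv)

        model = PPO(
            "MlpPolicy",
            env,
            n_steps=2048,    # steps per environment before each update?
            batch_size=64,   # how does this interact with n_steps?
            verbose=1,
        )
        model.learn(total_timesteps=1_000_000)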

Craig Evans

1 Answer


There is a simple formula that always holds for on-policy algorithms in SB3:

n_updates = total_timesteps // (n_steps * n_envs)

From that it follows that n_steps is the number of experiences collected from a single environment under the current policy before the next policy update. Note that collection does not stop when an episode ends: if an environment terminates before n_steps is reached, it is reset and data collection continues until exactly n_steps transitions have been gathered from it. My subjective rule of thumb is to set this value equal to the episode length, especially if there is a terminal reward.
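
To make the bookkeeping concrete, here is a small worked example with made-up numbers (they are not taken from your setup):

    # Illustrative numbers only.
    n_envs = 12                # parallel environments in the SubprocVecEnv
    n_steps = 1024             # transitions collected per environment per rollout
    total_timesteps = 1_228_800

    rollout_size = n_steps * n_envs               # 12_288 transitions per policy update
    n_updates = total_timesteps // rollout_size   # 100 policy updates over the run
    print(rollout_size, n_updates)                # 12288 100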

Then, there is a slight misuse of terms in SB3. Strictly speaking, a batch for PPO is the whole rollout buffer, whose size equals n_steps * n_envs. What SB3 calls batch_size is actually the minibatch size: a randomly shuffled subset of the buffer (the batch) used for one gradient step. Some people set batch_size equal to the full buffer size so that the networks consume the whole batch at once, which can work when you have enough video memory and exploit population-based training. The standard practice, however, is to use smaller minibatches whose size evenly divides the buffer size, as in the default PPO parameters in SB3 (see the sketch below).
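
A sketch of how batch_size relates to the rollout buffer, again with made-up numbers; the assertion mirrors the divisibility constraint mentioned above:

    n_envs = 12
    n_steps = 1024
    batch_size = 256           # SB3's "batch_size" is really the minibatch size
    n_epochs = 10              # passes PPO makes over the buffer per update

    buffer_size = n_steps * n_envs                      # 12_288 transitions in the buffer
    assert buffer_size % batch_size == 0                # avoids a truncated last minibatch
    minibatches_per_epoch = buffer_size // batch_size   # 48 gradient steps per epoch
    gradient_steps_per_update = minibatches_per_epoch * n_epochs  # 480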

gehirndienst