I'm using Drake for some model-free reinforcement learning, and I noticed that Drake uses variable-step (error-controlled) integration by default when simulating. That makes sense for accuracy, since the integrator can take several smaller steps when a body's accelerations are large, but for reinforcement learning it adds significant compute overhead and slows down rollouts. Is there a principled way to run the simulation environment with fixed-timestep integration beyond the method I'm currently using (code below)? I'm using the PyDrake bindings, with PPO as the RL algorithm.
integrator = simulator.get_mutable_integrator()
# Disable error control; the integrator then steps at its maximum step size.
integrator.set_fixed_step_mode(True)
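
For context, here is a minimal, self-contained sketch of the kind of setup I'm running this in; the model file name and the 1 ms step size are just placeholders for my actual environment:

from pydrake.multibody.parsing import Parser
from pydrake.multibody.plant import AddMultibodyPlantSceneGraph
from pydrake.systems.analysis import Simulator
from pydrake.systems.framework import DiagramBuilder

builder = DiagramBuilder()
# Continuous-time plant (time_step=0.0), so an ODE integrator is used.
plant, scene_graph = AddMultibodyPlantSceneGraph(builder, time_step=0.0)
Parser(plant).AddModels("my_robot.urdf")  # placeholder model file
plant.Finalize()
diagram = builder.Build()

simulator = Simulator(diagram)
integrator = simulator.get_mutable_integrator()
# Disable error control and take fixed steps; in fixed-step mode the
# maximum step size is used as the (fixed) step size.
integrator.set_fixed_step_mode(True)
integrator.set_maximum_step_size(1e-3)  # placeholder step size
simulator.Initialize()

# One RL "step" of the environment: advance the simulator by a fixed amount.
simulator.AdvanceTo(simulator.get_context().get_time() + 0.01)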