Direct Transcription of nonlinear system with cost function dependent on K matrices returned by time-varying LQR

Question

I'm implementing a trajectory optimization algorithm named DIRTREL, which is essentially direct transcription with an added cost function. That cost function incorporates the K matrices obtained by linearizing the system around the decision variables (x, u) and applying discrete time-varying LQR. My question is how to express this most efficiently and concisely in Drake. My current approach describes the system symbolically, which results in extremely lengthy symbolic expressions (they will only grow with more timesteps) because of the recursive nature of the Riccati difference equation. I am also unsure whether this symbolic approach is appropriate at all.

For more details:

Specify my system as a LeafSystem
Declare a MathematicalProgram with decision variables x, u
To obtain time-varying linearized dynamics, specify a class that takes in the dynamics and decision variables at a single timestep and returns Jacobians for that timestep with symbolic.Jacobian(args)
Add cost function which takes in the entire trajectory, so all x, u

Inside the cost function:

Obtain linearized matrices A_i, B_i, G_i (G_i for noise) for each timestep by using the class that takes in decision variables and returns Jacobians
Compute the TVLQR cost (S[n]) with the Riccati difference equations employing the A_i and B_i matrices and solving for Ks
Return a cost for the MathematicalProgram that is essentially a large linear combination of the K matrices
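The steps above can be sketched numerically rather than symbolically. This is a minimal NumPy sketch of the backward Riccati difference recursion, not Drake code and not the DIRTREL reference implementation. The function name `tvlqr_gains`, the weights `Q`, `R`, `Q_final`, and the double-integrator matrices are assumptions chosen for illustration. The gain is obtained with a linear solve, so no explicit matrix inverse is formed.

```python
import numpy as np

def tvlqr_gains(A_list, B_list, Q, R, Q_final):
    """Backward Riccati difference recursion (hypothetical helper).

    Returns the gains K[0..N-1] and cost-to-go matrices S[0..N].
    """
    N = len(A_list)
    S = [None] * (N + 1)
    K = [None] * N
    S[N] = Q_final
    for i in reversed(range(N)):
        A, B = A_list[i], B_list[i]
        # Solve (R + B^T S B) K = B^T S A instead of inverting the matrix.
        K[i] = np.linalg.solve(R + B.T @ S[i + 1] @ B, B.T @ S[i + 1] @ A)
        # S_i = Q + A^T S_{i+1} A - A^T S_{i+1} B K_i
        S[i] = Q + A.T @ S[i + 1] @ (A - B @ K[i])
    return K, S

# Illustrative data: a double integrator stands in for the per-timestep
# linearizations A_i, B_i of the real system (assumed values).
dt, N = 0.1, 50
A_list = [np.array([[1.0, dt], [0.0, 1.0]]) for _ in range(N)]
B_list = [np.array([[0.5 * dt**2], [dt]]) for _ in range(N)]
Q, R, Q_final = np.eye(2), np.array([[0.1]]), 10.0 * np.eye(2)

K, S = tvlqr_gains(A_list, B_list, Q, R, Q_final)
```

Each `K[i]` here is a plain numeric array, so the expression size does not grow with the number of timesteps the way a symbolic recursion does.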

One side note is I am not sure of the most tractable way to compute an inverse symbolically, but I am most concerned with my methodology and whether this symbolic description is appropriate.

Hongkai Dai · Accepted Answer · 2020-07-13T05:59:56.280

I think there are several details of DIRTREL worth discussing:

The cost-to-go matrix S[n] depends on the linearized dynamics Ai, Bi. I think DIRTREL requires solving a nonlinear optimization problem, which needs the gradient of the cost. To compute the gradient of your cost, you will need the gradient of S[n], which in turn requires the gradients of Ai and Bi. Since Ai and Bi are themselves gradients of the dynamics function f(x, u), you will need to compute the second-order gradient of the dynamics.
We had a paper on trajectory optimization with a cost function related to the LQR cost-to-go. DIRTREL made several improvements upon our paper. In our implementation, we also treated S as a decision variable, so our decision variables are x, u, S, and the constraints include both the dynamics constraint x[n+1] = f(x[n], u[n]) and the Riccati equation as a constraint on S. I think DIRTREL's approach scales better because it has fewer decision variables, but I haven't compared the numerical performance of the two approaches.
I am not sure why you need to compute the inverse symbolically. First, what is the inverse you need to compute? Second, Drake supports automatic differentiation, which computes gradients numerically. I would recommend numerical computation instead of symbolic computation. In numerical optimization you only need the values and gradients of the cost and constraints, so it is usually much more efficient to compute them numerically than to first derive a symbolic expression and then evaluate it.
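To illustrate what "numerical instead of symbolic" can look like for the linearization step, here is a minimal sketch that evaluates A = df/dx and B = df/du at a single (x, u) point. It uses central finite differences in plain NumPy as a dependency-free stand-in. It is not Drake's automatic differentiation, which is what is recommended above. The pendulum dynamics `pendulum_step`, its parameters, and the helper `linearize` are hypothetical.

```python
import numpy as np

def pendulum_step(x, u, dt=0.05):
    """Hypothetical discrete dynamics x[n+1] = f(x[n], u[n]).

    A damped pendulum integrated with one explicit Euler step.
    """
    theta, theta_dot = x
    theta_ddot = -9.81 * np.sin(theta) - 0.1 * theta_dot + u[0]
    return np.array([theta + dt * theta_dot, theta_dot + dt * theta_ddot])

def linearize(f, x, u, eps=1e-6):
    """Central-difference Jacobians A = df/dx and B = df/du at (x, u)."""
    nx, nu = x.size, u.size
    A = np.zeros((nx, nx))
    B = np.zeros((nx, nu))
    for j in range(nx):
        dx = np.zeros(nx)
        dx[j] = eps
        A[:, j] = (f(x + dx, u) - f(x - dx, u)) / (2.0 * eps)
    for j in range(nu):
        du = np.zeros(nu)
        du[j] = eps
        B[:, j] = (f(x, u + du) - f(x, u - du)) / (2.0 * eps)
    return A, B

# Linearize around one (x, u) pair, as would be done at every knot point.
x0 = np.array([0.3, -0.2])
u0 = np.array([0.5])
A, B = linearize(pendulum_step, x0, u0)
```

Finite differences only approximate the Jacobians. Automatic differentiation would return them to machine precision and extends more naturally to the second-order terms mentioned in the first point.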

Just read DIRTREL's paper again, now I realize that the cost-to-go matrix `S` is not a part of the decision variable in DIRTREL, which significantly reduces the size of the optimization problem. Still I think you will need the second order gradient of the dynamics function, to compute the gradient of the added LQR cost. — Hongkai Dai, Jul 13 '20 at 04:21
Thank you! I may add a separate post for questions I have about using the automatic differentiation workflow, which I am not fully familiar with (for questions such as to what extent is it automatic in drake, is there any initialization I need to get the Jacobians I am looking for, etc). In the documentation I notice a few functions that may potentially provide linearized matrices A_i, B_i, G_i for each timestep of my nonlinear system (perhaps autodiffutils.AutoDiffXd.derivatives or forwarddiff.jacobian). Is there an example I can learn the workflow from? — Phil, Jul 13 '20 at 16:13
