I don't fully understand how automatic differentiation can reduce the time complexity.
I'll assume you understand how backpropagation and/or reverse-mode automatic differentiation works for a scalar function y = f(x), where f : R^n -> R. Suppose computing f takes time O(1); that is, we measure all costs relative to one forward evaluation of f.
If you remember from backpropagation, you start by setting dy/dy = 1 and then compute dy/dx using the chain rule. What you're doing here is computing the product of the vector [1.0] with the Jacobian, which for a scalar-valued f is a 1 x n row vector; the resulting vector-Jacobian product is again a 1 x n row vector, namely the gradient. So you are computing a VJP where your vector is just a length-1 vector. This takes time O(1), i.e. a constant multiple of one forward pass.
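As a concrete sketch in JAX (the particular f and x here are just illustrative placeholders, not anything from the question): jax.vjp returns the output together with a function that computes the vector-Jacobian product, and seeding that function with 1.0 recovers the full gradient in one backward pass.

```python
import jax
import jax.numpy as jnp

# A toy scalar function f : R^n -> R (placeholder for illustration).
def f(x):
    return jnp.sum(jnp.sin(x) ** 2)

x = jnp.arange(1.0, 5.0)  # a point in R^4

# jax.vjp gives us y = f(x) plus a function computing v |-> v * J,
# where J is the 1 x n Jacobian (the gradient, as a row vector).
y, f_vjp = jax.vjp(f, x)

# Seed the backward pass with the "vector" [1.0] (a scalar here,
# since the output is scalar) to get dy/dx in one backward pass.
(grad_x,) = f_vjp(jnp.ones_like(y))

# Sanity check: this matches jax.grad.
assert jnp.allclose(grad_x, jax.grad(f)(x))
```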
Now suppose I'm computing the derivative of a vector-valued function y = f(x), where f : R^n -> R^m. Then the Jacobian is indeed m x n, and computing the full Jacobian with reverse-mode AD takes time O(m), since it needs one backward pass per output component. However, you can compute the vector-Jacobian product v^T J with a vector v in time O(1): instead of seeding a single output with 1, you seed each output y_i with the adjoint v_i, for i = 1, ..., m. Then you just backpropagate like normal and get the VJP in a single backward pass.
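Again a minimal JAX sketch (f, x, and v are illustrative placeholders): one call to the VJP function seeded with v costs a single backward pass, whereas recovering the full m x n Jacobian, as jax.jacrev does, takes m such passes, one per output, each seeded with a standard basis vector e_i.

```python
import jax
import jax.numpy as jnp

# A toy vector-valued function f : R^n -> R^m (placeholder for illustration).
def f(x):
    return jnp.stack([jnp.sum(x ** 2), jnp.prod(x), jnp.sum(jnp.sin(x))])

x = jnp.arange(1.0, 5.0)          # a point in R^4, so m = 3, n = 4
v = jnp.array([1.0, -2.0, 0.5])   # an arbitrary seed vector in R^m

y, f_vjp = jax.vjp(f, x)

# One backward pass: the VJP v^T J, at roughly the cost of evaluating f once.
(vjp_result,) = f_vjp(v)

# The full Jacobian needs m backward passes, one per output row,
# each seeded with a basis vector e_i -- hence the O(m) cost.
J = jax.jacrev(f)(x)                                  # shape (m, n)
rows = jnp.stack([f_vjp(e)[0] for e in jnp.eye(3)])   # m separate VJPs
assert jnp.allclose(J, rows)
assert jnp.allclose(vjp_result, v @ J)
```

The same asymmetry explains why reverse mode is the right tool when m is small and n is large (e.g. a scalar loss over many parameters): a single seeded backward pass gives you everything you need without ever forming the Jacobian.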