
How do you differentiate a matrix in PyTorch? I have tried the following, but neither works:

Instance 1:

import torch

a = torch.tensor([1., 2, 3], requires_grad=True)
b = torch.tensor([4., 5, 6], requires_grad=True) 
c = a*b
c.backward()
#print(b.grad)

>>> RuntimeError: grad can be implicitly created only for scalar outputs

Instance 2:

a = torch.tensor([1., 2, 3], requires_grad=True)
b = torch.tensor([4., 5, 6], requires_grad=True)

c = a*b
print(b.grad)

>>> None

1 Answer


It is possible, but it doesn't really fit into the standard use case of PyTorch, where you are generally interested in the gradient of a scalar-valued function.

The derivative of a matrix Y w.r.t. a matrix X can be represented as a Generalized Jacobian. For the case where both matrices are just vectors this reduces to the standard Jacobian matrix, where each row of the Jacobian is the transpose of the gradient of one element of Y with respect to X. More generally if X is shape (n1, n2, ..., nD) and Y is shape (m1, m2, ..., mE) then a natural way to represent the Generalized Jacobian of Y with respect to X is as a tensor of shape (m1, m2, ..., mE, n1, n2, ..., nD).
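For example, if X has shape (2, 3) and Y has shape (2, 4), the Generalized Jacobian of Y with respect to X has shape (2, 4, 2, 3). A minimal shape check (this uses torch.autograd.functional.jacobian, covered as Option 2 below; W, X and f are just illustrative names):

import torch

W = torch.randn(3, 4)                     # fixed weights, not differentiated
X = torch.randn(2, 3, requires_grad=True)

def f(X):
    return X @ W                          # Y has shape (2, 4)

J = torch.autograd.functional.jacobian(f, X)
print(J.shape)                            # torch.Size([2, 4, 2, 3]) == (*Y.shape, *X.shape)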

There are two ways to compute the Generalized Jacobian that I'm aware of in PyTorch.

Option 1

Repeated application of back-propagation on each element of Y.

import torch

def construct_jacobian(y, x, retain_graph=False):
    x_grads = []
    for idx, y_element in enumerate(y.flatten()):
        if x.grad is not None:
            x.grad.zero_()
        # retain the graph on all but the last backward pass, unless retain_graph=True was requested
        y_element.backward(retain_graph=retain_graph or idx < y.numel() - 1)
        x_grads.append(x.grad.clone())
    return torch.stack(x_grads).reshape(*y.shape, *x.shape)

then the Jacobian for your test case may be computed using

a = torch.tensor([1., 2., 3.])
b = torch.tensor([4., 5., 6.], requires_grad=True)
c = a * b

jacobian = construct_jacobian(c, b)

print(jacobian)

which results in

tensor([[1., 0., 0.],
        [0., 2., 0.],
        [0., 0., 3.]])

Option 2

In PyTorch 1.5.1 a new autograd.functional API was introduced, including the new function torch.autograd.functional.jacobian. This produces the same results as the previous example but takes a function as an argument. You can also pass jacobian a tuple of inputs if your function takes multiple independent tensors as arguments; in that case it returns a tuple containing the Generalized Jacobian for each of the inputs (a minimal sketch of this case follows the example below).

import torch

a = torch.tensor([1., 2., 3.])

def my_fun(b):
    return a * b

b = torch.tensor([4., 5., 6.], requires_grad=True)

jacobian = torch.autograd.functional.jacobian(my_fun, b)

print(jacobian)

which also produces

tensor([[1., 0., 0.],
        [0., 2., 0.],
        [0., 0., 3.]])
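
As mentioned above, jacobian also accepts a tuple of inputs. A minimal sketch of that case (my_fun2 is just an illustrative name):

import torch

def my_fun2(a, b):
    return a * b

a = torch.tensor([1., 2., 3.])
b = torch.tensor([4., 5., 6.])

jacobians = torch.autograd.functional.jacobian(my_fun2, (a, b))

print(jacobians[0])  # d(a*b)/da: diagonal matrix with b on the diagonal
print(jacobians[1])  # d(a*b)/db: diagonal matrix with a on the diagonal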

As an aside, in some literature the term "gradient" is used to refer to the transpose of the Jacobian matrix. If that's what you're after, then (assuming Y and X are vectors) you can simply use the code above and take the transpose of the resulting Jacobian matrix. If Y or X are higher-order tensors (matrices or n-dimensional tensors) then I'm not aware of any literature that distinguishes between the gradient and the Generalized Jacobian. A natural way to represent such a "transpose" of the Generalized Jacobian would be to use Tensor.permute to turn it into a tensor of shape (n1, n2, ..., nD, m1, m2, ..., mE).
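
A minimal sketch of such a permutation, assuming Y has E dimensions and X has D dimensions (J, D, E are illustrative names, reusing the (2, 4, 2, 3) shape from the earlier example):

import torch

J = torch.randn(2, 4, 2, 3)      # Generalized Jacobian of shape (*Y.shape, *X.shape)
E, D = 2, 2                      # number of dimensions of Y and of X
J_T = J.permute(*range(E, E + D), *range(E))
print(J_T.shape)                 # torch.Size([2, 3, 2, 4]) == (*X.shape, *Y.shape)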


As another aside, the concept of the Generalized Jacobian is rarely used in literature (example usage) but is actually relatively useful in practice. This is because it basically works as a bookkeeping technique to keep track of the original dimensionality of Y and X. By this I mean you could just as easily take Y and X and flatten them into vectors, regardless of their original shape. Then the derivative would be a standard Jacobian matrix. Consequently this Jacobian matrix would be equivalent to a reshaped version of the Generalized Jacobian.
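
A minimal sketch of that equivalence (f and f_flat are illustrative names, and the shapes are chosen arbitrarily):

import torch

a = torch.tensor([1., 2., 3.])
X = torch.randn(2, 3)

def f(X):
    return X * a                         # Y has shape (2, 3)

def f_flat(x_flat):
    return f(x_flat.reshape(2, 3)).flatten()

J_gen  = torch.autograd.functional.jacobian(f, X)                 # shape (2, 3, 2, 3)
J_flat = torch.autograd.functional.jacobian(f_flat, X.flatten())  # shape (6, 6)

print(torch.allclose(J_flat, J_gen.reshape(6, 6)))  # True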
