
There is a scalar function F with 1000 inputs. I want to train a model to predict F given the inputs. However, in the training dataset, we only know the derivative of F with respect to each input, not the value of F itself. How can I construct a neural network with this limitation in TensorFlow or PyTorch?

  • Do you have any hints on the function family of F? Would it make sense to train a network to predict its derivative, and then integrate the weights after training? – KonstantinosKokos May 07 '21 at 22:40
  • I can design the network with gradient outputs (i.e. 1000 outputs) and train but how can we integrate them to get the function? – Roy May 07 '21 at 22:49
  • well, assuming you use a linear layer with no activation, and you get a vector w for output `dF(x)/dx[0]`, then `F(x)[0] = w*x + c[0]`, or in matrix form `F(x) = W*x + c` – KonstantinosKokos May 08 '21 at 00:26
  • but you need to make some assumptions on F I guess – KonstantinosKokos May 08 '21 at 00:31
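The comments above suggest an alternative: train a network to output the 1000-dim gradient and integrate it afterwards. A minimal sketch of the simplest case, where the predicted gradient does not depend on x (all names below are placeholders for this sketch, not from the original post):

import torch

# If the trained gradient predictor outputs a (roughly) constant vector w,
# i.e. grad F(x) ~ w for every x, then integrating gives F(x) ~ w @ x + c,
# where the additive constant c cannot be recovered from gradients alone.
w = torch.randn(1000)   # assumed: the constant gradient read off the trained network
x = torch.randn(1000)   # a query point
F_minus_c = w @ x       # F(x) up to the unknown constant c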

1 Answer


I think you can use torch.autograd to compute the gradients, and then use them for the loss. You need:

(a) A trainable nn.Module to represent the (unknown) function F:

import torch
import torch.nn as nn
from torch import autograd

class UnknownF(nn.Module):
  def __init__(self, in_dim=1000, hidden_dim=100):
    super().__init__()
    # whatever combination of linear layers and activations you like...
    self.layers = nn.Sequential(
      nn.Linear(in_dim, hidden_dim),
      nn.ReLU(),
      nn.Linear(hidden_dim, 1),
    )

  def forward(self, x):
    # x is a 1000-dim vector
    y = self.layers(x)
    # y is a _scalar_ output (a single-element tensor)
    return y

model = UnknownF()  # instantiate the model of the unknown function

(b) Training data:

n = 100  # number of training examples (use whatever your dataset provides)
x = torch.randn(n, 1000, requires_grad=True)  # n examples of 1000-dim input vectors
dy = torch.randn(n, 1000)  # the corresponding 1000-dim ground-truth gradients, one per input

(c) An optimizer:

opt = torch.optim.SGD(model.parameters(), lr=0.1)

(d) Put it together:

criterion = nn.MSELoss()
num_epochs = 10  # assumed; tune as needed

for e in range(num_epochs):
  for i in range(n):
    # batch size = 1: pick one example
    x_ = x[i, :]
    dy_ = dy[i, :]
    opt.zero_grad()
    # predict the (unknown) scalar output
    y_ = model(x_)
    # compute the gradient of the prediction w.r.t. the input using autograd:
    pred_dy_ = autograd.grad(y_, x_, create_graph=True)[0]
    # compute the loss between the model's gradient and the ground-truth one:
    loss = criterion(pred_dy_, dy_)
    loss.backward()
    opt.step()  # update the model's parameters accordingly
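One caveat (not in the original answer, but worth keeping in mind): supervising only the gradients determines F up to an additive constant. If even a single true value of F is known, the offset can be fixed after training; a minimal sketch, where x0 and y0 are assumed to be one known input/value pair:

x0 = torch.randn(1000)      # assumed: an input whose true value F(x0) is known
y0 = torch.tensor([1.234])  # assumed: that known value
with torch.no_grad():
  offset = y0 - model(x0)
# predictions for new inputs are then model(x_new) + offset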
  • Thanks for the nice solution. I wonder if this is limited to batch size of 1 or can be extended to general case. – Roy May 10 '21 at 19:45
  • @Roy since you can only compute gradients of scalar functions, you can compute the gradients for one sample at a time. However, you can do gradient accumulation to have effectively larger batches – Shai May 10 '21 at 19:48
  • Do you have a link for gradient accumulation, I am not familiar with that. – Roy May 10 '21 at 19:51
  • @Roy https://discuss.pytorch.org/t/pytorch-gradient-accumulation/55955 – Shai May 10 '21 at 20:00
  • Do you know how to parallelize this? I have posted a question about this, it would be great if you could provide some hints: https://stackoverflow.com/questions/71879164/pytorchs-autograd-issue-with-joblib – Roy Apr 19 '22 at 22:30
  • @Roy I saw your other question, but I am not familiar with the parallel package you are using. – Shai Apr 20 '22 at 06:33
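A minimal sketch of the gradient-accumulation variant mentioned in the comments above, reusing model, x, dy, n, opt and criterion from the answer; accum_steps is an assumed effective batch size:

accum_steps = 32  # assumed effective batch size

opt.zero_grad()
for i in range(n):
  x_ = x[i, :]
  dy_ = dy[i, :]
  y_ = model(x_)
  pred_dy_ = autograd.grad(y_, x_, create_graph=True)[0]
  # scale so the accumulated gradient averages over the effective batch
  loss = criterion(pred_dy_, dy_) / accum_steps
  loss.backward()  # parameter gradients accumulate across iterations
  if (i + 1) % accum_steps == 0:
    opt.step()
    opt.zero_grad()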