
There is a scalar function F with 1000 inputs. I want to train a model to predict F given the inputs. However, in the training dataset, we only know the derivative of F with respect to each input, not the value of F itself. How can I construct a neural network with this limitation in TensorFlow or PyTorch?

  • Do you have any hints on the function family of F? Would it make sense to train a network to predict its derivative, and then integrate the weights after training? – KonstantinosKokos May 07 '21 at 22:40
  • I can design the network with gradient outputs (i.e. 1000 outputs) and train but how can we integrate them to get the function? – Roy May 07 '21 at 22:49
  • well, assuming you use a linear layer with no activation, and you get a vector w for output `dF(x)/dx[0]`, then `F(x)[0] = w*x + c[0]`, or in matrix form `F(x) = W*x + c` – KonstantinosKokos May 08 '21 at 00:26
  • but you need to make some assumptions on F I guess – KonstantinosKokos May 08 '21 at 00:31
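The comments above suggest an alternative: train a network to output the 1000-dim gradient and integrate it afterwards. A minimal sketch of the simplest case, where the predicted gradient does not depend on x (all names below are placeholders for this sketch, not from the original post):

import torch

# If the trained gradient predictor outputs a (roughly) constant vector w,
# i.e. grad F(x) ~ w for every x, then integrating gives F(x) ~ w @ x + c,
# where the additive constant c cannot be recovered from gradients alone.
w = torch.randn(1000)   # assumed: the constant gradient read off the trained network
x = torch.randn(1000)   # a query point
F_minus_c = w @ x       # F(x) up to the unknown constant c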

1 Answer


I think you can use torch.autograd to compute the gradients, and then use them for the loss. You need:

(a) A trainable nn.Module to represent the (unknown) function F:

import torch
import torch.nn as nn
from torch import autograd

class UnknownF(nn.Module):
  def __init__(self, in_dim=1000, hidden_dim=100):
    super().__init__()
    # whatever combination of linear layers and activations you like...
    self.layers = nn.Sequential(
      nn.Linear(in_dim, hidden_dim),
      nn.ReLU(),
      nn.Linear(hidden_dim, 1),
    )

  def forward(self, x):
    # x is a 1000-dim vector
    y = self.layers(x)
    # y is a _scalar_ output (a single-element tensor)
    return y

model = UnknownF()  # instantiate the model of the unknown function

(b) Training data:

n = 100  # number of training examples (use whatever your dataset provides)
x = torch.randn(n, 1000, requires_grad=True)  # n examples of 1000-dim input vectors
dy = torch.randn(n, 1000)  # the corresponding 1000-dim ground-truth gradients, one per input

(c) An optimizer:

opt = torch.optim.SGD(model.parameters(), lr=0.1)

(d) Put it together:

criterion = nn.MSELoss()
num_epochs = 10  # assumed; tune as needed

for e in range(num_epochs):
  for i in range(n):
    # batch size = 1: pick one example
    x_ = x[i, :]
    dy_ = dy[i, :]
    opt.zero_grad()
    # predict the (unknown) scalar output
    y_ = model(x_)
    # compute the gradient of the prediction w.r.t. the input using autograd:
    pred_dy_ = autograd.grad(y_, x_, create_graph=True)[0]
    # compute the loss between the model's gradient and the ground-truth one:
    loss = criterion(pred_dy_, dy_)
    loss.backward()
    opt.step()  # update the model's parameters accordingly
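One caveat (not in the original answer, but worth keeping in mind): supervising only the gradients determines F up to an additive constant. If even a single true value of F is known, the offset can be fixed after training; a minimal sketch, where x0 and y0 are assumed to be one known input/value pair:

x0 = torch.randn(1000)      # assumed: an input whose true value F(x0) is known
y0 = torch.tensor([1.234])  # assumed: that known value
with torch.no_grad():
  offset = y0 - model(x0)
# predictions for new inputs are then model(x_new) + offset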
  • Thanks for the nice solution. I wonder if this is limited to batch size of 1 or can be extended to general case. – Roy May 10 '21 at 19:45
  • @Roy since you can only compute gradients of scalar functions, you can compute the gradients for one sample at a time. However, you can do gradient accumulation to have effectively larger batches – Shai May 10 '21 at 19:48
  • Do you have a link for gradient accumulation, I am not familiar with that. – Roy May 10 '21 at 19:51
  • @Roy https://discuss.pytorch.org/t/pytorch-gradient-accumulation/55955 – Shai May 10 '21 at 20:00
  • Do you know how to parallelize this? I have posted a question about this, it would be great if you could provide some hints: https://stackoverflow.com/questions/71879164/pytorchs-autograd-issue-with-joblib – Roy Apr 19 '22 at 22:30
  • @Roy I saw your other question, but I am not familiar with the parallel package you are using. – Shai Apr 20 '22 at 06:33
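A minimal sketch of the gradient-accumulation variant mentioned in the comments above, reusing model, x, dy, n, opt and criterion from the answer; accum_steps is an assumed effective batch size:

accum_steps = 32  # assumed effective batch size

opt.zero_grad()
for i in range(n):
  x_ = x[i, :]
  dy_ = dy[i, :]
  y_ = model(x_)
  pred_dy_ = autograd.grad(y_, x_, create_graph=True)[0]
  # scale so the accumulated gradient averages over the effective batch
  loss = criterion(pred_dy_, dy_) / accum_steps
  loss.backward()  # parameter gradients accumulate across iterations
  if (i + 1) % accum_steps == 0:
    opt.step()
    opt.zero_grad()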