There is a scalar function F with 1000 inputs. I want to train a model to predict F given the inputs. However, in the training dataset, we only know the derivative of F with respect to each input, not the value of F itself. How can I construct a neural network with this limitation in TensorFlow or PyTorch?
- Do you have any hints on the function family of F? Would it make sense to train a network to predict its derivative, and then integrate the weights after training? – KonstantinosKokos May 07 '21 at 22:40
- I can design the network with gradient outputs (i.e. 1000 outputs) and train it, but how can we integrate them to get the function? – Roy May 07 '21 at 22:49
- Well, assuming you use a linear layer with no activation, and you get a vector w for output `dF(x)/dx[0]`, then `F(x)[0] = w*x + c[0]`, or in matrix form `F(x) = W*x + c` (see the sketch after these comments). – KonstantinosKokos May 08 '21 at 00:26
- But you need to make some assumptions on F, I guess. – KonstantinosKokos May 08 '21 at 00:31
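To make the linear-assumption idea from these comments concrete: if F is linear, F(x) = w·x + c, then its gradient is the same constant vector w for every input, so w can be read directly off the training gradients (e.g. by averaging), while the constant c cannot be recovered from gradients alone. A minimal sketch, where the example count n is an arbitrary placeholder:

import torch

n = 64                       # number of training examples (placeholder)
dy = torch.randn(n, 1000)    # observed gradients dF/dx, one row per example

w_hat = dy.mean(dim=0)       # for a linear F every gradient equals w, so average them

def F_hat(x, c=0.0):
    # estimate of F(x); the additive constant c is not identifiable from gradients
    return torch.dot(w_hat, x) + c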
1 Answer
I think you can use torch.autograd to compute the gradients, and then use them for the loss. You need:

(a) A trainable nn.Module to represent the (unknown) function F:
import torch
import torch.nn as nn
from torch import autograd

class UnknownF(nn.Module):
    def __init__(self):
        super().__init__()
        # any combination of linear layers and activations will do, e.g. a small MLP:
        self.layers = nn.Sequential(nn.Linear(1000, 64), nn.Tanh(), nn.Linear(64, 1))
    def forward(self, x):
        # x is a 1000-dim vector
        y = self.layers(x)
        # y is a _scalar_ output
        return y.squeeze(-1)

model = UnknownF()  # instantiate the model of the unknown function
(b) Training data:
n = 100  # number of training examples (pick to match your dataset)
x = torch.randn(n, 1000, requires_grad=True)  # n examples of 1000-dim input vectors
dy = torch.randn(n, 1000)  # the corresponding 1000-dim gradients for the n inputs
(c) An optimizer:
opt = torch.optim.SGD(model.parameters(), lr=0.1)
(d) Put it together:
criterion = nn.MSELoss()
num_epochs = 10  # for example

for e in range(num_epochs):
    for i in range(n):
        # batch size = 1, pick one example
        x_ = x[i, :]
        dy_ = dy[i, :]
        opt.zero_grad()
        # predict the unknown output
        y_ = model(x_)
        # compute the gradient of the prediction w.r.t. the input using autograd:
        pred_dy_ = autograd.grad(y_, x_, create_graph=True)[0]
        # compute the loss between the model's gradients and the GT ones:
        loss = criterion(pred_dy_, dy_)
        loss.backward()
        opt.step()  # update the model's parameters accordingly
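Note that since only the gradients are supervised, the trained model determines F only up to an additive constant (the `c` from the comments above). A quick way to query it on a new, hypothetical input:

# after training: evaluate the learned approximation of F on a new input
x_new = torch.randn(1000)
with torch.no_grad():
    y_new = model(x_new)  # matches F(x_new) only up to an unknown additive constant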
- Thanks for the nice solution. I wonder if this is limited to a batch size of 1, or can it be extended to the general case? – Roy May 10 '21 at 19:45
- @Roy since you can only compute gradients of scalar functions, you can compute the gradients for one sample at a time. However, you can do gradient accumulation to have effectively larger batches (a minimal sketch follows after these comments). – Shai May 10 '21 at 19:48
- Do you have a link for gradient accumulation? I am not familiar with that. – Roy May 10 '21 at 19:51
- Do you know how to parallelize this? I have posted a question about this, it would be great if you could provide some hints: https://stackoverflow.com/questions/71879164/pytorchs-autograd-issue-with-joblib – Roy Apr 19 '22 at 22:30
- @Roy I saw your other question, but I am not familiar with the parallel package you are using. – Shai Apr 20 '22 at 06:33
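For reference, a minimal sketch of the gradient accumulation mentioned above, reusing the names (model, criterion, opt, x, dy, n, num_epochs) from the answer; the accumulation length accum_steps is an arbitrary choice. Each sample's gradient is still computed one at a time, but the parameter gradients are accumulated and the optimizer steps once per group, which behaves like a larger effective batch:

accum_steps = 16  # effective batch size (arbitrary choice)

for e in range(num_epochs):
    opt.zero_grad()
    for i in range(n):
        x_ = x[i, :]
        dy_ = dy[i, :]
        y_ = model(x_)
        pred_dy_ = autograd.grad(y_, x_, create_graph=True)[0]
        # scale each per-sample loss so the accumulated gradient is an average
        loss = criterion(pred_dy_, dy_) / accum_steps
        loss.backward()  # gradients accumulate in model.parameters()
        if (i + 1) % accum_steps == 0:
            opt.step()       # one update per accum_steps samples
            opt.zero_grad()
    # (sketch only: leftover samples at the end of an epoch are discarded here)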