In the GPflow documentation (for example the SVGP and natural gradients notebooks), the Adam optimizer from TensorFlow is used to train the model parameters (lengthscales, variance, inducing inputs, etc.) of a GP model fitted with stochastic variational inference, while the natural gradient optimizer is used for the variational parameters (I include a sketch of how I understand the two being combined after the snippet below). A snippet looks as follows:
def run_adam(model, iterations):
    """
    Utility function running the Adam optimizer
    :param model: GPflow model
    :param iterations: number of iterations
    """
    # Create an Adam Optimizer action
    logf = []
    # train_dataset and minibatch_size are defined earlier in the notebook
    train_iter = iter(train_dataset.batch(minibatch_size))
    training_loss = model.training_loss_closure(train_iter, compile=True)
    optimizer = tf.optimizers.Adam()

    @tf.function
    def optimization_step():
        optimizer.minimize(training_loss, model.trainable_variables)

    for step in range(iterations):
        optimization_step()
        if step % 10 == 0:
            elbo = -training_loss().numpy()
            logf.append(elbo)
    return logf
As demonstrated, model.trainable_variables is passed to the Adam optimizer; this property is inherited from tf.Module and collects several parameters, including the lengthscale and variance.
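For context, here is a rough sketch of how I understand the two optimizers being combined, loosely following the natural gradients notebook (the toy data, learning rates and minibatch size below are my own choices, not taken from the docs):

import numpy as np
import tensorflow as tf
import gpflow

# Toy data purely for illustration
N = 100
X = np.random.rand(N, 1)
Y = np.sin(10 * X) + 0.1 * np.random.randn(N, 1)

model = gpflow.models.SVGP(
    kernel=gpflow.kernels.SquaredExponential(),
    likelihood=gpflow.likelihoods.Gaussian(),
    inducing_variable=X[:10].copy(),
    num_data=N,
)

train_dataset = tf.data.Dataset.from_tensor_slices((X, Y)).repeat().shuffle(N)
train_iter = iter(train_dataset.batch(32))
training_loss = model.training_loss_closure(train_iter, compile=True)

# Natural gradients update the variational parameters (q_mu, q_sqrt) ...
natgrad_opt = gpflow.optimizers.NaturalGradient(gamma=0.1)
gpflow.set_trainable(model.q_mu, False)
gpflow.set_trainable(model.q_sqrt, False)
variational_params = [(model.q_mu, model.q_sqrt)]

# ... while Adam updates whatever remains in model.trainable_variables
# (kernel hyperparameters, likelihood variance, inducing inputs).
adam_opt = tf.optimizers.Adam(0.01)

@tf.function
def optimization_step():
    natgrad_opt.minimize(training_loss, var_list=variational_params)
    adam_opt.minimize(training_loss, var_list=model.trainable_variables)

for _ in range(100):
    optimization_step()

gpflow.utilities.print_summary(model)  # lists all parameters and their values

The last line is just to convince myself that model.trainable_variables indeed covers the kernel and likelihood parameters as well as the inducing inputs once the variational parameters have been marked non-trainable.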
What I am concerned about is whether the Adam optimizer works on the unconstrained or the constrained version of the model's parameters. A snippet of test code runs as follows:
import gpflow as gpf
import numpy as np

x = np.arange(10, dtype=float)[:, np.newaxis]
y = np.arange(10, dtype=float)[:, np.newaxis]

model = gpf.models.GPR(
    (x, y),
    kernel=gpf.kernels.SquaredExponential(variance=2.0, lengthscales=3.0),
    noise_variance=4.0,
)

# Is the variable handed to Adam the raw (unconstrained) one?
model.kernel.parameters[0].unconstrained_variable is model.trainable_variables[0]
and returns
True
As far as I know, parameters of a Gaussian process such as the lengthscales and variances of a kernel are non-negative, and they should stay constrained during training. I am not an expert on the source code of GPflow or TensorFlow, but it seems that Adam is working on the unconstrained parameters. Is this simply a misunderstanding on my part, or is something else going on?
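To make the constrained/unconstrained distinction concrete, this is the relation I have in mind (assuming the default positive transform is Softplus, which is how I read the docs), using the GPR model defined above:

param = model.kernel.lengthscales              # the constrained Parameter, reads back as 3.0
raw = param.unconstrained_variable             # the raw tf.Variable found in model.trainable_variables
print(param.numpy())                           # 3.0
print(raw.numpy())                             # softplus^{-1}(3.0), roughly 2.95
print(param.transform.forward(raw).numpy())    # back to 3.0

So my worry is that Adam sees and updates raw, not param.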
Thanks in advance for any help!