I'm trying to build a mixture density network (MDN) to learn P(y | x), where both y and x have dimension D, using K components with full (non-diagonal) covariances. From the output of the hidden layers of the NN I need to construct the component means, weights and covariances. For the covariances, I want a set of lower-triangular matrices (the Cholesky factors of the covariances), i.e. a [K, D, D] tensor, so I can exploit the fact that for a positive-definite matrix you only need to carry around one triangle.

At the moment, the NN that parameterizes the means (locs), weights (logits) and covariances (scales) looks like this:

def neural_network(X):

  # 2 hidden layers with 15 hidden units
  net = tf.layers.dense(X, 15, activation=tf.nn.relu)
  net = tf.layers.dense(net, 15, activation=tf.nn.relu)
  locs = tf.reshape(tf.layers.dense(net, K*D, activation=None), shape=(-1, K, D))
  logits = tf.layers.dense(net, K, activation=None)
  scales = # some function of tf.layers.dense(net, K*D*(D+1)/2, activation=None) ?

  return locs, scales, logits

The question is: for the scales, what's the most efficient way of turning tf.layers.dense(net, K*D*(D+1)/2, activation=None) into a tensor of K DxD lower-triangular matrices (with the diagonal elements exponentiated to ensure positive definiteness)?

1 Answer

TL;DR: use tf.contrib.distributions.fill_triangular


Assuming that X is a batch of D-dimensional inputs, let's define it as a placeholder.

# batch of D-dimensional inputs
X = tf.placeholder(tf.float64, [None, D])

The neural network is defined just as the OP did.

# 2 hidden layers with 15 hidden units
net = tf.layers.dense(X, 15, activation=tf.nn.relu)
net = tf.layers.dense(net, 15, activation=tf.nn.relu)

The means of the multivariate Gaussian are given by a simple linear dense layer on top of the previous hidden layers. The output is of shape (None, D), so there is no need to multiply the dimension by K and reshape.

# Parametrisation of the means
locs = tf.layers.dense(net, D, activation=None)

Next, we define the lower-triangular covariance matrix. The key is to use tf.contrib.distributions.fill_triangular on the output of another linear dense layer.

# Parametrisation of the lower-triangular covariance matrix
covariance_weights = tf.layers.dense(net, D * (D + 1) // 2, activation=None)
lower_triangle = tf.contrib.distributions.fill_triangular(covariance_weights)
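
As a side note (not part of the original answer), here is a minimal sketch of what fill_triangular does, assuming a TF 1.x session: it packs a flat vector of D*(D+1)/2 values into a DxD lower-triangular matrix (the exact element ordering it uses is an implementation detail).

import numpy as np
import tensorflow as tf

d = 3                                                      # small example dimension
flat = tf.constant(np.arange(1.0, d * (d + 1) // 2 + 1))   # 6 values for d = 3
tril = tf.contrib.distributions.fill_triangular(flat)      # shape (3, 3), zeros above the diagonal

with tf.Session() as sess:
    print(sess.run(tril))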

One last thing: we need the diagonal of this Cholesky factor to be strictly positive, so that the covariance matrix is positive definite. This is easily achieved by applying the softplus activation function to the diagonal elements.

# Diagonal elements must be positive
diag = tf.matrix_diag_part(lower_triangle)
diag_positive = tf.layers.dense(diag, D, activation=tf.nn.softplus)
# Swap the raw diagonal for its positive counterpart; the result is the
# lower-triangular Cholesky factor of the covariance matrix
covariance_matrix = lower_triangle - tf.matrix_diag(diag) + tf.matrix_diag(diag_positive)

That's it, we've parameterised a multivariate normal distribution using a neural network.
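
As a rough illustration (not part of the original answer) of how these pieces could be used for training: wrap locs and covariance_matrix in tf.contrib.distributions.MultivariateNormalTriL and minimise the negative log-likelihood of the targets. Here y_true is an assumed placeholder for the D-dimensional targets.

# Sketch only: y_true is a hypothetical placeholder for the targets
y_true = tf.placeholder(tf.float64, [None, D])

tfd = tf.contrib.distributions
mvn = tfd.MultivariateNormalTriL(loc=locs, scale_tril=covariance_matrix)

# Maximum-likelihood training: minimise the negative log-likelihood
nll = -tf.reduce_mean(mvn.log_prob(y_true))
train_op = tf.train.AdamOptimizer(1e-3).minimize(nll)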


Bonus: Trainable multivariate normal distribution

The TensorFlow Probability package has a trainable multivariate normal distribution with a lower-triangular covariance matrix readily available: tfp.trainable_distributions.multivariate_normal_tril

It can be used as follows:

mvn = tfp.trainable_distributions.multivariate_normal_tril(net, D)

It outputs a multivariate normal distribution with a lower-triangular scale, exposing the same methods as tfp.distributions.MultivariateNormalTriL, including mean, covariance, sample, etc.
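
For instance (a sketch, not from the original answer; shapes assume net is built from a batched input):

mean = mvn.mean()          # shape (batch_size, D)
cov = mvn.covariance()     # shape (batch_size, D, D)
samples = mvn.sample(10)   # shape (10, batch_size, D)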

I'd recommend using it instead of building your own.
