I'm trying to build a mixture density network (MDN) to learn P(y | x), where both y and x have dimension D, using K mixture components with full (non-diagonal) covariances. From the output of the hidden layers of the NN I need to construct the component means, weights and covariances. For the covariances I want a set of lower triangular matrices (the Cholesky factors of the covariances) as a [K, D, D] tensor, so I can exploit the fact that for a positive definite matrix you only need to carry around one triangle.
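Concretely, for each component k I want a factor L_k such that Sigma_k = L_k L_k^T, with the diagonal of L_k strictly positive so that Sigma_k is guaranteed to be positive definite.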
At the moment the NN that parameterizes the means (locs), weights (logits) and covariances (scales) looks like this:
import tensorflow as tf

# K = number of mixture components, D = dimensionality of x and y (defined elsewhere)
def neural_network(X):
    # 2 hidden layers with 15 hidden units each
    net = tf.layers.dense(X, 15, activation=tf.nn.relu)
    net = tf.layers.dense(net, 15, activation=tf.nn.relu)
    # component means: one D-vector per component
    locs = tf.reshape(tf.layers.dense(net, K * D, activation=None), shape=(K, D))
    # unnormalized mixture weights
    logits = tf.layers.dense(net, K, activation=None)
    scales = ...  # some function of tf.layers.dense(net, K * D * (D + 1) // 2, activation=None) ?
    return locs, scales, logits
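For context, once I have locs, scales and logits I'm planning to combine them into the mixture roughly along these lines (using TensorFlow Probability; treating MixtureSameFamily / MultivariateNormalTriL as the right tools here is my assumption, not something I've settled on):

import tensorflow_probability as tfp
tfd = tfp.distributions

def mixture_from_params(locs, scales, logits):
    # logits: [K], locs: [K, D], scales: [K, D, D] lower-triangular Cholesky factors
    return tfd.MixtureSameFamily(
        mixture_distribution=tfd.Categorical(logits=logits),
        components_distribution=tfd.MultivariateNormalTriL(loc=locs, scale_tril=scales))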
The question is: for the scales, what's the most efficient way of turning the output of tf.layers.dense(net, K * D * (D + 1) // 2, activation=None)
into a tensor of K D x D lower triangular matrices, with the diagonal elements exponentiated so that the resulting covariances are positive definite?
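One way I can imagine doing it is the sketch below, built on TensorFlow Probability's tfp.math.fill_triangular (assuming TFP is available; I haven't verified this is the most efficient route, which is exactly what I'm asking about):

import tensorflow as tf
import tensorflow_probability as tfp

def make_scales(net, K, D):
    # one flat vector of D*(D+1)/2 entries per component
    flat = tf.layers.dense(net, K * D * (D + 1) // 2, activation=None)
    # same single-example convention as the locs reshape above
    flat = tf.reshape(flat, shape=(K, D * (D + 1) // 2))
    # pack each flat vector into a D x D lower-triangular matrix -> [K, D, D]
    tril = tfp.math.fill_triangular(flat)
    # exponentiate the diagonal so each Cholesky factor has a strictly positive diagonal,
    # which makes the implied covariance L L^T positive definite
    return tf.linalg.set_diag(tril, tf.exp(tf.linalg.diag_part(tril)))

This seems to produce the right shapes, but I don't know whether assembling it from fill_triangular, set_diag and exp like this is the most efficient approach, or whether there is a single op that does the whole thing.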