I'm trying to use TensorFlow Probability layers to create a mixture of multivariate normal distributions. When I use IndependentNormal layers for this, it works fine, but when I use MultivariateNormalTriL layers I run into a problem with the event_shape. I'm combining these layers with the MixtureSameFamily layer. The following code illustrates the problem and should run in Google Colab:
```python
import tensorflow as tf
import tensorflow_probability as tfp
import tensorflow.keras as keras

tfpl = tfp.layers

print(tf.__version__)
# >> '1.15.0-rc3'
# but I get the same result with extra warnings in 1.14.0
print(tfp.__version__)
# >> '0.7.0'

# Single full-covariance normal: event_shape is known statically.
print(tfpl.MultivariateNormalTriL(100)(
    keras.layers.Input(shape=(tfpl.MultivariateNormalTriL.params_size(100),))
))
# >> tfp.distributions.MultivariateNormalTriL("multivariate_normal_tri_l_4/MultivariateNormalTriL/MultivariateNormalTriL/",
#    batch_shape=[?], event_shape=[100], dtype=float32)

# Single diagonal normal: event_shape is known statically.
print(tfpl.IndependentNormal((100,))(
    keras.layers.Input(shape=(tfpl.IndependentNormal.params_size((100,)),))
))
# >> tfp.distributions.Independent("Independentindependent_normal_2/IndependentNormal/Normal/",
#    batch_shape=[?], event_shape=[100], dtype=float32)

# Mixture of 16 full-covariance normals: event_shape becomes unknown.
# The input holds 16 mixture logits plus 16 component parameter vectors.
print(tfpl.MixtureSameFamily(16, tfpl.MultivariateNormalTriL(100))(
    keras.layers.Input(shape=(tfpl.MixtureSameFamily.params_size(
        16, tfpl.MultivariateNormalTriL.params_size(100)),))
))
# >> tfp.distributions.MixtureSameFamily("mixture_same_family_2/MixtureSameFamily/MixtureSameFamily/",
#    batch_shape=[?], event_shape=[?], dtype=float32)

# Mixture of 16 diagonal normals: event_shape stays [100].
print(tfpl.MixtureSameFamily(16, tfpl.IndependentNormal((100,)))(
    keras.layers.Input(shape=(tfpl.MixtureSameFamily.params_size(
        16, tfpl.IndependentNormal.params_size((100,))),))
))
# >> tfp.distributions.MixtureSameFamily("mixture_same_family_3/MixtureSameFamily/MixtureSameFamily/",
#    batch_shape=[?], event_shape=[100], dtype=float32)
```
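For reference, the input widths above follow from each layer's `params_size`. The sketch below restates those formulas in plain Python, assuming they match TFP's implementation (the helper names are hypothetical, not TFP functions). It shows, for example, that a 16-component mixture of 100-dimensional full-covariance normals needs 82,416 inputs: 16 mixture logits more than `16 * MultivariateNormalTriL.params_size(100)`.

```python
from math import prod


# Hypothetical helpers mirroring (my reading of) the params_size formulas of
# tfp.layers.MultivariateNormalTriL, IndependentNormal and MixtureSameFamily.
def mvn_tril_params_size(event_size):
    # loc vector plus the lower-triangular scale matrix entries
    return event_size + event_size * (event_size + 1) // 2


def independent_normal_params_size(event_shape):
    # one loc and one scale per element of the event
    return 2 * prod(event_shape)


def mixture_params_size(num_components, component_params_size):
    # one logit per component plus one parameter vector per component
    return num_components + num_components * component_params_size


print(mvn_tril_params_size(100))                           # 5150
print(independent_normal_params_size((100,)))              # 200
print(mixture_params_size(16, mvn_tril_params_size(100)))  # 82416
```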
Although the MultivariateNormalTriL and IndependentNormal layers produce distributions with the same batch_shape and event_shape, wrapping them in MixtureSameFamily results in different event shapes.
So my questions are: why do they lead to different event shapes, and how do I get a layer for a mixture of multivariate normal distributions with different, not necessarily diagonal, covariance matrices and with event_shape=[100]?
Edit: the same happens with TensorFlow Probability version 0.8.