
I have a simple model that currently outputs a single numerical value. I've adapted it to output a distribution using TFP (mean + standard deviation) instead, so I can understand the model's confidence around the prediction.

  model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, input_shape=[len(df.columns),], activation='relu'), # Should only be one input, so [1,]
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(2 * len(target.columns)), # there are 2 outputs, so we want a mean + standard deviation for EACH of the outputs
    tfp.layers.DistributionLambda(
      lambda t: tfd.Normal(loc=t[..., :1],
                           scale=1e-3 + tf.math.softplus(0.05 * t[...,1:]))
    )
  ])

The 2 outputs of the final Dense layer currently correspond to the mean and standard deviation of the output distribution.

In my real dataset, I have two numerical values I'm trying to predict from the input data. How do I make the model output two distributions? I think the final Dense layer would need 4 nodes (2 means and 2 standard deviations), but I'm not sure how to wire that up properly with the DistributionLambda. I'm hoping to have a single model that predicts both rather than having to train one model per target output.

EDIT: I created this Colab so people can see what I'm getting at a little more easily. I simplified the example a bit more, and hopefully it's more self-explanatory what I'm trying to accomplish:

https://colab.research.google.com/drive/1Wlucked4V0z-Bm_ql8XJnOJL0Gm4EwnE?usp=sharing


2 Answers


Check out this guide on shapes in TFP: https://www.tensorflow.org/probability/examples/Understanding_TensorFlow_Distributions_Shapes

IIUC you'll want to output a distribution with batch_shape = [2]. This is effectively 2 distributions of the same family, with different parameters. Computations done with this batch of distributions (samples, pdf/log_pdf evaluations) will be vectorized (run in parallel).
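
For example (a minimal sketch, not from the original answer, assuming a single input feature), a final Dense layer with 4 units can feed a DistributionLambda that builds a Normal with batch_shape [2], one component per target:

import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation='relu', input_shape=[1]),  # assumes 1 input feature
    tf.keras.layers.Dense(4),  # 2 means + 2 (pre-softplus) scale parameters
    tfp.layers.DistributionLambda(
        lambda t: tfd.Normal(loc=t[..., :2],  # batch_shape (..., 2): one Normal per target
                             scale=1e-3 + tf.math.softplus(0.05 * t[..., 2:])))
])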

Chris Suter
  • Hey, thanks for replying! I tried inputting the normal layer as a list of distributions instead, but I still can't seem to figure it out. I created this Colab as well if it's easier to see what I mean: https://colab.research.google.com/drive/1Wlucked4V0z-Bm_ql8XJnOJL0Gm4EwnE?usp=sharing – Deftness Mar 06 '22 at 22:07
  • Your penultimate layer has 2 * len(target.columns) = 4 outputs. As written, you are dividing this into 2 pieces of len 1 and 3, resp. --- this is the [..., :1] (implicitly [..., 0:1] == len 1) and [..., 1:] (implicitly [..., 1:4] == len 3). I modified your colab to make these both len 2 ([..., :2] and [..., 2:]) and it runs fine! – Chris Suter Mar 07 '22 at 16:23

IIUC, and assuming you want to leave your tfp.layers.DistributionLambda as it is, you have a few options, which you can experiment with:

Option 1: Use two Dense layers with the Keras functional API:

# Your code
#[.....]

tfd = tfp.distributions
sample_layer = tfp.layers.DistributionLambda(lambda t: tfd.Normal(loc=t[..., :1],
                           scale=1e-3 + tf.math.softplus(0.05 * t[...,1:])))
def get_df_model():
  inputs = tf.keras.layers.Input(shape=[len(df.columns),])
  x = tf.keras.layers.Dense(10, activation='relu')(inputs)
  x = tf.keras.layers.Dense(10, activation='relu')(x)
  outputs1 = tf.keras.layers.Dense(len(target.columns))(x)
  outputs2 = tf.keras.layers.Dense(len(target.columns))(x) # there are 2 outputs, so we want a mean + standard deviation for EACH of the outputs
    
  outputs1 = sample_layer(outputs1)
  outputs2 = sample_layer(outputs2)
  model = tf.keras.Model(inputs, [outputs1, outputs2])

  negloglik = lambda y, rv_y: -rv_y.log_prob(y)

  model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.01), loss=negloglik)
  return model


model = get_df_model()
model.summary()
model.fit(df, target, epochs=10)
Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 input_1 (InputLayer)           [(None, 1)]          0           []                               
                                                                                                  
 dense_24 (Dense)               (None, 10)           20          ['input_1[0][0]']                
                                                                                                  
 dense_25 (Dense)               (None, 10)           110         ['dense_24[0][0]']               
                                                                                                  
 dense_26 (Dense)               (None, 2)            22          ['dense_25[0][0]']               
                                                                                                  
 dense_27 (Dense)               (None, 2)            22          ['dense_25[0][0]']               
                                                                                                  
 distribution_lambda_10 (Distri  ((None, 1),         0           ['dense_26[0][0]',               
 butionLambda)                   (None, 1))                       'dense_27[0][0]']               
                                                                                                  
==================================================================================================
Total params: 174
Trainable params: 174
Non-trainable params: 0
__________________________________________________________________________________________________
Epoch 1/10
157/157 [==============================] - 1s 2ms/step - loss: 522.2677 - distribution_lambda_10_loss: 247.8716 - distribution_lambda_10_1_loss: 274.3961
Epoch 2/10
157/157 [==============================] - 1s 3ms/step - loss: 20.3496 - distribution_lambda_10_loss: 9.5429 - distribution_lambda_10_1_loss: 10.8067
Epoch 3/10
157/157 [==============================] - 1s 6ms/step - loss: 13.7444 - distribution_lambda_10_loss: 6.6085 - distribution_lambda_10_1_loss: 7.1359
Epoch 4/10
157/157 [==============================] - 1s 7ms/step - loss: 11.3713 - distribution_lambda_10_loss: 5.5506 - distribution_lambda_10_1_loss: 5.8206
Epoch 5/10
157/157 [==============================] - 1s 4ms/step - loss: 10.2081 - distribution_lambda_10_loss: 5.0250 - distribution_lambda_10_1_loss: 5.1830
Epoch 6/10
157/157 [==============================] - 0s 3ms/step - loss: 9.5528 - distribution_lambda_10_loss: 4.7256 - distribution_lambda_10_1_loss: 4.8272
Epoch 7/10
157/157 [==============================] - 0s 2ms/step - loss: 9.1495 - distribution_lambda_10_loss: 4.5393 - distribution_lambda_10_1_loss: 4.6102
Epoch 8/10
157/157 [==============================] - 1s 6ms/step - loss: 8.8837 - distribution_lambda_10_loss: 4.4159 - distribution_lambda_10_1_loss: 4.4678
Epoch 9/10
157/157 [==============================] - 0s 3ms/step - loss: 8.7027 - distribution_lambda_10_loss: 4.3319 - distribution_lambda_10_1_loss: 4.3708
Epoch 10/10
157/157 [==============================] - 0s 3ms/step - loss: 8.5743 - distribution_lambda_10_loss: 4.2724 - distribution_lambda_10_1_loss: 4.3019
<keras.callbacks.History at 0x7f51001c2f50>

Note what the docs state regarding the distributions when using DistributionLambda:

By default, a distribution is represented as a tensor via a random draw, e.g., tfp.distributions.Distribution.sample
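
If you want the predicted mean and standard deviation rather than a random draw, one option (a sketch, not part of the original code) is to call the model directly and query the returned distribution objects, or to pass convert_to_tensor_fn=tfd.Distribution.mean to the DistributionLambda so the layer coerces to the mean instead of a sample:

# Query the distributions directly (df/model as defined above; the row slice is arbitrary).
dist1, dist2 = model(df.values[:5].astype('float32'))
print(dist1.mean().numpy(), dist1.stddev().numpy())  # per-row mean / std for the first output

# Or make the layer coerce to the mean rather than a sample:
mean_layer = tfp.layers.DistributionLambda(
    lambda t: tfd.Normal(loc=t[..., :1],
                         scale=1e-3 + tf.math.softplus(0.05 * t[..., 1:])),
    convert_to_tensor_fn=tfd.Distribution.mean)  # could be used in place of sample_layer above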

Option 2: Use one Dense layer and split the output into two:

def get_df_model():
  sample_layer = tfp.layers.DistributionLambda(lambda t: tfd.Normal(loc=t[..., :1],
                           scale=1e-3 + tf.math.softplus(0.05 * t[...,1:])))
  
  inputs = tf.keras.layers.Input(shape=[len(df.columns),])
  x = tf.keras.layers.Dense(10, activation='relu')(inputs)
  x = tf.keras.layers.Dense(10, activation='relu')(x)
  x = tf.keras.layers.Dense(2 * len(target.columns))(x)
  x1, x2 = tf.split(x, num_or_size_splits=2, axis=-1)
  outputs1 = sample_layer(x1)
  outputs2 = sample_layer(x2)
  model = tf.keras.Model(inputs, [outputs1, outputs2])

  negloglik = lambda y, rv_y: -rv_y.log_prob(y)

  model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.01), loss=negloglik)
  return model

Option 3: Use one Dense layer and slice its output with [..., :2] / [..., 2:]:

# Your code
#[.....]

tfd = tfp.distributions
sample_layer = tfp.layers.DistributionLambda(lambda t: tfd.Normal(loc=t[..., :2],
                           scale=1e-3 + tf.math.softplus(0.05 * t[...,2:])))
def get_df_model():
  inputs = tf.keras.layers.Input(shape=[len(df.columns),])
  x = tf.keras.layers.Dense(10, activation='relu')(inputs)
  x = tf.keras.layers.Dense(10, activation='relu')(x)
  outputs = tf.keras.layers.Dense(2*len(target.columns))(x)
  outputs = sample_layer(outputs)
  model = tf.keras.Model(inputs, [outputs])

  negloglik = lambda y, rv_y: -rv_y.log_prob(y)

  model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.01), loss=negloglik)
  return model


model = get_df_model()
model.summary()
model.fit(df, target, epochs=10)
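
With this option the single DistributionLambda produces a Normal whose loc has shape (None, 2), so one call to the model gives a mean and standard deviation per target column. A small sketch (assuming df and target as above):

dist = model(df.values[:5].astype('float32'))
# Depending on the Keras version, the output may come back wrapped in a one-element list.
print(dist.mean().numpy())    # shape (5, 2): one mean per target column
print(dist.stddev().numpy())  # shape (5, 2): one standard deviation per target column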

Additionally: If you want to explicitly use independent distributions based on the parameters x1 and x2, try:

def get_df_model():
  inputs = tf.keras.layers.Input(shape=[len(df.columns),])
  x = tf.keras.layers.Dense(10, activation='relu')(inputs)
  x = tf.keras.layers.Dense(10, activation='relu')(x)
  x = tf.keras.layers.Dense(2 * len(target.columns))(x)
  x1, x2 = tf.split(x, num_or_size_splits=2, axis=-1)
  
  outputs1 = tfp.layers.DistributionLambda(lambda t: tfd.Normal(loc=t[..., :1],
                           scale=1e-3 + tf.math.softplus(0.05 * t[...,1:])))(x1)
  outputs2 = tfp.layers.DistributionLambda(lambda t: tfd.Normal(loc=t[..., :1],
                           scale=1e-3 + tf.math.softplus(0.05 * t[...,1:])))(x2)
  model = tf.keras.Model(inputs, [outputs1, outputs2])

  negloglik = lambda y, rv_y: -rv_y.log_prob(y)

  model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.01), loss=negloglik)
  return model
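
With two separate distribution outputs you could also pass the targets as a list, one column per output (a sketch assuming target is a two-column DataFrame):

model = get_df_model()
model.fit(df, [target.iloc[:, :1], target.iloc[:, 1:]], epochs=10)
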
AloneTogether