
I have a text generation task that learns to predict the next word with an LSTM network with multiple output layers. After the generation of a sentence has finished, I calculate a reward for the whole sentence and try to update the output layers that participated in the generation (contributing layers get the calculated reward value, the others get 0). My problem is that even though I update only the selected output layers, it seems that other layers' weights get updated instead.

Here is a minimal example with dummy data that demonstrates the problem:

import numpy as np
import tensorflow as tf

from tensorflow.keras.layers import Input, LSTM, Dense, Embedding
from tensorflow.keras.models import Model


def policy_gradient_loss(y_true, y_pred):
    # Scale the log-probabilities by the reward passed in as the target.
    # tf.cast is safer than float() and also works outside eager mode.
    return tf.reduce_mean(tf.math.log(y_pred) * tf.cast(y_true, tf.float32))

# Define the model with 3 output layers (named 'a', 'b' and 'c').
input_layer = Input(shape=(4,))
embedding_layer = Embedding(input_dim=10, output_dim=4)(input_layer)
lstm_layer = LSTM(4)(embedding_layer)
output_layers = [Dense(3, activation='softmax', name=name)(lstm_layer) for name in ['a', 'b', 'c']]
model = Model(inputs=input_layer, outputs=output_layers)
model.compile(loss=[policy_gradient_loss] * 3, optimizer='adam', run_eagerly=True)

# Dummy input data.
input_data = np.array([[2, 3, 4, 5]])

# Create target data that rewards only the 'b' output layer.
target_data = [np.array([0]) for _ in range(len(model.outputs))]
target_data[1] = np.array([10])  # reward for output layer 'b'

# Save initial weights.
initial_weights = model.get_weights()

model.train_on_batch(input_data, y=target_data)

# Save the weights after training.
updated_weights = model.get_weights()

# Compare the before/after weights.
layer_names = [layer.name for layer in model.layers]
for layer_idx, (layer_name, initial_w, updated_w) in enumerate(zip(layer_names, initial_weights, updated_weights)):
    if not tf.math.reduce_all(tf.equal(initial_w, updated_w)):
        print(f'The weights in layer {layer_idx} ({layer_name}) have changed.')

Result:

The weights in layer 0 (input_1) have changed.
The weights in layer 1 (embedding) have changed.
The weights in layer 2 (lstm) have changed.
The weights in layer 3 (a) have changed.

My expectation would be for layer 4 (output layer 'b') to be updated instead of layer 'a' (or at least alongside 'a').

What am I missing? Is my expectation or my implementation wrong? (Or both...?)
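
As a side note on the comparison loop: model.get_weights() returns a flat list with one entry per weight array (kernel, bias, ...), not one entry per layer, so pairing it with layer names can misalign the two. A sketch of a per-layer comparison, assuming a freshly built model and the same input_data/target_data as above (the variable names here are illustrative), would be:

before = {layer.name: [w.copy() for w in layer.get_weights()] for layer in model.layers}
model.train_on_batch(input_data, y=target_data)
after = {layer.name: layer.get_weights() for layer in model.layers}

for name in before:
    if any(not np.allclose(b, a) for b, a in zip(before[name], after[name])):
        print(f'The weights in layer {name} have changed.')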

eris
  • Your output layer is a single layer and has three outputs, so there is no layer 4. The Dense layers are not sequential but parallel according to your code. All weights between the LSTM layer and the last layer get updated; there should be 36 weights updated in this computation. And through backpropagation, earlier layers are also updated (the gradient-inspection sketch after this thread illustrates this). I don't know your use case, but your code seems correct from here. – hsaltan Aug 05 '23 at 16:09
  • @hsaltan Thanks for your answer. You are right, "layer 4" was meant as the `layer_idx` in the loop above; it doesn't mean a 4th sequential layer, indeed. My use case is to generate a sequence of decisions optimized to reach some goal. At every step and state, there are different kinds and numbers of decisions to choose from. These decisions are encoded by the output layers. When the generation of a sequence has finished, I calculate the reward and apply it to the participating output layers. The very first decision is whether to stop or go on. – eris Aug 05 '23 at 22:12
  • This is my test decision to see if my model improves as expected. I used the above loss function (sometimes multiplied by -1) and tried many (penalty, reward) pairs to enforce the "non-empty path": (-10, 10), (-1, 0), (-1, 1), (10, 0), etc. for generating (empty, non-empty) sequences. But as a result, the convergence of the probabilities for the first stop-or-go decision varied from execution to execution: it converged either to "stop" or to "go" very quickly, and I had no clue about the reason for this instability. – eris Aug 05 '23 at 22:12
  • Hence I tried to simplify the problem and see whether I was using this multi-output network as expected (a single input sequence for `input_data` and a separate reward value for each of the participating output layers), and I saw that even when I manually enforced the updating of a specific output layer, its weights didn't change; instead another output layer's weights were updated. – eris Aug 05 '23 at 22:12
  • The model takes care of updating the weights, so there is no need to worry about it. Check whether your loss goes down and whether the accuracy is acceptable. The model will update the weights to minimize the loss; toward that end, if some weights have to stay the same, that's fine. It's not something you should care about. – hsaltan Aug 06 '23 at 07:44
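
As a sketch of the point about parallel heads and backpropagation (assuming the model, input_data, target_data, and policy_gradient_loss defined above; the variable names here are illustrative), one can inspect which trainable variables actually receive a nonzero gradient for this reward assignment:

x = tf.constant(input_data)
with tf.GradientTape() as tape:
    preds = model(x, training=True)  # list of three softmax outputs: 'a', 'b', 'c'
    losses = [policy_gradient_loss(t, p) for t, p in zip(target_data, preds)]
    total_loss = tf.add_n(losses)
grads = tape.gradient(total_loss, model.trainable_variables)

for var, grad in zip(model.trainable_variables, grads):
    # A disconnected variable would yield None; guard just in case.
    norm = 0.0 if grad is None else float(tf.norm(grad))
    print(f'{var.name}: gradient norm = {norm:.6f}')

Heads whose target (reward) is 0 contribute a constant-zero loss term, so their kernels and biases get zero gradients, while the shared embedding and LSTM weights are updated through the rewarded head 'b'.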

0 Answers