
Edit at the end


My problem: I would like to create and train a model in a for loop, so that some of the weights are reset before each training (in my case, an LSTM layer stays frozen while the other two Dense layers are re-initialized and trained).

My objective is to train the same architecture, on the same dataset, a number of times (e.g. 20), starting each run from a random initialization of the trainable layers, so that I can compute an "average performance", i.e. the mean validation accuracy over, for example, 20 trainings (I sketch this averaging right after my code below).

My code:

1. I create a model and load some pretrained weights into it:
from tensorflow.keras import layers, Model, optimizers
from tensorflow.keras.backend import clear_session

optim = optimizers.Adam()  # note: created once, outside the loop (see Edit below)

input_tensor = layers.Input(shape=(time_steps, input_dim))
lstm = layers.LSTM(n_hidden, name='lstm')(input_tensor)
output_tensor = layers.Dense(1, activation='sigmoid')(lstm)
model = Model(input_tensor, output_tensor, name='test')
model.compile(optimizer=optim,
              loss='binary_crossentropy',
              metrics=['acc'])

model.load_weights('best_weights.h5')
2. I take the LSTM layer of the above model and freeze its weights. Then, in a for loop, I train a bigger model that contains the frozen layer:
# get the pretrained LSTM layer and freeze it
lstm_layer = model.get_layer('lstm')
lstm_layer.trainable = False

for rep in range(20):
    clear_session()

    # reuse the frozen LSTM output and stack two new (trainable) Dense layers on top
    fc = layers.Dense(16, activation='relu')(lstm)
    output = layers.Dense(1, activation='sigmoid')(fc)
    model2 = Model(input_tensor, output)
    model2.compile(optimizer=optim,
                   loss='binary_crossentropy',
                   metrics=['acc'])

    history = model2.fit(x_train, y_train, batch_size=batch_size, epochs=epochs,
                         shuffle=True, verbose=2, validation_data=(x_val, y_val))

    results = model2.evaluate(x_test, y_test, verbose=2)
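For context, the averaging I have in mind afterwards would look like this (a minimal sketch with hypothetical numbers; inside the loop I would collect the final-epoch validation accuracy from history):

import numpy as np

# inside the loop above I would append the final-epoch validation accuracy:
#     val_accs.append(history.history['val_acc'][-1])
# the values below are hypothetical, just to illustrate the averaging
val_accs = [0.71, 0.68, 0.74, 0.70]

print('mean val_acc over runs:', np.mean(val_accs))
print('std of val_acc over runs:', np.std(val_accs))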

Wrong behaviour/results: During the first training, the metrics improve starting from plausible values (val_acc around 0.5). From the second iteration of the loop onwards, val_loss and val_acc already start at "capped" values in the first epoch (e.g. val_acc close to 1.0), as if the weights had not been reset.

Other Stack Overflow sources I've been reading: This post shows that creating a model with Sequential() should reset the weights. Why would this be different when using the functional Model()?

Thank you in advance for your help; I'm a beginner.


Edit: It seems that the problem is the way I was creating and using the optimizer, which is Adam in my case: I had instantiated tf.keras.optimizers.Adam() once, outside the for loop.

In this way its internal state (Adam's per-parameter moment estimates) might be carried over between runs and "converge" to values that make my training start from an already high val_acc.

I consider my problem solved by creating a new optimizer at each iteration of the loop, so that its weights are re-initialized before each training.
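In code, the fix amounts to moving the optimizer construction inside the loop. A minimal sketch, reusing the imports and variables from the question (the learning rate is just a placeholder):

for rep in range(20):
    clear_session()

    # a fresh Adam instance per repetition, so its moment estimates start from zero
    optim = optimizers.Adam(learning_rate=1e-3)  # placeholder learning rate

    fc = layers.Dense(16, activation='relu')(lstm)
    output = layers.Dense(1, activation='sigmoid')(fc)
    model2 = Model(input_tensor, output)
    model2.compile(optimizer=optim,
                   loss='binary_crossentropy',
                   metrics=['acc'])

    history = model2.fit(x_train, y_train, batch_size=batch_size, epochs=epochs,
                         shuffle=True, verbose=2, validation_data=(x_val, y_val))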

The cause of such a strong impact of the optimizer's weights on training (at least in the first epochs) might be related to the relatively small number of network weights that are actually being trained. These are just assumptions, as I'm no expert; please correct me if I'm wrong.
