
I trained and tested a DL model that uses a customized version of the Attention layer provided by Keras, because it needs to do more than the basic one. It reaches about 90.4% accuracy on the test set. But when I close the Colab session and later reload it and reload the model, it gives very bad performance on the exact same test set.

This is the custom attention layer that I'm using:

from tensorflow.keras import backend as K
from tensorflow.keras.layers import Layer

class Attention(Layer):
    def __init__(self, **kwargs):
        super(Attention, self).__init__(**kwargs)

    def build(self, input_shape):
        # Initialize weights for attention
        self.W = self.add_weight(name='attention_weights', shape=(input_shape[-1], 1), initializer='uniform', trainable=True)
        super(Attention, self).build(input_shape)

    def call(self, inputs):
        # Compute attention scores and weights
        e = K.tanh(K.dot(inputs, self.W))
        a = K.softmax(e, axis=1)
        weighted = inputs * a
        # Compute weighted average of inputs
        attention = K.sum(weighted, axis=1)
        return attention
        
    def get_config(self):
        config = super(Attention, self).get_config()
        return config

    @classmethod
    def from_config(cls, config):
        return cls(**config)
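For reference, this is a minimal round-trip check of the layer on its own that I can run (just a sketch: the shapes, the dummy data, and the file name are placeholders, not part of my real pipeline):

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model

# Tiny model: only the custom Attention layer plus a small head
tiny_in = Input(shape=(10, 8))
tiny_out = Dense(2, activation='softmax')(Attention()(tiny_in))
tiny = Model(tiny_in, tiny_out)

# Predict on random data, save, reload with custom_objects, predict again
x_dummy = np.random.rand(4, 10, 8).astype('float32')
before = tiny.predict(x_dummy)
tiny.save('tiny_attention_test.h5')
reloaded = tf.keras.models.load_model('tiny_attention_test.h5',
                                      custom_objects={'Attention': Attention})
after = reloaded.predict(x_dummy)
print(np.allclose(before, after))  # should print True if the layer serializes correctly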

This is the model that I'm trying to reload after testing it:

from tensorflow.keras.layers import (Input, Conv1D, MaxPooling1D, Dropout,
                                     Bidirectional, GRU, Dense, Flatten)
from tensorflow.keras.models import Model

inputs = Input(shape=x_train.shape[1:])
x = Conv1D(filters=128, kernel_size=3, activation='relu')(inputs)
x = Conv1D(filters=128, kernel_size=3, activation='relu')(x)
x = MaxPooling1D(pool_size=3)(x)
x = Dropout(0.2)(x)

x = Bidirectional(GRU(units=128, activation='tanh', return_sequences=True))(x)
x = Dropout(0.2)(x)
x = Conv1D(filters=128, kernel_size=3, activation='relu')(x)
x = MaxPooling1D(pool_size=3)(x)
x = Dropout(0.2)(x)

x = Bidirectional(GRU(units=128, activation='tanh', return_sequences=True))(x)
x = Dropout(0.2)(x)
x = Conv1D(filters=128, kernel_size=3, activation='relu')(x)
x = MaxPooling1D(pool_size=3)(x)
x = Dropout(0.2)(x)

x = Bidirectional(GRU(units=128, activation='tanh', return_sequences=True))(x)
x = Attention()(x)  # Using the custom Attention layer here
x = Flatten()(x)
x = Dense(units=64, activation='relu')(x)
x = Dropout(0.2)(x)
x = Dense(units=64, activation='relu')(x)
outputs = Dense(units=2, activation='softmax')(x)

model = Model(inputs=inputs, outputs=outputs)

Then I compile it with this:

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=['accuracy'])

After that, the training goes like this:

from tensorflow.keras.callbacks import ModelCheckpoint, LearningRateScheduler

checkpoint = ModelCheckpoint("modelliGRU/model_GRU_new_V3_bestWEIGHTS_FINAL_binary.h5",
                             monitor="val_accuracy", # Metric to monitor
                             save_best_only=True, # Save only the best model
                             save_weights_only=False, # Save the entire model
                             mode='max', 
                             verbose=1)

def lr_schedule(epoch, lr):
    if epoch > 70 and \
            (epoch - 1) % 10 == 0:
        lr *= 0.1
    print("Learning rate: ", lr)
    return lr

lr_scheduler = LearningRateScheduler(lr_schedule)  # Dynamically adjust the learning rate

history = model.fit(x_train, y_train, batch_size=128, epochs=100, validation_data=(x_val, y_val), callbacks=[checkpoint, lr_scheduler])
model.save("modelliGRU/model_GRU_new_v3_FINAL_binary.h5")  # Save the model from the last (100th) epoch
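In case the full-model H5 file is part of the problem, I'm also considering saving the weights on their own as a fallback (a sketch; the file name is just an example I have not used yet):

model.save_weights("modelliGRU/model_GRU_new_v3_FINAL_binary_weights.h5")  # weights only, to load into a rebuilt architecture later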



As you can see, I'm saving both the last trained model (100th epoch) and the model with the best accuracy on the validation set. Then I tested them with this:

loss, accuracy = model.evaluate(x_test, y_test) # test the model
print("Test loss: ", loss)
print("Accuracy: ", accuracy)

This gives me 90.4% accuracy on the test set with the 100th-epoch model and 90.3% with the best-on-validation model.

Now, if I close everything and then reopen the notebook just to load the model and try it with this:

import tensorflow as tf

model = tf.keras.models.load_model('modelliGRU/model_GRU_new_V3_bestWEIGHTS_FINAL_binary.h5',
                                   custom_objects={'Attention': Attention})

It gives me much worse performance than what I got in the training session. I have to pass custom_objects to load_model because I'm using the custom attention layer. Do you know any way to fix this? I need it for my thesis work and you are my last hope.
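For reference, one diagnostic I plan to try (a sketch, assuming I also save the weights separately with save_weights as sketched above): rebuild the exact same architecture in the fresh session, load only the weights, re-compile, and evaluate again, to see whether the problem is in the full-model deserialization or in the weights themselves.

# Assumes the architecture-building code above has been run again in the new
# session (producing `model`) and that the weights file from the sketch above exists.
model.load_weights("modelliGRU/model_GRU_new_v3_FINAL_binary_weights.h5")
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=['accuracy'])
loss, accuracy = model.evaluate(x_test, y_test)
print("Test loss: ", loss)
print("Accuracy: ", accuracy)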
