
I have an aerial imagery dataset (around 250k images for training) and am trying to fine-tune the EfficientNetV2-XL architecture pretrained on the ImageNet-21k dataset. However, since this architecture is not directly available in Keras, I am using the pretrained backbone from TensorFlow Hub. Below is the implementation of my model.

import tensorflow as tf
import tensorflow_hub as hub
from tensorflow.keras import layers

inputs = layers.Input(shape=(512, 512, 3))

# Pretrained EfficientNetV2-XL (ImageNet-21k) feature-vector backbone from TensorFlow Hub
backbone = hub.KerasLayer(
    "https://tfhub.dev/google/imagenet/efficientnet_v2_imagenet21k_xl/feature_vector/2",
    trainable=True)
x = backbone(inputs)

# Multi-task learning: 3 classification heads sharing the same backbone
output1 = layers.Dense(3, activation='softmax')(x)
output2 = layers.Dense(6, activation='softmax')(x)
output3 = layers.Dense(5, activation='softmax')(x)

model = tf.keras.Model(inputs, [output1, output2, output3])

lr_schedule = tf.keras.optimizers.schedules.CosineDecay(0.001, 10000)
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule, clipvalue=2)

The sparse_categorical_crossentropy loss is used to optimize the weights.
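For completeness, the compile step looks roughly like this (one sparse categorical crossentropy per output head; equal loss weights are assumed here for illustration):

# Sketch of the compile step: one loss per head, applied in output order
model.compile(
    optimizer=optimizer,
    loss=[tf.keras.losses.SparseCategoricalCrossentropy(),
          tf.keras.losses.SparseCategoricalCrossentropy(),
          tf.keras.losses.SparseCategoricalCrossentropy()],
    metrics=['accuracy'])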

I am training the model on 220k images with a batch size of 8. At the end of the first epoch I see that the weights are becoming NaN, and shortly afterwards the loss becomes NaN as well.

I believe that the gradients are vanishing, and any help in preventing this issue would be highly appreciated.
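To pin down where the NaNs first appear, this is the kind of check I am planning to add (train_ds below is just a placeholder for my tf.data pipeline):

# Stop training as soon as the loss becomes NaN, so the failing step is easier to locate
nan_callback = tf.keras.callbacks.TerminateOnNaN()

# Optionally raise an error on the first NaN/Inf tensor anywhere in the graph
tf.debugging.enable_check_numerics()

model.fit(train_ds, epochs=10, callbacks=[nan_callback])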

  • Hi @user3415910, a vanishing-gradient issue can often be mitigated with a [ReLU](https://builtin.com/machine-learning/relu-activation-function) activation layer, which keeps the positive part of its argument via max(x, 0): all negative values become zero, and positive values pass through unchanged. Please try again with ReLU or [LeakyReLU](https://www.tensorflow.org/api_docs/python/tf/keras/layers/LeakyReLU) activations and let us know if the issue still persists. – TF_Renu Patel Jun 20 '23 at 16:27
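
For reference, a minimal sketch of what this suggestion could look like: a LeakyReLU hidden layer inserted between the backbone features and the softmax heads (the width of 256 is an arbitrary choice for illustration):

# Hypothetical hidden layer with LeakyReLU before the three softmax heads
h = layers.Dense(256)(x)
h = layers.LeakyReLU(alpha=0.1)(h)
output1 = layers.Dense(3, activation='softmax')(h)
output2 = layers.Dense(6, activation='softmax')(h)
output3 = layers.Dense(5, activation='softmax')(h)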

0 Answers