ALBERT not converging - HuggingFace

Question

I'm trying to apply a pretrained HuggingFace ALBERT transformer model to my own text classification task, but the loss is not decreasing beyond a certain point.

Here's my code:

There are four labels in my text classification dataset:

0, 1, 2, 3

Define the tokenizer

import numpy as np
import tensorflow as tf
from transformers import AlbertTokenizer, TFAlbertModel, AlbertConfig

maxlen = 25
albert_path = 'albert-large-v1'
tokenizer = AlbertTokenizer.from_pretrained(albert_path, do_lower_case=True, add_special_tokens=True,
                                            max_length=maxlen, pad_to_max_length=True)

Encode all sentences in text, using the tokenizer

encodings = []
for t in text:
  encodings.append(tokenizer.encode(t, max_length=maxlen, pad_to_max_length=True, add_special_tokens=True))

Define the pretrained transformer model and add Dense layer on top

    from tensorflow.keras.layers import Input, Flatten, Dropout, Dense
    from tensorflow.keras import Model

    optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)
    token_inputs = Input((maxlen,), dtype=tf.int32, name='input_word_ids')
    config = AlbertConfig(num_labels=4, dropout=0.2, attention_dropout=0.2)
    albert_model = TFAlbertModel.from_pretrained(pretrained_model_name_or_path=albert_path, config=config)

    # Index 1 of the model output is the pooled representation of the sequence
    X = albert_model(token_inputs)[1]
    X = Dropout(0.2)(X)
    output_ = Dense(4, activation='softmax', name='output')(X)

    bert_model2 = Model(token_inputs, output_)
    bert_model2.summary()  # summary() prints the table itself and returns None

    bert_model2.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy')

Finally, feed the encoded text and labels to the model

encodings = np.asarray(encodings)
labels = np.asarray(labels)
bert_model2.fit(x=encodings, y = labels, epochs=20, batch_size=128)


Epoch 11/20
5/5 [==============================] - 2s 320ms/step - loss: 1.2923
Epoch 12/20
5/5 [==============================] - 2s 319ms/step - loss: 1.2412
Epoch 13/20
5/5 [==============================] - 2s 322ms/step - loss: 1.3118
Epoch 14/20
5/5 [==============================] - 2s 319ms/step - loss: 1.2531
Epoch 15/20
5/5 [==============================] - 2s 318ms/step - loss: 1.2825
Epoch 16/20
5/5 [==============================] - 2s 322ms/step - loss: 1.2479
Epoch 17/20
5/5 [==============================] - 2s 321ms/step - loss: 1.2623
Epoch 18/20
5/5 [==============================] - 2s 319ms/step - loss: 1.2576
Epoch 19/20
5/5 [==============================] - 2s 321ms/step - loss: 1.3143
Epoch 20/20
5/5 [==============================] - 2s 319ms/step - loss: 1.2716

Loss has decreased from 6 to around 1.23 but doesn't seem to decrease any further, even after 30+ epochs.
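
For reference, a model that assigns equal probability to all four labels scores ln(4) ≈ 1.386 under sparse categorical cross-entropy, so the values of roughly 1.24 to 1.31 logged above are only slightly better than chance. A minimal sketch of that arithmetic, assuming the loss is measured in nats as Keras does (the function name is only illustrative):

```python
import math

def uniform_cross_entropy(num_classes):
    """Cross-entropy (in nats) of predicting equal probability for every class."""
    return -math.log(1.0 / num_classes)

baseline = uniform_cross_entropy(4)
print(f"chance-level loss for 4 classes: {baseline:.4f}")  # prints 1.3863
```

A loss that hovers just under this baseline suggests the classifier has learned little more than the label frequencies.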

What am I doing wrong?

All advice is greatly appreciated!

Have you tried smaller batch sizes? Might not be the case here, but I wonder if you get the same result even using a different batch size. - Pedram, Jun 20 '20 at 18:44
@Pedram Using batch sizes of 1 and 32 didn't help to decrease the loss beyond 1.23. - beginner, Jun 20 '20 at 18:55
@beginner Have you solved the issue yet? I am experiencing a similar one with a different architecture. - Hasan Salim Kanmaz, Dec 23 '20 at 11:26

score -1 · Answer 1 · answered Jun 30 '20 at 05:42

Try the SGD optimizer instead of Adam.
Introduce batch normalization.
Try adding a few trainable (not pretrained) layers on top of the ALBERT layer.
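
The three suggestions above could be combined roughly as follows. This is only a sketch under stated assumptions: the 256-unit layer, the SGD learning rate and momentum, and the `build_head` helper are illustrative choices rather than tuned values, and a plain Keras input of width 1024 (assumed here to be the hidden size of albert-large-v1) stands in for the ALBERT pooled output so that the block runs without downloading any weights.

```python
import tensorflow as tf
from tensorflow.keras import layers

HIDDEN_SIZE = 1024  # assumed width of the albert-large-v1 pooled output
NUM_LABELS = 4      # labels 0, 1, 2, 3 from the question

def build_head(pooled):
    """Hypothetical trainable head: Dense -> BatchNormalization -> Dropout -> softmax."""
    x = layers.Dense(256, activation="relu")(pooled)  # 256 units is an arbitrary choice
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.2)(x)
    return layers.Dense(NUM_LABELS, activation="softmax", name="output")(x)

# Stand-in for the pooled ALBERT output, so the sketch is self-contained.
pooled_input = layers.Input((HIDDEN_SIZE,), name="pooled_output")
head = tf.keras.Model(pooled_input, build_head(pooled_input))

# SGD with momentum in place of Adam; these hyperparameters are assumptions.
head.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=1e-3, momentum=0.9),
    loss="sparse_categorical_crossentropy",
)
```

In the question's model, calling `build_head(albert_model(token_inputs)[1])` would take the place of the existing `Dropout` and `Dense` lines. Whether this helps convergence there is untested.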

Ritam Majumdar