
I have a fairly simple script that classifies intents from natural-language queries, and it works pretty well. I want to add a word-embedding layer from a pre-trained custom model with 200 dimensions, following the Keras pretrained_word_embeddings tutorial. But with what I have achieved so far, training is very, very slow, and even worse, the model doesn't learn: accuracy barely improves from one epoch to the next, which makes it unusable. I think I have not configured the layers correctly or the parameters are wrong. Could you help with this?

import json
import random
import nltk
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.layers import Embedding

words = []
classes = []
documents = []

# Tokenize each utterance and record its intent class
with open("tf-kr_esp.json") as f:
    rows = json.load(f)
for row in rows["utterances"]:
    w = nltk.word_tokenize(row["text"])
    words.extend(w)
    documents.append((w, row["intent"]))
    if row["intent"] not in classes:
        classes.append(row["intent"])

words = sorted(list(set(words)))
classes = sorted(list(set(classes)))


word_index = dict(zip(words, range(len(words))))

embeddings_index = {}
with open('embeddings.txt') as f:
    for line in f:
        word, coefs = line.split(maxsplit=1)
        coefs = np.fromstring(coefs, "f", sep=" ")
        embeddings_index[word] = coefs

num_tokens = len(words) + 2
embedding_dim = 200
hits = 0
misses = 0

# Prepare embedding matrix
embedding_matrix = np.zeros((num_tokens, embedding_dim))
for word, i in word_index.items():
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        # Words not found in embedding index will be all-zeros.
        # This includes the representation for "padding" and "OOV"
        embedding_matrix[i] = embedding_vector
        hits += 1
    else:
        misses += 1
print("Converted %d words (%d misses)" % (hits, misses))

embedding_layer = Embedding(
    num_tokens,
    embedding_dim,
    embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
    trainable=False,
)

# create our training data
training = []
output_empty = [0] * len(classes)

for doc in documents:
    bag = []
    pattern_words = doc[0]
    for w in words:
        bag.append(1 if w in pattern_words else 0)
    output_row = list(output_empty)
    output_row[classes.index(doc[1])] = 1
    training.append([bag, output_row])

random.shuffle(training)
training = np.array(training, dtype="object")
train_x = list(training[:,0])
train_y = list(training[:,1])


int_sequences_input = tf.keras.Input(shape=(None,), dtype="int64")
embedded_sequences = embedding_layer(int_sequences_input)
x = layers.Conv1D(128, 5, activation="relu")(embedded_sequences)
x = layers.MaxPooling1D(5)(x)
x = layers.Conv1D(128, 5, activation="relu")(x)
x = layers.MaxPooling1D(5)(x)
x = layers.Conv1D(128, 5, activation="relu")(x)
x = layers.GlobalMaxPooling1D()(x)
x = layers.Dense(128, activation="relu")(x)
x = layers.Dropout(0.5)(x)
preds = layers.Dense(69, activation="softmax")(x)
model = tf.keras.Model(int_sequences_input, preds)
model.summary()

#sgd = SGD(learning_rate=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
model.fit(np.array(train_x), np.array(train_y), epochs=20, batch_size=128, verbose=1)

Epoch 1/20
116/116 [==============================] - 279s 2s/step - loss: 4.2157 - accuracy: 0.0485
Epoch 2/20
116/116 [==============================] - 279s 2s/step - loss: 4.1861 - accuracy: 0.0550
Epoch 3/20
116/116 [==============================] - 281s 2s/step - loss: 4.1607 - accuracy: 0.0550
Epoch 4/20
116/116 [==============================] - 283s 2s/step - loss: 4.1387 - accuracy: 0.0550
Epoch 5/20
116/116 [==============================] - 286s 2s/step - loss: 4.1202 - accuracy: 0.0550
Epoch 6/20
116/116 [==============================] - 284s 2s/step - loss: 4.1047 - accuracy: 0.0550
Epoch 7/20
116/116 [==============================] - 286s 2s/step - loss: 4.0915 - accuracy: 0.0550
Epoch 8/20
116/116 [==============================] - 283s 2s/step - loss: 4.0806 - accuracy: 0.0550
Epoch 9/20
116/116 [==============================] - 280s 2s/step - loss: 4.0716 - accuracy: 0.0550
Epoch 10/20
116/116 [==============================] - 283s 2s/step - loss: 4.0643 - accuracy: 0.0550
Svensk
1 Answer


Can you mention how many classes you have? An embedding dimension of 200 is fine, but it is true that pretrained vectors take a long time to train on new embeddings. To make it faster you can lower the number of filters in the convolutional layers. You can also use Adam as the optimizer instead of SGD, since SGD is much slower than Adam.
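
A minimal sketch of those two suggestions, assuming the rest of the question's pipeline is unchanged; it reuses int_sequences_input, embedded_sequences, layers, and classes from the code above, and the filter count of 64 is only an illustrative value:

# Lower filter counts (64 instead of 128) and compile with Adam.
x = layers.Conv1D(64, 5, activation="relu")(embedded_sequences)
x = layers.MaxPooling1D(5)(x)
x = layers.Conv1D(64, 5, activation="relu")(x)
x = layers.MaxPooling1D(5)(x)
x = layers.Conv1D(64, 5, activation="relu")(x)
x = layers.GlobalMaxPooling1D()(x)
x = layers.Dense(64, activation="relu")(x)
x = layers.Dropout(0.5)(x)
preds = layers.Dense(len(classes), activation="softmax")(x)
model = tf.keras.Model(int_sequences_input, preds)
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])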

Sarim Sikander
  • There are 69 classes, and yes, it takes very long with the new embeddings; without the embedding layer it takes just 5-10 min to train. – Svensk Feb 01 '22 at 11:56
  • I suggest deleting embeddings_initializer from the embedding layer and setting `weights=[embedding_matrix]`. Also change SGD to the Adam optimizer. – Sarim Sikander Feb 01 '22 at 12:01
  • I will try, then I'll update you. Thank you very much. – Svensk Feb 01 '22 at 12:08
  • With your suggested changes, embedding_layer = Embedding(num_tokens, embedding_dim, weights=[embedding_matrix], trainable=False,) and model.compile(loss='categorical_crossentropy', optimizer="Adam", metrics=['accuracy']), nothing seems to change. It's also weird that in each epoch the accuracy barely improves, and it starts very low. – Svensk Feb 01 '22 at 12:16
  • Yes, that is what you should try. – Sarim Sikander Feb 01 '22 at 12:18
  • Any new ideas about this behavior, now that I have done what you suggested? – Svensk Feb 02 '22 at 13:19