
I am trying to use the glove.6B.50d.txt file as the pretrained embedding matrix for my model training, as a baseline. For some reason, I keep getting the following error:

InvalidArgumentError: indices[15,32] = -2147483648 is not in [0, 400001)
    [[{{node embedding_2/embedding_lookup}}]]

Below is my code:

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Embedding, Input, LSTM, Dense
from tensorflow.keras.models import Model

def pretrained_embedding_layer(word_to_vec_map, word_to_index):
    # Vocabulary size (+1 so index 0 stays reserved as an unused zero row)
    voc_len = len(word_to_index) + 1
    embed_dim = word_to_vec_map["the"].shape[0]
    # Fill the embedding matrix row by row from the GloVe vectors
    embedding_matrix = np.zeros((voc_len, embed_dim), dtype=np.float32)
    for word, index in word_to_index.items():
        embedding_vector = word_to_vec_map.get(word)
        if embedding_vector is not None:
            embedding_matrix[index, :] = embedding_vector
    # Non-trainable embedding layer initialized with the pretrained weights
    embedding_layer = Embedding(input_dim=voc_len, output_dim=embed_dim,
                                weights=[embedding_matrix], trainable=False)
    embedding_layer.build((None,))
    embedding_layer.set_weights([embedding_matrix])
    return embedding_layer

def sentiment_model(input_shape, word_to_vec_map, word_to_index):
    # Each input sample is a sequence of word indices
    sentence_indices = Input(shape=input_shape, dtype=tf.float32)
    embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)
    embeddings = embedding_layer(sentence_indices)
    X = LSTM(100)(embeddings)
    X = Dense(2, activation='softmax')(X)
    model = Model(inputs=sentence_indices, outputs=X)
    return model

1 Answer


Problem solved. The reason was that my input samples contain many more words than the pretrained GloVe embeddings cover, so some words ended up mapped to indices outside the [0, 400001) range of the embedding matrix.
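
For anyone hitting the same error, below is a minimal sketch of the kind of guard that avoids it (the helper name sentences_to_indices and the max_len parameter are just illustrative, not part of my original code): any word missing from word_to_index falls back to index 0, the all-zeros row of the embedding matrix, so every lookup stays inside [0, voc_len).

import numpy as np

def sentences_to_indices(sentences, word_to_index, max_len):
    # One row of indices per sentence, padded with 0 (the unused zero row)
    indices = np.zeros((len(sentences), max_len), dtype=np.int32)
    for i, sentence in enumerate(sentences):
        words = sentence.lower().split()[:max_len]
        for j, word in enumerate(words):
            # Words outside the GloVe vocabulary map to index 0 instead of
            # producing an out-of-range (or garbage) index
            indices[i, j] = word_to_index.get(word, 0)
    return indices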
