
I am using the ELMo language model (https://allennlp.org/elmo) to represent my text data as numerical vectors. These vectors will be used as training data for a simple sentiment analysis task.

In this case the data is not in English, so I downloaded a custom ELMo model from https://github.com/HIT-SCIR/ELMoForManyLangs (I assume this behaves similarly to the official allennlp implementation).

To convert a text document to an ELMo embedding, the function sents2elmo is used. Its argument is a list of tokenized sentences, if I understood the documentation correctly.

So one sample in my training data could be embedded as follows:

from elmoformanylangs import Embedder

# 'custom_language' is the directory of the downloaded pre-trained model
embedder = Embedder('custom_language')
embeddings = embedder.sents2elmo([['hello', 'world', 'how', 'are', 'you', '?'],
                                  ['am', 'great', 'thanks', '!']])

This returns a list of two numpy arrays, one for each sentence, in which every token is represented as a vector of size 1024. Since the default value of the output_layer parameter of sents2elmo is -1, each token vector is the average of the 3 internal layers of the language model.
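
For example, printing the shapes of the returned arrays (a quick sanity check, assuming the two example sentences above) should give one row per token and 1024 columns:

for i, emb in enumerate(embeddings):
    print(i, emb.shape)
# expected, assuming the description above: 0 (6, 1024) and 1 (4, 1024)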

How can the embeddings be represented as a single 1D array? Should I just average all the word vectors for each sentence, and then average all the sentence vectors?

import numpy as np

# average token vectors per sentence, then average the sentence vectors into one document vector
sentence_1 = np.mean(embeddings[0], axis=0)
sentence_2 = np.mean(embeddings[1], axis=0)
document = np.mean([sentence_1, sentence_2], axis=0)

Does this approach destroy any information? If so, are there other ways of doing this?

Thanks!

Isbister

2 Answers


I believe the most common solution would be to take the mean of the token vectors for each sentence, so that you have one embedding per sentence. You could also sum them, but you would then risk an exploding vector magnitude should a sentence have many tokens.
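
As a minimal sketch (reusing the embeddings variable from the question), both options look like this; note how the norm of the summed vector grows with sentence length, which is the exploding effect mentioned above:

import numpy as np

# one 1024-d vector per sentence
mean_pooled = [np.mean(sent, axis=0) for sent in embeddings]  # magnitude roughly constant
sum_pooled = [np.sum(sent, axis=0) for sent in embeddings]    # magnitude grows with token count

for m, s in zip(mean_pooled, sum_pooled):
    print(np.linalg.norm(m), np.linalg.norm(s))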

Or, after embedding all of your data, you may normalize the embedding features across the entire data set. This would cause everything to lie on a high-dimensional sphere, should your application perform better on a manifold like that.
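
A sketch of what that could look like, assuming the mean-pooled sentence vectors from the snippet above; per-feature standardization is one reading of "normalize across the data set", while L2-normalizing each vector is what places every embedding on the unit sphere:

import numpy as np

X = np.stack(mean_pooled)  # shape: (num_sentences, 1024)

# option A: standardize each feature across the whole data set
X_std = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)

# option B: L2-normalize each embedding so it lies on the unit hypersphere
X_unit = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-8)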

Alex L

As Alex says, averaging the token vectors to reduce each sentence to a single embedding is a very common way to deal with differences in sentence length, but I don't know why you would need to average all the sentence vectors as well. It is not necessary: since you now have a space of 1024 features for each document, you can use PCA to reduce the dimensionality.
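
For instance, a minimal sketch with scikit-learn, assuming X is the (num_documents, 1024) matrix of mean-pooled embeddings for your whole corpus (the number of components here is just an illustrative choice):

import numpy as np
from sklearn.decomposition import PCA

# X: mean-pooled ELMo embeddings for the whole corpus, shape (num_documents, 1024)
pca = PCA(n_components=50)        # illustrative; must not exceed min(num_documents, 1024)
X_reduced = pca.fit_transform(X)  # shape (num_documents, 50)
print(pca.explained_variance_ratio_.sum())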

LizUlloa