
BERT as a service (https://github.com/hanxiao/bert-as-service) allows one to extract sentence-level embeddings. Assuming I have a pre-trained LSA model that gives me a 300-dimensional vector, I am trying to understand in which scenarios an LSA model would perform better than BERT when comparing two sentences for semantic coherence.

I cannot think of a reason why LSA would be better for this use case, since LSA is just a compression of a big bag-of-words matrix. A minimal sketch of the BERT side of the comparison is below.
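For concreteness, here is a minimal sketch of what I mean by comparing two sentences with bert-as-service embeddings. It assumes a `bert-serving-start` server is already running locally and uses cosine similarity as a stand-in for "semantic coherence"; both of those choices are assumptions for illustration, not part of the question.

```python
# Sketch: compare two sentences via bert-as-service embeddings.
# Assumes a server has already been started, e.g.:
#   bert-serving-start -model_dir /path/to/bert_model -num_worker 1
import numpy as np
from bert_serving.client import BertClient


def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


bc = BertClient()  # connects to the running bert-serving server
emb = bc.encode([
    "The cat sat on the mat.",
    "A feline rested on the rug.",
])
print(cosine(emb[0], emb[1]))  # higher value -> more semantically similar
```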

Samarth

1 Answer


BERT requires memory that is quadratic in the sequence length, and it is only trained on pairs of sentences split from documents. This can be inconvenient when processing really long inputs.
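As a rough back-of-the-envelope illustration of that quadratic growth (the 12-head, fp32 figures below are BERT-base defaults; activations and other buffers are ignored):

```python
# Self-attention stores an n x n score matrix per head, so memory for the
# attention scores alone grows quadratically with the sequence length n.
def attention_score_bytes(n_tokens, n_heads=12, bytes_per_float=4):
    """Bytes used by one layer's attention score matrices (12 heads, fp32)."""
    return n_heads * n_tokens * n_tokens * bytes_per_float


for n in (128, 512, 2048):
    print(n, attention_score_bytes(n) / 1e6, "MB per layer")
```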

For LSA, you only need the bag-of-words vector, which is constant-sized regardless of the document length. For really long documents, LSA might still be a better option; see the sketch below.
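As a rough illustration of the LSA side, here is a sketch using gensim's `LsiModel` over TF-IDF-weighted bag-of-words vectors. The tiny corpus is a placeholder (in practice LSA is fit on a large document collection), gensim itself is an assumption rather than something from the question, and `num_topics=300` is chosen only to match the 300 dimensions mentioned above.

```python
# Sketch: LSA (LSI) document vectors with gensim, then cosine similarity.
from gensim import corpora, models, similarities

# Placeholder corpus; in practice LSA is fit on a large document collection.
docs = [
    "the cat sat on the mat",
    "a feline rested on the rug",
    "stock markets fell sharply today",
]
tokenized = [d.split() for d in docs]

dictionary = corpora.Dictionary(tokenized)
bow_corpus = [dictionary.doc2bow(toks) for toks in tokenized]  # constant size per doc

tfidf = models.TfidfModel(bow_corpus)
lsi = models.LsiModel(tfidf[bow_corpus], id2word=dictionary, num_topics=300)

index = similarities.MatrixSimilarity(lsi[tfidf[bow_corpus]], num_features=300)
query = lsi[tfidf[dictionary.doc2bow("the cat lay on the mat".split())]]
print(index[query])  # cosine similarity of the query against each document
```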

Jindřich
  • But assuming I am working with short paragraphs and memory is not an issue, BERT should outperform LSA for measuring something like semantic coherence, right? Also, are you aware of any state-of-the-art pre-trained LSA models? – Samarth Mar 03 '20 at 16:13
  • Yes, I would expect BERT to be better for short paragraphs. I don't know the details of `bert-as-service`, but Hugging Face's [Transformers package](https://github.com/huggingface/transformers) limits the input length to 512 tokens. I don't know of any pre-trained LSA model, but unlike BERT, the result is usually so dataset-specific that I doubt it would be worth using a pre-trained model. – Jindřich Mar 04 '20 at 09:00