Is there any pretrained word2vec model whose vocabulary contains both single words and multiple words coalesced together, such as 'drama', 'drama_film', or 'africanamericancommunity'? Is there any such model trained on a huge dataset, like the ones used to train GloVe?
I think that in the model released by Google you also have names of cities, such as New_York. – teoML Dec 16 '19 at 23:31
1 Answer
I did a quick search on Google, but unfortunately I could not find such a pretrained model. One way to train your own model that detects phrases is to use a bigram model: take a big Wikipedia dump, for instance, preprocess it by joining frequent bigrams into single tokens, and then train a word2vec model on the result. A GitHub project which can help you achieve this is https://github.com/KeepFloyding/wikiNLPpy and a nice article on the topic is https://towardsdatascience.com/word2vec-for-phrases-learning-embeddings-for-more-than-one-word-727b6cf723cf
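To make the bigram idea concrete, here is a minimal, self-contained sketch of the phrase-scoring rule from the original word2vec paper (the same idea gensim's `Phrases` class implements): a bigram (a, b) is merged into a single token `a_b` when `(count(a,b) - delta) / (count(a) * count(b))` exceeds a threshold. The `delta` and `threshold` values below are illustrative assumptions, not tuned defaults.

```python
from collections import Counter

def join_phrases(sentences, delta=1, threshold=0.005):
    """Merge frequent bigrams into single underscore-joined tokens.

    Scoring follows the word2vec phrase-detection rule:
        score(a, b) = (count(a, b) - delta) / (count(a) * count(b))
    delta and threshold here are illustrative, not tuned values.
    """
    unigrams = Counter()
    bigrams = Counter()
    for sent in sentences:
        unigrams.update(sent)
        bigrams.update(zip(sent, sent[1:]))

    # Keep bigrams whose score clears the threshold.
    phrases = {
        (a, b)
        for (a, b), c in bigrams.items()
        if (c - delta) / (unigrams[a] * unigrams[b]) > threshold
    }

    # Rewrite each sentence, greedily joining detected phrases.
    merged = []
    for sent in sentences:
        out, i = [], 0
        while i < len(sent):
            if i + 1 < len(sent) and (sent[i], sent[i + 1]) in phrases:
                out.append(sent[i] + "_" + sent[i + 1])
                i += 2
            else:
                out.append(sent[i])
                i += 1
        merged.append(out)
    return merged

corpus = [
    ["i", "love", "new", "york"],
    ["new", "york", "is", "big"],
    ["new", "york", "wins"],
]
print(join_phrases(corpus)[0])  # ['i', 'love', 'new_york']
```

The merged sentences can then be fed straight into word2vec training, so that "new_york" gets its own embedding; running the pass a second time would additionally capture trigrams like "new_york_times".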
As stated in the description of Google's pre-trained word2vec model, it already contains some phrases (bigrams).

teoML