
Is there any pretrained word2vec model whose vocabulary contains both single words and multi-word phrases merged into single tokens, such as 'drama', 'drama_film', or 'africanamericancommunity'? Is there such a model trained on a huge dataset, like the corpora used to train GloVe?

Shadekur Rahman
  • I think that the model released by Google also contains names of cities such as New_York – teoML Dec 16 '19 at 23:31

1 Answer


I did a quick search on Google, but unfortunately I could not find such a pretrained model. One way to train your own model that captures phrases is to use a bigram model: take a big Wikipedia dump, for instance, preprocess it by merging frequent bigrams into single tokens, and then train word2vec on the result. A GitHub project that can help you achieve this is https://github.com/KeepFloyding/wikiNLPpy and a nice article on the topic is https://towardsdatascience.com/word2vec-for-phrases-learning-embeddings-for-more-than-one-word-727b6cf723cf
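To make the preprocessing step concrete, here is a minimal sketch of the bigram scoring used by word2phrase (and by gensim's `Phrases` class at scale): count unigrams and bigrams, score each bigram as `(count(a_b) - min_count) / (count(a) * count(b))`, and merge pairs above a threshold into single `a_b` tokens. The toy corpus and threshold values below are illustrative assumptions, not part of the original answer.

```python
# Sketch of bigram phrase detection as in Mikolov et al. (2013),
# "Distributed Representations of Words and Phrases and their
# Compositionality". Corpus and thresholds here are toy examples.
from collections import Counter

def learn_bigrams(sentences, min_count=1, threshold=0.1):
    """Return the set of bigrams whose score exceeds `threshold`.

    score(a, b) = (count(a_b) - min_count) / (count(a) * count(b))
    """
    unigrams = Counter()
    bigrams = Counter()
    for sent in sentences:
        unigrams.update(sent)
        bigrams.update(zip(sent, sent[1:]))
    phrases = set()
    for (a, b), n_ab in bigrams.items():
        score = (n_ab - min_count) / (unigrams[a] * unigrams[b])
        if score > threshold:
            phrases.add((a, b))
    return phrases

def apply_bigrams(sentence, phrases):
    """Merge detected bigrams into single tokens joined by '_'."""
    out, i = [], 0
    while i < len(sentence):
        if i + 1 < len(sentence) and (sentence[i], sentence[i + 1]) in phrases:
            out.append(sentence[i] + "_" + sentence[i + 1])
            i += 2
        else:
            out.append(sentence[i])
            i += 1
    return out

corpus = [
    ["new", "york", "is", "a", "city"],
    ["i", "love", "new", "york"],
    ["drama", "film", "awards"],
    ["a", "new", "drama", "film"],
]
phrases = learn_bigrams(corpus)
print(apply_bigrams(["i", "visited", "new", "york"], phrases))
# → ['i', 'visited', 'new_york']
```

After this pass, the merged tokens (`new_york`, `drama_film`, ...) are fed to word2vec like ordinary words, so they get their own embeddings.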

As stated in google pre-trained word2vec, the pre-trained model released by Google already contains some phrases (bigrams).

teoML