Is there any pretrained word2vec model whose vocabulary contains both single words and multiple words coalesced together, such as 'drama', 'drama_film', or 'africanamericancommunity'? Is there any such model trained on a huge dataset, like the ones used to train GloVe?
I think that in the model released by Google you also have names of cities, such as New_York. – teoML Dec 16 '19 at 23:31
1 Answer
I did a quick search on Google, but unfortunately I could not find such a pretrained model. One way to train your own model that detects phrases is to use a bigram model: take a big Wikipedia dump, for instance, preprocess it by joining frequent bigrams into single tokens, and then train a word2vec model on the result. A GitHub project which can help you achieve this is https://github.com/KeepFloyding/wikiNLPpy and a nice article on the topic is https://towardsdatascience.com/word2vec-for-phrases-learning-embeddings-for-more-than-one-word-727b6cf723cf
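To make the bigram idea concrete, here is a minimal, self-contained sketch of the phrase-scoring rule from the original word2vec paper (the same idea gensim's `Phrases` class implements): a bigram (a, b) is merged into a single token `a_b` when `(count(a,b) - delta) / (count(a) * count(b))` exceeds a threshold. The `delta` and `threshold` values below are illustrative assumptions, not tuned defaults.

```python
from collections import Counter

def join_phrases(sentences, delta=1, threshold=0.005):
    """Merge frequent bigrams into single underscore-joined tokens.

    Scoring follows the word2vec phrase-detection rule:
        score(a, b) = (count(a, b) - delta) / (count(a) * count(b))
    delta and threshold here are illustrative, not tuned values.
    """
    unigrams = Counter()
    bigrams = Counter()
    for sent in sentences:
        unigrams.update(sent)
        bigrams.update(zip(sent, sent[1:]))

    # Keep bigrams whose score clears the threshold.
    phrases = {
        (a, b)
        for (a, b), c in bigrams.items()
        if (c - delta) / (unigrams[a] * unigrams[b]) > threshold
    }

    # Rewrite each sentence, greedily joining detected phrases.
    merged = []
    for sent in sentences:
        out, i = [], 0
        while i < len(sent):
            if i + 1 < len(sent) and (sent[i], sent[i + 1]) in phrases:
                out.append(sent[i] + "_" + sent[i + 1])
                i += 2
            else:
                out.append(sent[i])
                i += 1
        merged.append(out)
    return merged

corpus = [
    ["i", "love", "new", "york"],
    ["new", "york", "is", "big"],
    ["new", "york", "wins"],
]
print(join_phrases(corpus)[0])  # ['i', 'love', 'new_york']
```

The merged sentences can then be fed straight into word2vec training, so that "new_york" gets its own embedding; running the pass a second time would additionally capture trigrams like "new_york_times".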
As stated in the description of Google's pre-trained word2vec model, it already contains some phrases (bigrams).

teoML