
I am analyzing call records and trying to use doc2vec, but I can't find an appropriate way to apply it.

So far I have tried converting words to their roots (stemming). Next I plan to remove stop words (which are also stemmed).

I want to understand what each conversation is about (this could be a few words or more). Can you suggest an approach or a sample project?

N.K

1 Answer


Note that many word2vec/doc2vec projects don't apply word-stemming (converting words to their roots), nor remove stop words. With an adequately large training corpus, neither step is strictly necessary.
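As a rough illustration of how little preprocessing that implies, here is a minimal sketch using only the Python standard library. It lowercases and tokenizes each transcript, keeps stop words and inflected forms as they are, and pairs each token list with a unique tag. The `TaggedRecord` type, the helper names, and the sample sentences are all hypothetical, invented for this example rather than taken from any library or from your data.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for a (words, tags) training example.
TaggedRecord = namedtuple("TaggedRecord", ["words", "tags"])

# Runs of word characters count as tokens; punctuation is dropped.
TOKEN_PATTERN = re.compile(r"\w+")


def tokenize(text):
    """Lowercase and split into word tokens. No stemming, no stop-word removal."""
    return TOKEN_PATTERN.findall(text.lower())


def tag_records(transcripts):
    """Pair each transcript's tokens with a unique integer tag."""
    return [TaggedRecord(tokenize(text), [i]) for i, text in enumerate(transcripts)]


# Invented sample transcripts, for illustration only.
sample_calls = [
    "I was charged twice on my last invoice.",
    "The internet connection keeps dropping in the evenings.",
]
records = tag_records(sample_calls)
```

Each record keeps words such as "i", "was", and "charged" unchanged, so any stemming or stop-word filtering can be added later and compared against this baseline.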

You seem to be at a very rudimentary starting point, so you should work through online examples of Doc2Vec (and, more generally, "topic modeling"). Several Jupyter Notebooks demonstrating both basic and more advanced uses of Doc2Vec are included with gensim, in the installation's docs/notebooks directory. You can also view them online at:

https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/

doc2vec-lee.ipynb: very simple example of usage on toy-sized data

doc2vec-IMDB.ipynb: more advanced example based on a movie-reviews experiment included in the original "Paragraph Vector" (Doc2Vec) research paper

doc2vec-wikipedia.ipynb: much larger & longer-running model using millions of Wikipedia articles

Though you can browse these online, you can and should run them locally step-by-step as a learning exercise, then tinker with them slightly as an exploration, before finally using them (and other sources) as guides for how you can approach your own problem.

gojomo