I have a corpus of sentences. Each sentence may contain compound words, marked by joining their parts with an underscore. For example:
This is an example_sentence followed by another awesome_paragraph.
I want to get an embedding vector for every individual token and for every compound word:
(this, is, an, example, sentence, followed, by, another, awesome, paragraph, example_sentence, awesome_paragraph)
Can I do this with gensim, or is there another library I should use?
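Below is a minimal preprocessing sketch. It assumes that compounds are always marked with `_`, that splitting on whitespace is enough to tokenize, and that tokens are lowercased as in the list above. The helper names (`compound_view`, `split_view`, `vocabulary`, `training_corpus`) are hypothetical. The idea is to present each sentence to the embedding model twice: once with compounds kept whole and once with them split into their parts. That way both `example_sentence` and `example` end up in the vocabulary.

```python
from typing import Iterable, List


def compound_view(sentence: str) -> List[str]:
    # Assumption: lowercase + whitespace split; compounds like "example_sentence" stay whole.
    return sentence.lower().split()


def split_view(sentence: str) -> List[str]:
    # Same tokens, but every "_"-joined compound is broken into its parts.
    return [part for token in compound_view(sentence) for part in token.split("_") if part]


def vocabulary(corpus: Iterable[str]) -> List[str]:
    # Single words first, then compound words, de-duplicated in first-seen order.
    sentences = list(corpus)
    words = dict.fromkeys(t for s in sentences for t in split_view(s))
    compounds = dict.fromkeys(t for s in sentences for t in compound_view(s) if "_" in t)
    return list(words) + list(compounds)


def training_corpus(corpus: Iterable[str]) -> List[List[str]]:
    # Each sentence appears twice: compounds whole, then compounds split.
    return [view(s) for s in corpus for view in (compound_view, split_view)]


corpus = ["This is an example_sentence followed by another awesome_paragraph"]
print(vocabulary(corpus))
```

With gensim 4.x, the output of `training_corpus(corpus)` could then be passed as `Word2Vec(sentences=..., vector_size=100, min_count=1)`. After training, `model.wv["example_sentence"]` and `model.wv["example"]` should both be available. Duplicating each sentence is a choice made in this sketch, not something gensim does for you. It also shifts co-occurrence counts slightly, so treat it as a starting point.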