
Following Mikolov's paper, I want to implement Doc2Vec in Keras. I'm new to Keras, so I need your help.

I have a corpus of documents, each with an ID, and I want to get two embedding matrices: one for words and one for paragraphs. Is that right?

Is it possible to adapt my Word2Vec code to get these embeddings?

This is an extract of my Word2Vec code:

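    from keras import backend as K
    from keras.layers import Dense, Embedding, Lambda
    from keras.models import Sequential

    # V: vocabulary size, dim: embedding size, window_size: context words on each side
    cbow = Sequential()
    cbow.add(Embedding(input_dim=V, output_dim=dim, input_length=window_size*2))
    # Average the context word vectors into a single vector of size dim
    cbow.add(Lambda(lambda x: K.mean(x, axis=1), output_shape=(dim,)))
    # Predict the target word over the whole vocabulary
    cbow.add(Dense(V, activation='softmax'))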

Should I add another embedding layer to take the paragraph ID into account?
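To make the idea concrete, here is a minimal sketch of what a second embedding layer could look like, assuming the distributed-memory variant (PV-DM) with averaging, where the paragraph vector is averaged together with the context word vectors. A `Sequential` model takes a single input, so this sketch switches to the functional API. The function name `build_pv_dm`, the layer names, and the sizes are hypothetical choices for illustration, and `GlobalAveragePooling1D` stands in for the `Lambda` mean above. It is untested on a real corpus.

```python
from keras.layers import (Concatenate, Dense, Embedding,
                          GlobalAveragePooling1D, Input)
from keras.models import Model


def build_pv_dm(vocab_size, num_docs, dim, window_size):
    """Sketch of a PV-DM style model: context words + paragraph ID -> target word."""
    # Two inputs: the 2 * window_size context word indices and one paragraph ID.
    words = Input(shape=(2 * window_size,), dtype="int32", name="context_words")
    doc = Input(shape=(1,), dtype="int32", name="doc_id")

    # One embedding matrix per input, both of width dim.
    word_vecs = Embedding(vocab_size, dim, name="word_embedding")(words)  # (batch, 2w, dim)
    doc_vec = Embedding(num_docs, dim, name="doc_embedding")(doc)         # (batch, 1, dim)

    # Stack the paragraph vector with the word vectors, then average them.
    merged = Concatenate(axis=1)([doc_vec, word_vecs])                    # (batch, 2w + 1, dim)
    hidden = GlobalAveragePooling1D()(merged)                             # (batch, dim)

    # Predict the target word over the whole vocabulary.
    output = Dense(vocab_size, activation="softmax")(hidden)

    model = Model(inputs=[words, doc], outputs=output)
    # Sparse loss, assuming targets are given as integer word indices.
    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
    return model


# Illustrative sizes only.
model = build_pv_dm(vocab_size=1000, num_docs=100, dim=50, window_size=2)

# After training, the two matrices can be read back from the named layers.
word_embeddings = model.get_layer("word_embedding").get_weights()[0]
doc_embeddings = model.get_layer("doc_embedding").get_weights()[0]
```

With this layout, each training sample would be a pair `[context_word_indices, doc_id]` with the target word index as the label, so the paragraph embedding is shared across all windows drawn from the same document.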

  • gensim (https://radimrehurek.com/gensim/models/doc2vec.html) seems to be the standard tool but I agree it would be helpful to also have an implementation in Keras – zkurtz Jul 10 '19 at 18:48
