I am pretty new to doc2vec then I made small research and found a couple of things. Here is my story: I am trying to learn using doc2vec 2.4 million documents. At first, I tried only doing so with a small model of 12 documents. I checked the results with infer vector of the first document and found it to be similar indeed to the first document by 0.97-0.99 cosine similarity measure. Which I found good, even though when I tried to enter a new document of completely different words I received a high score of 0.8 measure similarity. However, I had put it aside and tried to go on and build the full model with the 2.4 million documents. In this point, my problems began. The result was complete nonsense, I received in the most_similar function results with a similarity of 0.4-0.5 which were completely different from the new document checked. I tried to tune parameters but no result yet. I tried also to remove randomness both from the small and big model, however, I still got different vectors. Then I had tried to use get_latest_training_loss on each epoch in order to see how the loss changes over each epoch. This is my code:
model = Doc2Vec(vector_size=300, alpha=0.025, min_alpha=0.025, pretrained_emb=".../glove.840B.300D/glove.840B.300d.txt", seed=1, workers=1, compute_loss=True)
workers=1, compute_loss=True)
model.build_vocab(documents)
for epoch in range(10):
for i in range(model_glove.epochs):
model.train(documents, total_examples = token_count, epochs=1)
training_loss = model.get_latest_training_loss()
print("Training Loss: " + str(training_loss))
model.alpha -= 0.002 # decrease the learning rate
model.min_alpha = model.alpha # fix the learning rate, no decay
I know this code is a bit awkward, but it is used here only to follow the loss. The error I receive is:
AttributeError: 'Doc2Vec' object has no attribute 'get_latest_training_loss'
I tried looking at model. and auto-complete and found that indeed there is no such function, I found something similar name training_loss, but it gives me the same error.
Anyone here can give me an idea?
Thanks in Advance