Well, the issue is that I have thousands of documents. I passed all of them to train a Gensim Doc2Vec model, and I successfully trained and saved the model in .model format.
Alongside the saved model, two extra files were also generated:
- doc2vec.model
- doc2vec.model.trainables.syn1neg.npy
- doc2vec.model.wv.vectors.npy
Due to hardware limitations, I trained the model on Google Colab and saved it to Google Drive. When I downloaded the generated model and the extra files to my local machine and ran the code, it gave me a FileNotFoundError, even though I placed those files in the same directory as the .py file (the current working directory).
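To narrow this down, here is a quick check I run (a minimal sketch, assuming the three filenames listed above) to confirm that the sidecar .npy files are actually sitting next to the main model file, since gensim's load() looks for them in the same directory as the .model file:

```python
import os

def missing_model_files(model_path):
    """Return the expected model files that are not on disk.

    gensim stores large arrays in sidecar files named <model>.<attr>.npy,
    and load() expects them in the same directory as the main .model file.
    """
    expected = [
        model_path,
        model_path + ".trainables.syn1neg.npy",
        model_path + ".wv.vectors.npy",
    ]
    return [f for f in expected if not os.path.exists(f)]

# print any files that are missing from the current working directory
print(missing_model_files(os.path.join(os.getcwd(), "doc2vec.model")))
```

If this prints a non-empty list, the load will fail even though doc2vec.model itself is present.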
I used the code below for training:
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from nltk.tokenize import word_tokenize

files = readfiles("CuratedData")
data = [TaggedDocument(words=word_tokenize(_d.decode('utf-8').strip().lower()),
                       tags=[str(i)])
        for i, _d in enumerate(files)]

max_epochs = 100
vec_size = 300
alpha = 0.025

model = Doc2Vec(vector_size=vec_size,
                alpha=alpha,
                min_alpha=0.00025,
                min_count=1,
                dm=1)
model.build_vocab(data)

for epoch in range(max_epochs):
    print('iteration {0}'.format(epoch))
    model.train(data,
                total_examples=model.corpus_count,
                epochs=model.epochs)  # model.iter was renamed to model.epochs in gensim 3.x
    # decrease the learning rate
    model.alpha -= 0.0002
    # fix the learning rate, no decay
    model.min_alpha = model.alpha

model.save("doc2vec.model")
print("Model Saved")
Code for loading the model:

import os
import sys
from gensim.models.doc2vec import Doc2Vec
from nltk.tokenize import word_tokenize

webVec = ""
try:
    path = os.path.join(os.getcwd(), "doc2vec.model")
    # the model was saved with Doc2Vec, so load it with Doc2Vec.load,
    # not Word2Vec.load
    model = Doc2Vec.load(path)
    data = word_tokenize(content['htmlResponse'].lower())
    # web vector for the page content
    webVec = model.infer_vector(data)
except ValueError as ve:
    print(ve)
except (TypeError, ZeroDivisionError) as ty:
    print(ty)
except Exception:
    print("Oops!", sys.exc_info()[0], "occurred.")
Any help would be greatly appreciated. Thanks!