I have thousands of documents. I used all of them to train a Gensim Doc2Vec model, and I successfully trained it and saved it in `.model` format.

However, saving in this format also generated two additional files, so I now have:

  1. doc2vec.model
  2. doc2vec.model.trainables.syn1neg.npy
  3. doc2vec.model.wv.vectors.npy

Because of hardware limitations, I trained the model on Google Colab and saved it to Google Drive. When I downloaded the generated model and the extra files to my local machine and ran my code, it gave me a `FileNotFoundError`, even though I had put all of those files in the directory containing the .py file, which is also the current working directory.
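One possible cause, offered as an assumption rather than something confirmed in this thread: `os.getcwd()` is the directory the process was launched from, which need not be the folder holding the .py file (for example when run from an IDE or as a service). A minimal sketch that anchors the path to the script's own directory and reports both locations when the file is missing; `resolve_model_path` is a hypothetical helper name.

```python
import os
from pathlib import Path


def resolve_model_path(filename="doc2vec.model", base_dir=None):
    """Return the path to `filename` inside `base_dir`.

    By default `base_dir` is the directory containing this script, which
    can differ from os.getcwd() depending on how the script is launched.
    """
    if base_dir is None:
        try:
            base_dir = Path(__file__).resolve().parent
        except NameError:
            # __file__ is undefined in interactive sessions; fall back to cwd.
            base_dir = Path.cwd()
    path = Path(base_dir) / filename
    if not path.is_file():
        # Report both directories so a cwd/script-dir mismatch is visible.
        raise FileNotFoundError(
            f"{path} not found (cwd is {os.getcwd()}, base_dir is {base_dir})"
        )
    return str(path)
```

The returned string can be passed straight to the model's `load()` call in place of the `os.getcwd()`-based path.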

This is the code I used for training:

from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from nltk.tokenize import word_tokenize

# readfiles() is a local helper (not shown) that returns each document as bytes
files = readfiles("CuratedData")
data = [TaggedDocument(words=word_tokenize(_d.decode('utf-8').strip().lower()), tags=[str(i)]) for i, _d in enumerate(files)]

max_epochs = 100
vec_size = 300
alpha = 0.025

model = Doc2Vec(vector_size=vec_size,
                alpha=alpha,
                min_alpha=0.00025,
                min_count=1,
                dm=1)

model.build_vocab(data)

for epoch in range(max_epochs):
    print('iteration {0}'.format(epoch))
    model.train(data,
                total_examples=model.corpus_count,
                epochs=model.epochs)  # model.iter is a deprecated alias, removed in gensim 4
    # decrease the learning rate
    model.alpha -= 0.0002
    # fix the learning rate, no decay
    model.min_alpha = model.alpha

model.save("doc2vec.model")
print("Model Saved")

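Code for loading the model:

    import os
    import sys

    from gensim.models.doc2vec import Doc2Vec
    from nltk.tokenize import word_tokenize

    webVec = ""
    try:

        path = os.path.join(os.getcwd(), "doc2vec.model")

        # The model was saved as a Doc2Vec model, so load it with Doc2Vec
        model = Doc2Vec.load(path)

        # `content` is provided by the surrounding application code
        data = word_tokenize(content['htmlResponse'].lower())

        # Web vector for the tokenized page
        webVec = model.infer_vector(data)
    except ValueError as ve:
        print(ve)
    except (TypeError, ZeroDivisionError) as ty:
        print(ty)
    except:
        print("Oops!", sys.exc_info()[0], "occurred.")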

Any help would be greatly appreciated. Thanks, Cheers

1 Answer

Saving a large model will usually create several subsidiary files for the large internal arrays. All those files must be kept together. (They will all start with the same string, the name you originally specified - in your case, doc2vec.model.)
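To check that every file was copied, here is a minimal sketch that lists the files sharing the saved name's prefix and reports which expected companions are missing. The helper names are hypothetical, and the `.npy` suffixes are taken from the file list in the question (gensim 3.x naming); other gensim versions or model sizes may produce different names, so treat them as assumptions.

```python
import glob
import os


def companion_files(model_path):
    """List the main model file plus any '<model_path>.*.npy' arrays
    that were saved separately next to it."""
    main = [model_path] if os.path.isfile(model_path) else []
    extras = sorted(glob.glob(glob.escape(model_path) + ".*.npy"))
    return main + extras


def missing_companions(model_path, expected_suffixes):
    """Return the expected companion files that are absent."""
    return [
        model_path + suffix
        for suffix in expected_suffixes
        if not os.path.isfile(model_path + suffix)
    ]
```

Running `companion_files("doc2vec.model")` on both Colab and the local machine and comparing the two lists would show whether something was left behind.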

It's possible there was another file you failed to download. But without seeing the code you used to trigger the error, or the full error traceback stack (with filenames and lines-of-code involved), it's hard to guess what exactly you did to trigger a `FileNotFoundError`. You may want to edit your question to add that info, so it's clearer what code you ran and which library code raised the exact error.
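The question's bare `except:` prints only the exception type, which discards the traceback. A minimal sketch of a wrapper that prints the full stack (every frame, with file names and line numbers) and re-raises; `load_or_report` is a hypothetical name, and it would be called as `load_or_report(Doc2Vec.load, path)`.

```python
import traceback


def load_or_report(loader, path):
    """Call loader(path). On any error, print the full traceback and
    re-raise instead of reducing it to the exception type."""
    try:
        return loader(path)
    except Exception as exc:
        traceback.print_exc()
        # A FileNotFoundError raised by open() records the missing path.
        if isinstance(exc, FileNotFoundError) and exc.filename:
            print("missing file:", exc.filename)
        raise
```

Re-raising keeps the program from continuing with an undefined model, while the printed stack shows exactly which file the library tried to open.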

gojomo
  • Thanks for the reply, gojomo. I have checked thoroughly, and no other file was generated. But one thing I observed is that it works when I use the model within Google Colab itself. I'm able to use the model when I download and try to use it on my local machine – Rishabh Chandaliya Jain Aug 28 '20 at 01:51
  • Because none of the lines in the code you've added do a `.load()`, they can't help reveal what triggered the `FileNotFoundError`. Also, there's still no full error message/traceback. So it's still anyone's guess what actually triggered the error. And when you say "I'm able to use the model when I download and try to use it on my local machine", does that mean you're no longer getting the error at all? – gojomo Aug 28 '20 at 06:30
  • Separately: you've copied some unnecessarily complicated & error-prone code to make your code, when you manage `alpha` yourself & call `.train()` many times in a loop. See answer at https://stackoverflow.com/questions/62801052/my-doc2vec-code-after-many-loops-of-training-isnt-giving-good-results-what-m for more details why this is a bad approach. – gojomo Aug 28 '20 at 06:32
  • Thanks for the heads-up about the complexity of the code. I have updated the code, please have a look. One more strange thing I came across: I deployed it on a VM and it showed me the same error, so I trained the Gensim model on the VM itself, generated the .model files there, and voilà, it worked. But I trained it on only 10-15 docs. I think gensim is using some other parameters as well that is not accessible, or is storing them in a temporary variable – Rishabh Chandaliya Jain Aug 28 '20 at 12:36
  • Without seeing the full error message, & traceback stack, it's impossible to guess what you might be hitting. (Your theory of "using some other parameters… that is not accessible" is not really plausible, but what you're really hitting? Only knowable by seeing the actual, full, original error message. Not anything you've caught & re-printed.) – gojomo Aug 28 '20 at 17:58
  • Also, your training code (which is irrelevant to any `FileNotFound` happening later during loading) as it appears in your question still shows the bad practices highlighted in the link I provided. – gojomo Aug 28 '20 at 18:01