
I am using gensim's Word2Vec together with a wiki-trained model to get the most similar words. This worked before, but now it raises the error below, even after I rerun the whole program. I tried removing return_path=True, but I'm still getting the same error.

print(api.load('glove-wiki-gigaword-50', return_path=True))
model.most_similar("glass")

#ERROR:

/Users/me/gensim-data/glove-wiki-gigaword-50/glove-wiki-gigaword-50.gz
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-153-3bf32168d154> in <module>
      1 print(api.load('glove-wiki-gigaword-50', return_path=True))
----> 2 model.most_similar("glass") 

AttributeError: 'Word2Vec' object has no attribute 'most_similar'

#MODEL this is the model I used

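# `info` is assumed to come from an earlier call such as api.info()
for model_name, model_data in sorted(info['models'].items()):
    print(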
        '%s (%d records): %s' % (
            model_name,
            model_data.get('num_records', -1),
            model_data['description'][:40] + '...',
        )
    )

Edit: here is my gensim download & output

!python -m pip install -U gensim

OUTPUT:

Requirement already satisfied: gensim in ./opt/anaconda3/lib/python3.8/site-packages (4.0.1)

Requirement already satisfied: numpy>=1.11.3 in ./opt/anaconda3/lib/python3.8/site-packages (from gensim) (1.20.1)

Requirement already satisfied: smart-open>=1.8.1 in ./opt/anaconda3/lib/python3.8/site-packages (from gensim) (5.1.0)

Requirement already satisfied: scipy>=0.18.1 in ./opt/anaconda3/lib/python3.8/site-packages (from gensim) (1.6.2)

RSB
  • Don't you mean just ```model.similar```? – ewokx Aug 06 '21 at 05:55
  • @ewong it gives me this: ```AttributeError: 'Word2Vec' object has no attribute 'similar'``` – RSB Aug 06 '21 at 06:22
  • Are there more lines to your code, or is that all? Where is model defined? – ewokx Aug 06 '21 at 07:51
  • @ewong there is this ```for model_name, model_data in sorted(info['models'].items()): print( '%s (%d records): %s' % ( model_name, model_data.get('num_records', -1), model_data['description'][:40] + '...', ) )``` – RSB Aug 06 '21 at 13:20

2 Answers


You are probably looking for <MODEL>.wv.most_similar, so please try:

model.wv.most_similar("glass") 
sophros
  • hi! I tried this but it gives me ```AttributeError: 'Word2Vec' object has no attribute 'vw'```. i updated my post with the model I used – RSB Aug 06 '21 at 13:24
  • Right. Interesting. Can you please post the version of the `gensim` library you are using too (as there were changes on the way)? – sophros Aug 06 '21 at 17:01
  • I used ```import gensim.models.word2vec as w2v``` and ```import gensim.downloader as api``` – RSB Aug 06 '21 at 17:22
  • This is not what I asked for. Can you please run `pip show gensim` and post the output? – sophros Aug 06 '21 at 17:28
  • Hello, I just added them to my post at the end @sophros – RSB Aug 06 '21 at 17:34
  • I just spotted it - it was a typo - `wv` instead of `vw`. Please check again! – sophros Aug 06 '21 at 18:45
  • Happy to hear! I would appreciate accepting the answer (gray tick mark on the left) and upvoting it. – sophros Aug 06 '21 at 20:19
  • Actually, I realized that answer pointed to another model, not the wiki one. However, when I took off ```return_path=True``` it worked! Thank you so much though – RSB Aug 07 '21 at 20:03

Your shown code...

print(api.load('glove-wiki-gigaword-50', return_path=True))
model.most_similar("glass")

...doesn't assign anything into model. (Was it assigned earlier?)

And, using return_path=True there means the api.load() will only return a string path to the datafile. That'd only be interesting if you were going to use that string to then do your own loading of the data into a model.
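To make "your own loading" concrete, here is a minimal standard-library sketch of what you could do with that string path. It assumes the downloaded `.gz` file is in word2vec text format (a `<count> <dimensions>` header line, then one `word v1 v2 ...` line per word); that is an assumption about the file, not something shown in the question. The helper names `read_word2vec_text`, `cosine`, and `most_similar` are hypothetical and are not part of gensim.

```python
import gzip
import math


def read_word2vec_text(path):
    """Parse a gzipped word2vec-format text file into {word: [floats]}.

    Assumes a header line "<count> <dimensions>" followed by one
    "word v1 v2 ..." line per word (an assumption about the file layout).
    """
    vectors = {}
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        _count, dim = (int(x) for x in handle.readline().split())
        for line in handle:
            parts = line.rstrip().split(" ")
            if len(parts) != dim + 1:
                continue  # skip lines that do not match the declared size
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(vectors, word, topn=10):
    """Return the topn (word, similarity) pairs closest to `word`."""
    target = vectors[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


# Hypothetical usage with the path string returned by return_path=True:
# path = api.load('glove-wiki-gigaword-50', return_path=True)
# print(most_similar(read_word2vec_text(path), 'glass'))
```

This pure-Python version is slow on a large vocabulary and is only meant to show that the returned value is a file path you have to parse yourself, not a model.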

That api.load() call without return_path=True likely returns an instance of KeyedVectors, which is a set of vectors. That's different from a full Word2Vec model, but would still support a .most_similar() method. However, if you're just print()ing that returned path, or returned model, it's not going to be in the model variable for your later .most_similar() operation.

So you may want:

kv_model = api.load('glove-wiki-gigaword-50')
similars = kv_model.most_similar('glass')
print(similars)

(Personally, I don't like the opaque magic, & running of new downloaded code, that api.load() does. I think it's a better habit to download the raw data files yourself, from a known source, so that you know what files have arrived, to which directories, on your own machine. Then use a dataset-specific load method to load that data, so that you learn what library methods work with which kinds of files.)

If your model variable does in fact include a full Word2Vec model, from some unshown other code, then it will also contain a set of vectors in its .wv (for word-vectors) property:

similars = model.wv.most_similar('glass')
print(similars)
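
The three cases discussed here (a path string, a set of `KeyedVectors`, and a full `Word2Vec` model) can be told apart with a small duck-typed helper. This is only a sketch: `resolve_vectors` is a hypothetical name, not a gensim function, and it relies only on the `.wv` and `.most_similar` attributes mentioned above.

```python
def resolve_vectors(obj):
    """Return the object that exposes most_similar(), or raise a clear error."""
    if isinstance(obj, str):
        # This is what api.load(..., return_path=True) hands back.
        raise TypeError(
            "got a file path (str); call api.load(name) without return_path=True"
        )
    # A full Word2Vec model keeps its word vectors under .wv;
    # a plain set of vectors is used as-is.
    vectors = getattr(obj, "wv", obj)
    if not hasattr(vectors, "most_similar"):
        raise TypeError(f"{type(obj).__name__} has no word vectors to query")
    return vectors


# Hypothetical usage:
# similars = resolve_vectors(model).most_similar('glass')
```

Checking attributes instead of importing the gensim classes keeps the helper independent of the installed gensim version.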
gojomo
  • This prints out similar words based on the training of my data. However, I would like to get the words that are trained by ```'glove-wiki-gigaword-50'``` – RSB Aug 06 '21 at 17:02
  • Have you tried assigning the results of your `api.load()` call into a variable instead of printing it? (You could assign it into `model` if you want to discard the `Word2Vec` model that's already there. Or you could assign it into a new variable like `kv_model`, to reflect that it's just a `KeyedVectors`.) – gojomo Aug 06 '21 at 17:06
  • I tried it and it gave me ```AttributeError: 'str' object has no attribute 'most_similar' ``` – RSB Aug 06 '21 at 17:39
  • What code did you try that gave that error? (That sounds like you assigned a string into the variable, not the results of `api.load()`.) – gojomo Aug 06 '21 at 17:42
  • This is what I did: ```kv_model= (api.load('glove-wiki-gigaword-50', return_path=True)) (kv_model.most_similar("glass"))``` – RSB Aug 06 '21 at 18:45
  • Aha, try it without the `return_path=True` argument. If you include that, it means you're asking for the string path to the dataset file, rather than the loaded model itself. I'll also add a note about this to my main answer. – gojomo Aug 06 '21 at 19:54