2

I am new to the Gensim package, and I am trying to get a little familiar with it. I am now trying to import an existing, trained model. I am following exactly the example from this video (this section starts at 5:30). When I run the code from that video I get an error:

Code:

from gensim.models import Word2Vec, KeyedVectors
import pandas as pd
import nltk

model = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin.gz', binary=True, limit=100000)

model.wv.most_similar('man')

Error:

Traceback (most recent call last):
  File "C:....", line 7, in <module>
    model.wv.most_similar('man')
AttributeError: 'KeyedVectors' object has no attribute 'wv'

In a previous example I also ran into a similar problem when training an own model (namely with the name of the size parameter, which turned out to be changed to vector_size, because the package is deprecated I read). That probably only happened quite recently, because in the comments of the YouTube video I do not read about such problems yet. And that fix about that size parameter that was renamed was also from a post here on stackoverflow that is only 2 weeks old.

Does anyone know how I can still use this wv attribute, so that I can just continue with the example from the video?

lakeviking
  • 322
  • 1
  • 6
  • 18
  • I am not familiar with this library, but the [first result](https://radimrehurek.com/gensim/models/word2vec.html) using Google provides an answer. `wv` is not an attribute of `KeyedVectors` but of `Word2Vec`. You can probably use `most_similar` on your `model` variable. – BoobyTrap Apr 28 '21 at 13:11

2 Answers2

2

The comment from user ~BoobyTrap is correct. Your KeyedVectors instance already supports .most_similar(). There is no need to request its .wv property. You should just do:

model.wv.most_similar('man')

As you can see in the video, when it was prepared, there was already a deprecation warning, that .wv would be going away. (So, the creator of that video could have left it out then - it wasn't necessary, it was already generating a warning message that it would break soon, and using it indicates an incomplete understanding of the roles of the various object types.)

The only objects that now have .wv property are those which include a consituent KeyedVectors, like a full Word2Vec model. That is, when you have an object with a lot of extra state, but that has a set-of-word-vectors as one of its important parts, they are likely to be found in that object's .wv property.

gojomo
  • 52,260
  • 14
  • 86
  • 115
1

I encountered the same, and super THANKS! to gojomo's detailed explanation, got to sort out. So basically just take away the "wv".

<Example 1: to view the word embedding for 'king'>

FROM;

model_0123.wv['king']  

To;

model_0123['king']    

<Example 2: to get 20 of similar words for 'king'>

FROM;

model_0123.wv.most_similar(positive=['king'], topn=20)

To;

model_0123.most_similar(positive=['king'], topn=20)
czookol
  • 11
  • 1