I want to pass my trained gensim Word2Vec model as the embedding model to FAISS.from_documents(). When I do, I get this error:

AttributeError: 'Word2Vec' object has no attribute 'embed_documents'

My code:


from gensim.models import Word2Vec

w2v_model = Word2Vec(sentences=list_pre_word2vec)

# # #
# # # some training magic for Word2Vec
# # #

from langchain.vectorstores import FAISS

# storing embeddings in the vector store
vectorstore = FAISS.from_documents(clean, w2v_model)

Here, clean is a list of langchain documents.

How can I pass my Word2Vec model as an embedding model to FAISS.from_documents(), and how can I use the model in langchain? Is this a recommended approach, or is there a better (easier, more efficient) way?

Thanks in advance.

Christian01

1 Answer

The langchain FAISS.from_documents() method's documentation…

https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.faiss.FAISS.html#langchain.vectorstores.faiss.FAISS.from_documents

…suggests it expects a 2nd argument of type Embeddings, where Embeddings is langchain's own class with an interface that's nothing-at-all like that of a Gensim Word2Vec model.

Embeddings looks like some sort of utility interface for turning full texts into vector-embeddings - whereas a Word2Vec model is only, at core, a mapping from individual words to word-vectors, not a multiword-text embedding method.

The word2vec algorithm is not a generic way to turn multiword texts into single vectors. If you wanted to use the vectors for single words from a Word2Vec model as the inputs to some other method of turning multiword texts into vectors, you'd want to consciously & explicitly choose such a method – and then your next steps would depend on the method you'd chosen.
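As an illustration only, here is a minimal sketch of one such explicitly chosen method: averaging the word-vectors of a text's known words, wrapped in the two methods langchain's Embeddings interface defines (embed_documents and embed_query). The class name MeanWordVectorEmbeddings, the whitespace/lowercase tokenization, and the zero-vector fallback for texts with no known words are all assumptions made for this example, not anything gensim or langchain provides. The import path is the one matching the langchain.vectorstores import in the question; newer releases expose the same class from langchain_core.embeddings.

```python
from typing import List

try:
    # Import path matching the langchain version used in the question.
    from langchain.embeddings.base import Embeddings
except ImportError:
    # Fallback so the sketch still runs where langchain is not installed.
    Embeddings = object


class MeanWordVectorEmbeddings(Embeddings):
    """Hypothetical adapter: a text's vector is the mean of its word-vectors."""

    def __init__(self, word_vectors, vector_size: int):
        # word_vectors: anything supporting `word in wv` and `wv[word]`,
        # such as w2v_model.wv from gensim, or a plain dict for testing.
        self.word_vectors = word_vectors
        self.vector_size = vector_size

    def _embed(self, text: str) -> List[float]:
        # Naive tokenization (an assumption): it should really mirror the
        # preprocessing that produced the Word2Vec training sentences.
        tokens = [t for t in text.lower().split() if t in self.word_vectors]
        if not tokens:
            # No known words: fall back to an all-zero vector.
            return [0.0] * self.vector_size
        total = [0.0] * self.vector_size
        for token in tokens:
            for i, value in enumerate(self.word_vectors[token]):
                total[i] += float(value)
        return [x / len(tokens) for x in total]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [self._embed(text) for text in texts]

    def embed_query(self, text: str) -> List[float]:
        return self._embed(text)
```

With the question's variables, this would be used as FAISS.from_documents(clean, MeanWordVectorEmbeddings(w2v_model.wv, w2v_model.vector_size)). Plain averaging discards word order and tends to be a weak baseline for retrieval, so treat it as a starting point for comparison rather than a recommendation.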

gojomo