
When I use Pinecone in LangChain, every similarity search, which is supposed to return the documents most relevant to my query, returns the same document over and over. (With other vector stores such as Chroma and FAISS, I get different documents, as expected.)

>>> vectordb_pinecone
<langchain.vectorstores.pinecone.Pinecone object at 0x0000017F1FEE29D0>

>>> query = "what are some bonus features offered by credit cards?"

>>> found_docs = vectordb_pinecone.similarity_search(query,k=3)

>>> found_docs
[Document(page_conten...tadata={}), Document(page_conten...tadata={}), Document(page_conten...tadata={})]
0:
Document(page_content='What Are Some of the Bonus Categories for Business Credit Cards?', metadata={})
1:
Document(page_content='What Are Some of the Bonus Categories for Business Credit Cards?', metadata={})
2:
Document(page_content='What Are Some of the Bonus Categories for Business Credit Cards?', metadata={})

I expected the similarity search to return several different documents, ranked by how similar they are to my query.

1 Answer


There is probably an issue with how you create your documents. I faced the same issue, and I've explained the solution here: https://github.com/hwchase17/langchain/pull/7332

Basically, avoid Document(page_content=chunk, metadata=source.metadata) or anything similar, if that is what you used. source.metadata is a dict, and passing a dict does not copy it, so every Document ends up sharing the same dict object and a change made through one document shows up in all of them. Instead, give each document its own copy, for example Document(page_content=chunk, metadata=source.metadata.copy())
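
To make the aliasing concrete, here is a minimal sketch that runs without LangChain or Pinecone. The Document class below is a hypothetical stand-in, and store_text_in_metadata is an assumption about what the vector store wrapper does (writing each chunk's text into that document's metadata dict under a text key); the file name and chunk strings are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for LangChain's Document, so this runs on its own."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def store_text_in_metadata(docs, text_key="text"):
    """Assumed wrapper behaviour: save each chunk's text inside its metadata dict."""
    records = []
    for doc in docs:
        doc.metadata[text_key] = doc.page_content  # mutates the dict in place
        records.append(doc.metadata)
    return records


chunks = ["chunk A", "chunk B", "chunk C"]

# Problem: all three documents point at the very same dict object.
source_metadata = {"source": "cards.txt"}
shared = [Document(page_content=c, metadata=source_metadata) for c in chunks]
shared_texts = [r["text"] for r in store_text_in_metadata(shared)]

# Fix: each document gets its own shallow copy of the metadata.
source_metadata = {"source": "cards.txt"}
copied = [Document(page_content=c, metadata=source_metadata.copy()) for c in chunks]
copied_texts = [r["text"] for r in store_text_in_metadata(copied)]

print(shared_texts)  # ['chunk C', 'chunk C', 'chunk C']
print(copied_texts)  # ['chunk A', 'chunk B', 'chunk C']
```

With the shared dict, the last chunk written overwrites the text for every record, which matches the symptom of getting the same document back for each result. Note that .copy() is a shallow copy; if your metadata holds nested dicts or lists that get modified, copy.deepcopy may be the safer choice.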