Why does Pinecone repeatedly return the same result from my series of documents?

Question

When I use Pinecone through LangChain, every similarity search (which should return the documents most relevant to my query) returns the same document over and over. With other vector stores such as Chroma and FAISS, I do get different documents, as expected.

>>> pinecone_vectordb
<langchain.vectorstores.pinecone.Pinecone object at 0x0000017F1FEE29D0>

>>> query = "what are some bonus features offered by credit cards?"

>>> found_docs = pinecone_vectordb.similarity_search(query, k=3)

>>> found_docs
[Document(page_content='What Are Some of the Bonus Categories for Business Credit Cards?', metadata={}),
 Document(page_content='What Are Some of the Bonus Categories for Business Credit Cards?', metadata={}),
 Document(page_content='What Are Some of the Bonus Categories for Business Credit Cards?', metadata={})]

I expected the similarity search to return a list of distinct documents, ranked by how similar each one is to my query.

@carteakey pls have a look – Yishai Rasowsky Jun 04 '23 at 15:11
@Nick ODell pls have a look – Yishai Rasowsky Jun 04 '23 at 15:11
@andrew_reece pls have a look – Yishai Rasowsky Jun 04 '23 at 15:12
@Langchain15 pls have a look – Yishai Rasowsky Jun 04 '23 at 15:12
@Francisco Ghelfi pls have a look – Yishai Rasowsky Jun 04 '23 at 15:12
@user791793 pls have a look – Yishai Rasowsky Jun 04 '23 at 15:12

score 0 · Answer 1 · answered Jul 07 '23 at 08:52

There is probably an issue with how you create your documents. I faced the same issue, and I've explained the solution here: https://github.com/hwchase17/langchain/pull/7332

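Basically, you should avoid code like Document(page_content=chunk, metadata=source.metadata), or anything similar, when you split a source document into chunks. source.metadata is a dict, and assigning a dict does not copy it: every chunk ends up referencing the very same metadata object, so a change made through one chunk shows up in all of them. LangChain's Pinecone wrapper (as of mid-2023) stores each chunk's text inside its metadata, under the "text" key by default, and rebuilds page_content from that field at query time, so with a shared dict every stored vector can end up carrying the same text. Chroma and FAISS keep the text separately from the metadata, which would explain why they are not affected. Instead, give each chunk its own copy, for example Document(page_content=chunk, metadata=source.metadata.copy())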
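A minimal, dependency-free sketch of the aliasing problem and the .copy() fix. Everything here is a hypothetical stand-in, not LangChain or Pinecone code: Doc imitates LangChain's Document, split_into_chunks is a naive splitter, and simulate_upsert assumes a wrapper that writes each chunk's text into its metadata in place and only serializes the batch after the loop (roughly the behavior described above). shares_metadata is a quick check you could adapt to see whether your own chunks share one dict.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for LangChain's Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def split_into_chunks(source: Doc, size: int, copy_metadata: bool) -> list:
    """Hypothetical fixed-size splitter; copy_metadata toggles the bug vs. the fix."""
    text = source.page_content
    pieces = [text[i:i + size] for i in range(0, len(text), size)]
    return [
        Doc(page_content=p,
            # Without .copy(), every chunk points at the same dict object.
            metadata=source.metadata.copy() if copy_metadata else source.metadata)
        for p in pieces
    ]


def simulate_upsert(docs: list, text_key: str = "text") -> list:
    """Assumed wrapper behavior: write text into metadata in place, send batch afterwards."""
    batch = []
    for d in docs:
        d.metadata[text_key] = d.page_content  # in-place write into the (possibly shared) dict
        batch.append(d.metadata)
    # "Serialize" the batch only after the loop, as a batched upsert would.
    return [dict(m) for m in batch]


def shares_metadata(docs: list) -> bool:
    """True if at least two chunks hold the very same metadata dict object."""
    return len({id(d.metadata) for d in docs}) < len(docs)


source = Doc("aaaabbbbcccc", {"source": "cards.txt"})

buggy = split_into_chunks(source, 4, copy_metadata=False)
print(shares_metadata(buggy))                       # True
print([m["text"] for m in simulate_upsert(buggy)])  # ['cccc', 'cccc', 'cccc']

fixed = split_into_chunks(source, 4, copy_metadata=True)
print(shares_metadata(fixed))                       # False
print([m["text"] for m in simulate_upsert(fixed)])  # ['aaaa', 'bbbb', 'cccc']
```

Note that dict.copy() is shallow: if your metadata contains nested mutable values (lists, dicts), those are still shared, and copy.deepcopy would be the safer choice.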
