
Context

I have a set of articles which have been vectorised and loaded into Weaviate using "vectorizer": "text2vec-openai"

I then query as follows:

  1. Find articles that are close matches to a question, using client.query.get("Articles").with_near_text(...)
  2. Iterate through each matching result and fetch the full article text
  3. Use the OpenAI Chat Completions API to ask a question about the matching documents, e.g. openai.ChatCompletion.create(...)

When I construct the messages to provide to the completions API, I inject the full content of the matching articles.
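For reference, here is a simplified sketch of that flow (the property names title/content, the localhost URL, and the model are placeholders for my actual setup):

    import openai   # assumes OPENAI_API_KEY is set in the environment
    import weaviate

    client = weaviate.Client("http://localhost:8080")

    question = "What does the article say about X?"

    # 1. Find articles that are close matches to the question
    result = (
        client.query
        .get("Articles", ["title", "content"])
        .with_near_text({"concepts": [question]})
        .with_limit(3)
        .do()
    )

    # 2. Collect the FULL text of every matching article
    articles = result["data"]["Get"]["Articles"]
    context = "\n\n".join(a["content"] for a in articles)

    # 3. Ask the question, injecting all of that text into the prompt
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": f"Answer using these articles:\n{context}"},
            {"role": "user", "content": question},
        ],
    )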

This has at least 3 undesirable impacts:

  1. It's more tokens, which increases cost.
  2. Sometimes it's so many tokens that it exceeds the maximum permitted for the given model, causing an error.
  3. It slows down the model's response, increasing latency.

The content of these articles can be long (many pages), and in some cases, only a few sentences or paragraphs are relevant to the question.

With my current approach, I send the entire document that Weaviate has deemed relevant enough, given my input query and configured distance.

Question

Can I query Weaviate in such a way that it provides me with excerpts of the most relevant matching text, so that I can append only this subset of the raw article content to the OpenAI Completions API?

David

1 Answer


The content of these articles can be long (many pages), and in some cases, only a few sentences or paragraphs are relevant to the question.

You should chunk the articles, e.g. by paragraph, before sending them to Weaviate for vectorizing.
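For example, something along these lines (just a sketch; the ArticleChunk class, its properties, and the naive blank-line split are illustrative and should be adapted to your schema):

    import weaviate

    client = weaviate.Client("http://localhost:8080")

    # Illustrative input: article id -> full article text
    articles = {
        "article-1": "First paragraph...\n\nSecond paragraph...",
    }

    def chunk_by_paragraph(text):
        # Naive split on blank lines; each paragraph becomes one object
        return [p.strip() for p in text.split("\n\n") if p.strip()]

    # Import one Weaviate object per paragraph, keeping a pointer
    # back to the source article
    with client.batch as batch:
        for article_id, text in articles.items():
            for i, paragraph in enumerate(chunk_by_paragraph(text)):
                batch.add_data_object(
                    data_object={
                        "content": paragraph,
                        "article_id": article_id,
                        "chunk_index": i,
                    },
                    class_name="ArticleChunk",
                )

Your with_near_text query then returns individual paragraphs, so you only append those excerpts to the prompt instead of whole documents. The article_id property lets you fetch the surrounding article if you ever need more context.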

hsm207
  • Thanks for your response. I can see this working. Can this lead to new issues, though, e.g. if two paragraphs in a single article are contextually related? By breaking these apart, could the cosine similarity of an input vector against the now two distinct vectors stored in Weaviate differ from what it was when they were one larger vector? – David Jun 23 '23 at 16:43
  • It really depends on the dataset. You should experiment with different chunking strategies, e.g. chunking by paragraph with n tokens of overlap before and after the current paragraph, and evaluate the impact on retrieval quality. – hsm207 Jun 23 '23 at 22:23
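A minimal illustration of the overlap strategy from that last comment (a sketch only; the tiktoken encoding and the 50-token overlap size are assumptions you would tune against your own data):

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    def chunk_with_overlap(paragraphs, n_overlap=50):
        # Prepend/append up to n_overlap tokens of the neighbouring
        # paragraphs so context that spans a paragraph break survives
        token_lists = [enc.encode(p) for p in paragraphs]
        chunks = []
        for i, toks in enumerate(token_lists):
            before = token_lists[i - 1][-n_overlap:] if i > 0 else []
            after = token_lists[i + 1][:n_overlap] if i + 1 < len(token_lists) else []
            chunks.append(enc.decode(before + toks + after))
        return chunks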