
I am trying to return the result of llama_index's index.query(query, streaming=True) as a streaming HTTP response, but I'm not sure how to do it.

This first attempt obviously doesn't work:

index = GPTSimpleVectorIndex.load_from_disk(index_file)
return index.query(query, streaming=True)

Error message: TypeError: cannot pickle 'generator' object.

This one doesn't work either:

def stream_chat(query: str, index):
    for chunk in index.query(query, streaming=True):
        print(chunk)
        content = chunk["response"]
        if content is not None:
            yield content

# in another function
index = GPTSimpleVectorIndex.load_from_disk(index_file)
return StreamingResponse(stream_chat(query, index), media_type="text/html")

Error message: TypeError: 'StreamingResponse' object is not iterable.

Thanks!

Taishi Kato
2 Answers


OK, I figured it out.

The answer is:

return StreamingResponse(index.query(query, streaming=True).response_gen)
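For context, a fuller FastAPI endpoint built around that line might look like the sketch below (the route path, index file name, and media type are assumptions, not part of the original answer):

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from llama_index import GPTSimpleVectorIndex

app = FastAPI()
index = GPTSimpleVectorIndex.load_from_disk("index.json")

@app.get("/chat")
def chat(query: str):
    # query(..., streaming=True) returns a streaming response object;
    # its response_gen attribute is a generator of text chunks, which
    # FastAPI's StreamingResponse forwards to the client as they arrive.
    streaming_response = index.query(query, streaming=True)
    return StreamingResponse(streaming_response.response_gen, media_type="text/plain")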
Taishi Kato
  • If anyone is trying to figure out how to do this with a Python Flask server, just wrap the response_gen object in Flask's stream_with_context() helper. In your routed method: `return stream_with_context(index.query(query, streaming=True).response_gen)` – jkriddle Apr 26 '23 at 15:10
  • Does anyone have advice on how to implement this in Django? – gmaly Jun 10 '23 at 13:26

If you are using Flask, use stream_with_context. The final code would be as shown below.

return stream_with_context(index.as_query_engine(streaming=True).query(user_prompt).response_gen)
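For reference, a complete Flask route using the newer llama_index query-engine API might look like this sketch (the route path, storage directory, and request parsing are assumptions):

from flask import Flask, request, stream_with_context
from llama_index import StorageContext, load_index_from_storage

app = Flask(__name__)
index = load_index_from_storage(StorageContext.from_defaults(persist_dir="./storage"))

@app.route("/chat")
def chat():
    user_prompt = request.args.get("q", "")
    # response_gen yields text chunks; stream_with_context keeps the
    # request context available while Flask streams them to the client.
    streaming_response = index.as_query_engine(streaming=True).query(user_prompt)
    return stream_with_context(streaming_response.response_gen)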
Jayakrishnan