I am trying to understand how OpenAI streaming works with LlamaIndex. Specifically, I'm looking at this tutorial:

https://gpt-index.readthedocs.io/en/latest/how_to/customization/streaming.html

I'm trying to adapt this other tutorial on 10K analysis so that the answer streams, since waiting for the full answer can take quite a while for large documents:

https://gpt-index.readthedocs.io/en/latest/examples/usecases/10k_sub_question.html

According to the streaming docs, you need 2 things.

  1. Use an LLM that supports streaming, and set streaming=True.

So in my code, I do this (use OpenAI, set streaming):

llm_predictor = LLMPredictor(llm=OpenAI(temperature=0.5, model_name="text-davinci-003", max_tokens=-1, streaming=True))
  2. Configure the query engine to use streaming.

I have two query engines, one for Uber and one for Lyft, so each one gets streaming enabled:

# rebuild storage context
lyft_storage_context = StorageContext.from_defaults(persist_dir="./indexed_articles/lyft10K.json")
# load index
lyft_engine = load_index_from_storage(lyft_storage_context).as_query_engine(similarity_top_k=3, streaming=True)


# rebuild storage context
uber_storage_context = StorageContext.from_defaults(persist_dir="./indexed_articles/uber10K.json")
# load index
uber_engine = load_index_from_storage(uber_storage_context).as_query_engine(similarity_top_k=3, streaming=True)
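
As a sanity check on the setup above, the streaming tutorial implies that querying one of these engines directly should already hand back a streamable response. A rough sketch of that check (assuming the Lyft index loads correctly; the question text is just an example):

# sanity check: stream directly from one base engine, per the streaming tutorial
direct_response = lyft_engine.query("What was Lyft's revenue in 2021?")
for text in direct_response.response_gen:
    print(text, end="")  # chunks should print as they are generated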

Using all of this, you can then construct the sub-question query engine.

query_engine_tools = [
    QueryEngineTool(
        query_engine=lyft_engine,
        metadata=ToolMetadata(name='lyft_10k', description='Provides information about Lyft financials for year 2021')
    ),
    QueryEngineTool(
        query_engine=uber_engine,
        metadata=ToolMetadata(name='uber_10k', description='Provides information about Uber financials for year 2021')
    ),
]

llm_predictor = LLMPredictor(llm=OpenAI(temperature=0.5, model_name="text-davinci-003", max_tokens=-1, streaming=True))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

s_engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=query_engine_tools,
                                                question_gen=LLMQuestionGenerator.from_defaults(service_context=service_context))

Now, when you run a query, the docs say you should get back a streaming response object (with a response_gen generator). So this call should return one:

streaming_response = s_engine.query(
    "Describe the financials of Uber in 2020", 
)

The docs then say you can loop over the results as they arrive:

for text in streaming_response.response_gen:
    # do something with each chunk of text as it arrives, e.g.:
    print(text, end="")

However, I always get back a None object from query(), so I'm unable to go anywhere with it. What am I doing wrong? Where is my streaming response object?
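
For reference, here is roughly what I observe (a sketch using the names from the code above):

# what I actually see: query() hands back None instead of a streaming response
streaming_response = s_engine.query("Describe the financials of Uber in 2020")
print(type(streaming_response))  # <class 'NoneType'>
# so the loop over streaming_response.response_gen can never run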
