I am trying to put together a simple "Q&A with sources" using LangChain, with a specific URL as the source data. The URL points to a single page that has quite a lot of information on it.
The problem is that `RetrievalQAWithSourcesChain` only gives me the entire URL back as the source of the results, which is not very useful in this case.
Is there a way to get more detailed source info? Perhaps the heading of the specific section on the page? A clickable URL to the correct section of the page would be even more helpful!
I am slightly unsure whether the source in the result comes from the language model, the URL loader, or simply `RetrievalQAWithSourcesChain` alone.
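As far as I can tell (I have not confirmed this in the LangChain source), the chain does not make up the sources itself. Each retrieved chunk is put into the prompt together with the `source` value from its metadata, in a format like `Content: ...` followed by `Source: ...`, and the model copies those values into its answer. The stdlib-only sketch below imitates that step with a made-up `Doc` class and an example.com URL. If this is right, every chunk loaded from my single page carries the same URL, which would explain what I am seeing.

```python
# Hypothetical illustration: roughly how a "stuff"-style sources chain renders
# retrieved documents into prompt text. The template mirrors what I believe
# LangChain's default document prompt looks like; treat it as an assumption.
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (not the real class)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


DOCUMENT_TEMPLATE = "Content: {page_content}\nSource: {source}"


def format_context(docs):
    """Join retrieved documents into one context string, one block per document."""
    return "\n\n".join(
        DOCUMENT_TEMPLATE.format(page_content=d.page_content, source=d.metadata["source"])
        for d in docs
    )


# Both chunks come from the same page, so both carry the same "source".
retrieved = [
    Doc("Opening hours are 9 to 5.", {"source": "https://example.com/page"}),
    Doc("Parking is free on weekends.", {"source": "https://example.com/page"}),
]
print(format_context(retrieved))
```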
I have tried using `UnstructuredURLLoader` and `SeleniumURLLoader` in the hope that a more detailed reading of the page would help, but sadly it did not.
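What I have in mind instead is something like the following stdlib-only sketch. `SectionSplitter` and `split_into_documents` are my own hypothetical names, and it assumes the page's `<h1>` to `<h3>` headings carry `id` attributes that work as `#fragment` anchors. It cuts the raw HTML at each heading and stores `page_url#id` as that chunk's `source`, falling back to the bare page URL when a heading has no `id`.

```python
# Hypothetical sketch: split one HTML page into sections at <h1>-<h3> headings and
# give each section its own "source" of the form page_url#heading-id.
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3"}
SKIP_TAGS = {"script", "style"}


class SectionSplitter(HTMLParser):
    """Collect one section per heading: heading text, heading id, body text."""

    def __init__(self):
        super().__init__()
        self.sections = []
        self._skip_depth = 0
        self._in_heading = False
        self._current = {"heading": "", "id": None, "text": []}

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in HEADING_TAGS:
            self._flush()  # a new heading closes the previous section
            self._in_heading = True
            self._current = {"heading": "", "id": dict(attrs).get("id"), "text": []}

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag in HEADING_TAGS:
            self._in_heading = False

    def handle_data(self, data):
        if self._skip_depth:
            return  # ignore script/style contents
        if self._in_heading:
            self._current["heading"] += data
        else:
            self._current["text"].append(data)

    def _flush(self):
        heading = self._current["heading"].strip()
        text = " ".join(" ".join(self._current["text"]).split())  # collapse whitespace
        if heading or text:
            self.sections.append({"heading": heading, "id": self._current["id"], "text": text})

    def close(self):
        super().close()
        self._flush()  # emit the last section


def split_into_documents(html, page_url):
    """Return (text, metadata) pairs whose 'source' points at the section anchor."""
    parser = SectionSplitter()
    parser.feed(html)
    parser.close()
    docs = []
    for s in parser.sections:
        # Fall back to the bare page URL when a heading has no id attribute.
        source = f"{page_url}#{s['id']}" if s["id"] else page_url
        content = f"{s['heading']}\n{s['text']}".strip()
        docs.append((content, {"source": source, "heading": s["heading"]}))
    return docs


page = """
<h1 id="intro">Intro</h1><p>Welcome.</p>
<h2 id="hours">Opening hours</h2><p>9 to 5.</p>
<h2>Parking</h2><p>Free on weekends.</p>
<script>var x = 1;</script>
"""
docs = split_into_documents(page, "https://example.com/page")
for content, meta in docs:
    print(meta["source"], "->", content.replace("\n", " | "))
```

The `(text, metadata)` pairs would still need to be wrapped in LangChain `Document` objects before building the vector store, and I have not verified that the chain then actually reports the anchored URLs. Pages that put the `id` on an inner `<a>` inside the heading would also need extra handling.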
Relevant code excerpt:
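```python
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQAWithSourcesChain

llm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')
# VectorStore: the vector store built from the documents loaded from the URL
chain = RetrievalQAWithSourcesChain.from_llm(llm=llm, retriever=VectorStore.as_retriever())
result = chain({"question": question})
print(result['answer'])
print("\n Sources : ", result['sources'])
```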