How to extract structured data from a PDF document using Langchain, and use this data as input to ChatGPT

Question

I'm working on a project where I need to extract data from a PDF document and use that extracted data as input for ChatGPT. I came across Langchain, a language extraction library.

Specifically, I would like to know how to:

Extract text or structured data from a PDF document using Langchain. Transform the extracted data into a format that can be passed as input to ChatGPT. Integrate the extracted data with ChatGPT to generate responses based on the provided information. Any guidance, code examples, or resources would be greatly appreciated. Thank you!

I've been using the Langchain library, UnstructuredFileLoader from langchain.document_loaders to successfully extract data from a PDF document.

Now, I'm attempting to use the extracted data as input for ChatGPT by utilizing the OpenAIEmbeddings. However, I'm encountering an issue where ChatGPT does not seem to respond correctly to the provided data.

I would like to seek advice and suggestions on how to address this problem.

I appreciate any insights, code snippets, or resources that can help me resolve this issue and improve the integration between Langchain and ChatGPT. Thank you in advance for your assistance!

score 0 · Answer 1 · answered Jul 20 '23 at 12:12

https://techcommunity.microsoft.com/t5/startups-at-microsoft/build-a-chatbot-to-query-your-documentation-using-langchain-and/ba-p/3833134

Please look for this startup.

The flow of app.py works something like:

FAISS index is loaded into RAM User asks a question User's question is sent to the OpenAI Embeddings API, which returns a 1536 dimensional vector. The FAISS index is queried for the closest matching vector. The closest matching vector is returned, along with the text that it was generated from. The returned text is fed into GPT-35 as context in a GPT-35 prompt GPT-35 generates a response, which is returned to the user.

How to extract structured data from a PDF document using Langchain, and use this data as input to ChatGPT

1 Answers1