I am trying to build a web app with Streamlit that reads documents (mainly PDFs) and loads the data using langchain.document_loaders.PyPDFLoader,
but I end up with the following error:
TypeError: stat: path should be string, bytes, os.PathLike or integer, not list
followed by this traceback:
  File "/opt/homebrew/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 552, in _run_script
    exec(code, module.__dict__)
  File "/Users/shuhulhandoo/MetaGeeks/PDF-URL_QA/app.py", line 133, in <module>
    main()
  File "/Users/shuhulhandoo/MetaGeeks/PDF-URL_QA/app.py", line 75, in main
    loader = PyPDFLoader(pdf)
             ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/langchain/document_loaders/pdf.py", line 92, in __init__
    super().__init__(file_path)
  File "/opt/homebrew/lib/python3.11/site-packages/langchain/document_loaders/pdf.py", line 42, in __init__
    if not os.path.isfile(self.file_path) and self._is_valid_url(self.file_path):
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen genericpath>", line 30, in isfile
In my code, I upload the document in Streamlit using:
import streamlit as st
from langchain.document_loaders import PyPDFLoader

uploaded_file = st.file_uploader("Upload PDF", type="pdf")
if uploaded_file is not None:
    loader = PyPDFLoader(uploaded_file)
I want to use PyPDFLoader because I need to keep the source information of the documents, such as page numbers.
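Since the traceback shows PyPDFLoader calling os.path.isfile on its argument, it appears to expect a file path rather than an uploaded file object. One possible workaround is to write the uploaded bytes to a temporary file and pass that path instead. The following is only a minimal sketch under that assumption; the helper name save_upload_to_temp_pdf is hypothetical, and the commented usage assumes the Streamlit upload object exposes getvalue().

```python
import os
import tempfile


def save_upload_to_temp_pdf(data: bytes) -> str:
    """Write raw PDF bytes to a temporary .pdf file and return its path.

    Hypothetical helper: the caller is responsible for deleting the file.
    """
    # delete=False keeps the file on disk after the handle is closed,
    # so that another library can open it by path.
    with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp:
        tmp.write(data)
        return tmp.name


# Assumed usage inside the Streamlit app (not executed here):
#
#   path = save_upload_to_temp_pdf(uploaded_file.getvalue())
#   try:
#       pages = PyPDFLoader(path).load()
#   finally:
#       os.remove(path)
```

With this approach the loader receives a string path, which is one of the types the error message lists as acceptable.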
I also tried extracting the text of the PDF page by page with PyPDF2, as follows:
from PyPDF2 import PdfReader
import streamlit as st

uploaded_file = st.file_uploader("Upload PDF", type="pdf")
if uploaded_file is not None:
    texts = ""
    reader = PdfReader(uploaded_file)
    for page in reader.pages:
        texts += page.extract_text()
But in this case I lose the page numbers, which I need.
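To illustrate what I am after, here is a rough sketch that keeps one record per page instead of concatenating everything into a single string. It only assumes a reader object with a pages sequence whose items have an extract_text() method, as in the PyPDF2 loop above. The PageText class, the pages_with_numbers function, and the metadata keys "source" and "page" are hypothetical names chosen for illustration, and the page numbers are 1-based by my own choice.

```python
from dataclasses import dataclass, field


@dataclass
class PageText:
    """Hypothetical container: the text of one page plus its source metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


def pages_with_numbers(reader, source):
    """Collect per-page text from a PdfReader-like object, keeping page numbers.

    `reader` is assumed to expose `.pages`, each item having `.extract_text()`.
    `source` is any label for the document, for example the uploaded file name.
    """
    records = []
    # enumerate(..., start=1) gives 1-based page numbers.
    for page_number, page in enumerate(reader.pages, start=1):
        # Fall back to an empty string if a page yields no text.
        text = page.extract_text() or ""
        records.append(
            PageText(text=text, metadata={"source": source, "page": page_number})
        )
    return records
```

In the Streamlit app this would presumably be called as pages_with_numbers(PdfReader(uploaded_file), uploaded_file.name), assuming the upload object has a name attribute.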