How do I extract text from a PDF and dump the information into a database using Python? How do I install PyPDF2?
I tried installing it, but it shows the following problem.
I see you are on Windows, so this is how you install it on Windows. You first need to install the package properly from its setup.py file.
That's the fastest way to do it (check the source!).
As for how to extract the text, there are lots of tutorials. You should follow the official documentation and trustworthy websites. Here is an example:
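# Legacy PyPDF2 API (versions before 3.0); newer releases use
# PdfReader, reader.pages[...] and page.extract_text() instead.
from PyPDF2 import PdfFileReader


def text_extractor(path):
    # Open in binary mode; the reader needs the file to stay open while it is used
    with open(path, 'rb') as f:
        pdf = PdfFileReader(f)

        # get the first page (page numbers are zero-based)
        page = pdf.getPage(0)
        print(page)
        print('Page type: {}'.format(str(type(page))))

        # extract the page's text and print it
        text = page.extractText()
        print(text)


if __name__ == '__main__':
    path = 'reportlab-sample.pdf'
    text_extractor(path)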
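
The question also asks about dumping the text into a database, which the example above does not cover. Here is a minimal sketch of that step, assuming SQLite through Python's built-in sqlite3 module is acceptable. The table name pdf_text, its columns, and the helper names save_page_text and load_page_text are made up for illustration, so adapt them to your own schema and database.

```python
import sqlite3


def save_page_text(conn, source, page_number, text):
    """Store the text of one page (hypothetical table and column names)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pdf_text ("
        "source TEXT, page_number INTEGER, content TEXT)"
    )
    # Use placeholders so the extracted text is never spliced into the SQL string
    conn.execute(
        "INSERT INTO pdf_text (source, page_number, content) VALUES (?, ?, ?)",
        (source, page_number, text),
    )
    conn.commit()


def load_page_text(conn, source, page_number):
    """Return the stored text for a page, or None if nothing was saved."""
    row = conn.execute(
        "SELECT content FROM pdf_text WHERE source = ? AND page_number = ?",
        (source, page_number),
    ).fetchone()
    return row[0] if row else None


if __name__ == '__main__':
    # In-memory database for the demo; pass a file name to keep the data
    conn = sqlite3.connect(':memory:')
    save_page_text(conn, 'reportlab-sample.pdf', 0, 'text returned by extractText()')
    print(load_page_text(conn, 'reportlab-sample.pdf', 0))
    conn.close()
```

To connect the two pieces, you could have text_extractor return the text instead of printing it and pass that string to save_page_text.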