How to extract text from a PDF and dump the information into a database using Python

Question

How do I extract text from a PDF and dump the information into a database using Python? How do I install PyPDF2?

I tried doing it, but it shows the following problem:

Please add sufficient details for others to understand the question. Also, please mention the exception and issue you're facing. — Abhishek Tyagi, Mar 18 '19 at 13:31

M.K · Accepted Answer · 2019-03-18T11:27:32.753


I see you are on Windows, so this is how you install it there. You first need to run the package's setup.py file from a command prompt.

Go into the directory that contains setup.py:

cd C:\Users\User\Downloads\pyPDF2

Then run the installer with the Python interpreter you want to install it for. I use Python 2.7 here:

C:\python27\python.exe setup.py install

For Python 3.3, use C:\python33\python.exe setup.py install, and so on.

That's the fastest way to do it (check the source!).
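To confirm that the install landed in the interpreter you actually run your scripts with, a quick check like the following can help. This is a minimal sketch, assuming Python 3.4 or newer (it relies on importlib.util.find_spec); the helper name is_installed is made up for illustration.

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if `module_name` can be imported by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Shows which python.exe is running, which matters when several are installed.
    print("Interpreter:", sys.executable)
    print("PyPDF2 installed:", is_installed("PyPDF2"))
```

If this prints False, the package was probably installed for a different Python than the one running the script.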

As for how to extract the text, there are plenty of tutorials. Stick to the official documentation and trustworthy websites. Here is an example:

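# Note: these are the older PyPDF2 names; later releases renamed them
# (PdfReader, reader.pages, extract_text).
from PyPDF2 import PdfFileReader


def text_extractor(path):
    # PDFs must be opened in binary mode
    with open(path, 'rb') as f:
        pdf = PdfFileReader(f)
        # Get the first page (page indices start at 0)
        page = pdf.getPage(0)
        print(page)
        print('Page type: {}'.format(str(type(page))))
        # Extract the text of that page as a string
        text = page.extractText()
        print(text)


if __name__ == '__main__':
    path = 'reportlab-sample.pdf'
    text_extractor(path)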
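For the database half of the question, one option is SQLite through the standard-library sqlite3 module. The following is only a sketch under assumptions: the table name pdf_pages, its columns, and the helper save_pages are hypothetical, and the placeholder strings stand in for whatever text the extraction step returns for each page.

```python
import sqlite3


def save_pages(conn, source, pages):
    """Store one row per page; `pages` is a list of page-text strings."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pdf_pages ("
        "source TEXT NOT NULL, "
        "page_number INTEGER NOT NULL, "
        "content TEXT, "
        "PRIMARY KEY (source, page_number))"
    )
    # Number pages from 1 so the rows match what a PDF viewer shows.
    rows = [(source, number, text) for number, text in enumerate(pages, start=1)]
    # Parameterised query: the extracted text is never spliced into the SQL string.
    conn.executemany(
        "INSERT OR REPLACE INTO pdf_pages (source, page_number, content) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    # An in-memory database keeps the example self-contained;
    # pass a file name instead to keep the data on disk.
    conn = sqlite3.connect(":memory:")
    count = save_pages(conn, "reportlab-sample.pdf",
                       ["first page text", "second page text"])
    print(count, "pages stored")
    conn.close()
```

Because of the primary key on (source, page_number), running it again for the same file replaces the existing rows rather than duplicating them.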

edited Mar 18 '19 at 11:27

answered Mar 18 '19 at 11:05

M.K


How do I install PyPDF2? – Ayushi Garg Mar 18 '19 at 11:24
I am not sure you read my answer. At least not the first half. There, I give you a resource and tell you how to do it! @AyushiGarg – M.K Mar 18 '19 at 11:26
Reading https://pypi.org/simple/PyPDF2/
Download error on https://pypi.org/simple/PyPDF2/: timed out -- Some packages may not be found!
Couldn't find index page for 'PyPDF2' (maybe misspelled?)
Scanning index of all packages (this may take a while)
Reading https://pypi.org/simple/
Download error on https://pypi.org/simple/: timed out -- Some packages may not be found!
No local packages or working download links found for PyPDF2
error: Could not find suitable distribution for Requirement.parse('PyPDF2')
It's not happening. – Ayushi Garg Mar 18 '19 at 11:38
Did you try the other way of installing it (shown in the link I provided)? @AyushiGarg – M.K Mar 18 '19 at 11:45
