How to extract text from a PDF file

Question

What is the best way to extract text from a PDF file? I have tried PyPDF2, which returns only an empty string for every page, and tabula, which returns a strange table full of NaN values. The PDF I am trying to scrape is http://imdagrimet.gov.in/sites/default/files/daas_bulletin/Vaishali_46.pdf
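An empty string from PyPDF2 on every page often means the PDF has no text layer at all (for example, scanned pages stored as images), in which case text-layer tools have nothing to read and OCR is needed. A minimal sketch for checking this, assuming PyPDF2 2.0 or later (which provides `PdfReader` and `page.extract_text()`); the helper names `blank_page_indices` and `pypdf2_page_texts` are hypothetical:

```python
def blank_page_indices(page_texts):
    """Return the indices of pages whose extracted text is empty or whitespace."""
    # `or ""` also tolerates a None value, just in case
    return [i for i, text in enumerate(page_texts) if not (text or "").strip()]


def pypdf2_page_texts(pdf_path):
    """Extract the text layer of each page with PyPDF2 (assumes PyPDF2 >= 2.0)."""
    # Imported lazily so blank_page_indices can be used without PyPDF2 installed
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_path)
    return [page.extract_text() for page in reader.pages]
```

If `blank_page_indices(pypdf2_page_texts("Vaishali_46.pdf"))` lists every page of a local copy of the bulletin, a text-layer extractor is unlikely to help and an OCR-based approach is the better bet.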

See [this guide on reading PDFs with OCR](https://www.geeksforgeeks.org/python-reading-contents-of-pdf-using-ocr-optical-character-recognition/) — Shijith, Jan 31 '20 at 06:48
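One common way to take the OCR route in Python, not necessarily the exact code in the linked article, is to render each page to an image with pdf2image's `convert_from_path` (which needs poppler installed) and run Tesseract on it through `pytesseract.image_to_string` (which needs the tesseract executable). A minimal sketch; `ocr_pdf` and its `to_images`/`image_to_string` hooks are hypothetical names, and the hooks exist only so the page loop can be exercised without those tools:

```python
def ocr_pdf(pdf_path, dpi=300, lang="eng", to_images=None, image_to_string=None):
    """OCR every page of a PDF and return a list with one string per page."""
    if to_images is None:
        from pdf2image import convert_from_path  # requires poppler

        def to_images(path):
            # Render each PDF page to a PIL image at the given resolution
            return convert_from_path(path, dpi=dpi)

    if image_to_string is None:
        import pytesseract  # requires the tesseract executable

        def image_to_string(image):
            # lang must match an installed Tesseract language pack
            return pytesseract.image_to_string(image, lang=lang)

    return [image_to_string(image) for image in to_images(pdf_path)]
```

Joining the result with `"\n".join(ocr_pdf("Vaishali_46.pdf"))` would give the whole document as a single string.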
Please check the link below:
[https://stackoverflow.com/questions/26494211/extracting-text-from-a-pdf-file-using-pdfminer-in-python](https://stackoverflow.com/questions/26494211/extracting-text-from-a-pdf-file-using-pdfminer-in-python) — Srikar Manthatti, Jan 31 '20 at 07:28
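For PDFs that do have a text layer, pdfminer.six (the maintained fork of PDFMiner) offers a one-call `pdfminer.high_level.extract_text`. Like PyPDF2, it only reads embedded text, so on image-only pages it will also come back empty. A minimal sketch that tries extractors in order and returns the first non-blank result; `pdfminer_text` and `first_nonempty_text` are hypothetical helper names:

```python
def pdfminer_text(pdf_path):
    """Read the embedded text layer with pdfminer.six."""
    from pdfminer.high_level import extract_text  # pip install pdfminer.six

    return extract_text(pdf_path)


def first_nonempty_text(pdf_path, extractors):
    """Try each extractor in order; return the first result that is not blank."""
    for extract in extractors:
        text = extract(pdf_path)
        if text and text.strip():
            return text
    return ""  # every extractor came back empty
```

For example, `first_nonempty_text(path, [pdfminer_text, ocr_function])`, where `ocr_function` stands for any OCR-based extractor that returns a string, only pays the cost of OCR when the text layer is missing.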

score 0 · Accepted Answer · answered Jan 31 '20 at 06:53


You can use the textract library:

```python
import textract

# process() returns bytes, so decode it to get a str
text = textract.process("path_to_pdf").decode("utf-8")
```
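textract's default PDF method shells out to `pdftotext`, which, like PyPDF2, only reads an existing text layer, so it may also return nothing for this bulletin if its pages are images. textract also accepts `method="tesseract"` for PDFs, which renders the pages to images with `pdftoppm` and runs Tesseract on them. A minimal sketch that falls back to that method when the default finds nothing; `pdf_to_text` and its `process` hook (there only so the logic can be tested without textract) are hypothetical:

```python
def pdf_to_text(pdf_path, process=None):
    """Extract text with textract, falling back to OCR if no text layer is found."""
    if process is None:
        # Needs pdftotext; the OCR fallback also needs pdftoppm and tesseract
        import textract

        process = textract.process

    # textract.process returns bytes (UTF-8 by default), so decode to str
    text = process(pdf_path).decode("utf-8")
    if not text.strip():
        # Nothing in the text layer: retry by OCR-ing the rendered pages
        text = process(pdf_path, method="tesseract").decode("utf-8")
    return text
```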


Anirudh Duggal
