
What is the best way to extract text from a PDF file? I have tried:

  • PyPDF2: it only returns an empty string for every page.
  • tabula: it returns a weird table full of NaN.

The PDF I am trying to scrape is http://imdagrimet.gov.in/sites/default/files/daas_bulletin/Vaishali_46.pdf

anny
  • Refer [here](https://www.geeksforgeeks.org/python-reading-contents-of-pdf-using-ocr-optical-character-recognition/) for reading PDF contents with OCR (optical character recognition). – Shijith Jan 31 '20 at 06:48
  • Please check the link below:
    [https://stackoverflow.com/questions/26494211/extracting-text-from-a-pdf-file-using-pdfminer-in-python](https://stackoverflow.com/questions/26494211/extracting-text-from-a-pdf-file-using-pdfminer-in-python)
    – Srikar Manthatti Jan 31 '20 at 07:28
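The linked question uses pdfminer. Here is a minimal sketch with the maintained pdfminer.six package (assuming it is installed with `pip install pdfminer.six`); `extract_text` from `pdfminer.high_level` reads the PDF's embedded text layer. If it also comes back empty, as PyPDF2 did, the pages most likely contain images rather than text, and only OCR (as in the first comment) will recover anything. `extract_with_pdfminer` and `has_text_layer` are hypothetical helper names.

```python
def has_text_layer(text: str) -> bool:
    """True if the extracted text contains anything besides whitespace."""
    return bool(text.strip())


def extract_with_pdfminer(path: str) -> str:
    """Extract the embedded text of a PDF with pdfminer.six (assumed installed)."""
    # Imported inside the function so this file loads even without pdfminer.six
    from pdfminer.high_level import extract_text

    return extract_text(path)
```

Usage, assuming the bulletin has been downloaded locally as `Vaishali_46.pdf`: call `text = extract_with_pdfminer("Vaishali_46.pdf")`, then `has_text_layer(text)` tells you whether to fall back to OCR.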

1 Answer


You can use textract. Note that `textract.process` returns bytes, so decode the result to get a string:

```python
import textract

# process() returns bytes; decode them to get a str
text = textract.process("path_to_pdf").decode("utf-8")
```
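If the default call still gives an empty string for this bulletin, textract's PDF parser also accepts `method="tesseract"`, which renders each page to an image and runs OCR on it (this assumes the Tesseract and Poppler binaries are installed). A sketch that reads the text layer first and falls back to OCR only when nothing came back; `extract_pdf_text` is a hypothetical name:

```python
def extract_pdf_text(path: str) -> str:
    """Return the text of a PDF, using OCR only if the text layer is empty."""
    # Imported here so this sketch can be loaded without textract installed
    import textract

    # Default PDF method (pdftotext) reads the embedded text layer; returns bytes
    text = textract.process(path).decode("utf-8")
    if text.strip():
        return text

    # Nothing extracted: fall back to OCR with Tesseract (slower, needs the binary)
    return textract.process(path, method="tesseract").decode("utf-8")
```

For example, `print(extract_pdf_text("Vaishali_46.pdf"))`, assuming the file was downloaded locally. Trying the text layer first keeps the fast path for normal PDFs and pays the OCR cost only for image-only pages.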