So I'm trying something very simple: I just want to read the text from a PDF file into a variable, that's it. This is what I'm getting:

[image not preserved]

Does anyone know a reliable way to just read a PDF into a text file?

David van Driessche

1 Answer

Try the pdfplumber library:

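import pdfplumber

# Open the PDF; the with-block closes the file even if extraction fails
with pdfplumber.open('anyfile.pdf') as pdf_file:
    page = pdf_file.pages[0]    # first page only
    text = page.extract_text()  # empty for pages with no text layer (e.g. scans)

print(text)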

I haven't used PyPDF2 before, but pdfplumber has worked well for me.
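
If you need every page rather than just the first, something along these lines should work. Treat it as a sketch: the helper names join_page_texts and pdf_to_text are my own and not part of pdfplumber, and I'm assuming the PDF has a real text layer (a scanned document would need OCR instead). The only pdfplumber calls used are pdfplumber.open, pdf.pages and page.extract_text().

```python
def join_page_texts(pages):
    """Concatenate the text of each page, skipping pages with no text."""
    parts = []
    for page in pages:
        text = page.extract_text()
        if text:  # None or "" means nothing extractable on this page
            parts.append(text)
    return "\n".join(parts)


def pdf_to_text(pdf_path):
    """Return the text of every page in the PDF as a single string."""
    # Imported here so join_page_texts can be used without pdfplumber installed.
    import pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        return join_page_texts(pdf.pages)
```

Calling text = pdf_to_text('anyfile.pdf') gives you the whole document in a variable. To end up with a text file, write that string out with open('anyfile.txt', 'w', encoding='utf-8').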

Arty