I am following this blog to extract text from an invoice PDF file. I need to extract specific fields from the invoice.
I have tried pdfminer and textract, but both return the text jumbled, which makes it difficult to pick out the fields afterwards.
I came across the Poppler package, available for download here:
https://poppler.freedesktop.org/releases.html
It looks like it's a .tar file and not a Python package.
I am not sure how to unpack this .tar file and use the package from Python.
Any suggestions on how to install this on my Mac and then use it programmatically in Python to run a batch of PDF files through it and extract the data?
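
To make the goal concrete, here is a rough sketch of the kind of script I am after. It rests on several assumptions that are not confirmed anywhere above: that Poppler can be installed with Homebrew (`brew install poppler`) instead of building the .tar release by hand, that this puts Poppler's `pdftotext` command-line tool on the PATH, and that the invoices print fields as `Label: value` on a single line. The helper names (`build_pdftotext_command`, `pdf_to_text`, `find_field`, `extract_folder`) are made up for illustration.

```python
import re
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, layout=True):
    """Build the argument list for Poppler's pdftotext command-line tool."""
    cmd = ["pdftotext"]
    if layout:
        # -layout asks pdftotext to keep the physical layout of the page,
        # which may help with text that otherwise comes out jumbled.
        cmd.append("-layout")
    # Passing "-" as the output file makes pdftotext write to stdout.
    cmd += [str(pdf_path), "-"]
    return cmd


def pdf_to_text(pdf_path):
    """Run pdftotext on one PDF and return the extracted text."""
    # Assumption: Poppler is installed and pdftotext is on the PATH.
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH; is Poppler installed?")
    result = subprocess.run(
        build_pdftotext_command(pdf_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def find_field(text, label):
    """Hypothetical helper: return the value after 'label:' on one line, or None."""
    pattern = rf"^[ \t]*{re.escape(label)}[ \t]*:[ \t]*(.+?)[ \t]*$"
    match = re.search(pattern, text, flags=re.MULTILINE)
    return match.group(1) if match else None


def extract_folder(folder, labels):
    """Extract the given labels from every .pdf file directly inside folder."""
    results = {}
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        text = pdf_to_text(pdf_path)
        results[pdf_path.name] = {label: find_field(text, label) for label in labels}
    return results
```

Something like `extract_folder("invoices", ["Invoice Number", "Total"])` would then return one dictionary of fields per PDF; the folder name and labels are placeholders, and real invoices will probably need a pattern adapted to their layout.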