I have been battling with Google and the limited documentation of PDFMiner for the last several hours, and although I feel close, I'm just not getting what I need. I've worked through http://www.unixuser.org/~euske/python/pdfminer/ and all three of the YouTube videos to gain a better understanding of PDFs, and I'm able to output raw text just fine.
I am working on a script to parse multiple PDF pages. Unfortunately, for this project I am dealing with poor-quality PDF files, and the only reliable constant I see is that the text strings always sit at exactly the same physical location. Although I've read hints that text strings can be extracted by physical coordinates, I have yet to see a working example.
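To make the goal concrete, here is a minimal sketch of the kind of thing being asked for, not a confirmed solution. It assumes the maintained pdfminer.six fork is installed and uses its `extract_pages` helper and `LTTextContainer` layout class; the helper names (`inside`, `text_in_region`, `iter_text_boxes`), the file name, and the region values are hypothetical. Coordinates are in PDF points with the origin at the bottom-left of the page.

```python
def inside(bbox, region):
    """Return True if bbox (x0, y0, x1, y1) lies entirely within region."""
    x0, y0, x1, y1 = bbox
    rx0, ry0, rx1, ry1 = region
    return x0 >= rx0 and y0 >= ry0 and x1 <= rx1 and y1 <= ry1


def text_in_region(boxes, region):
    """Keep (page_number, text) for every box that falls inside region.

    boxes is any iterable of (page_number, bbox, text) tuples.
    """
    return [
        (page_no, text.strip())
        for page_no, bbox, text in boxes
        if inside(bbox, region)
    ]


def iter_text_boxes(path):
    """Yield (page_number, bbox, text) for each text container in the PDF.

    Assumption: pdfminer.six is installed. The import is kept local so the
    pure-Python helpers above work without it.
    """
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextContainer

    for page_no, layout in enumerate(extract_pages(path), start=1):
        for element in layout:
            if isinstance(element, LTTextContainer):
                yield page_no, element.bbox, element.get_text()


if __name__ == "__main__":
    # Hypothetical file name and region; adjust both to the real documents.
    region = (40, 690, 210, 730)
    for page_no, text in text_in_region(iter_text_boxes("sample.pdf"), region):
        print(page_no, text)
```

A strict "fully inside" test is used here because the text positions are said to be constant. If the layout analysis merges neighbouring lines into one box, an overlap test or a slightly larger region may be needed instead.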
Is there anyone out there who could shed some light on how this is done with PDFMiner? I am open to other modules if there is an obviously better choice, but I need to stick with Python for the script.
Additionally, I have tried PyPdf, also with no success (other than basic text output).
Thanks!