I have written a script that extracts data from PDFs. I am using the win32clipboard module to copy the data into Python, and I have the logic working for getting the data I need out of each file.
The shortcoming of my process is that I have to open each PDF, press Ctrl-A to select all, then Ctrl-C to copy it to the clipboard. I then run my script. For reference, it is running within Excel using DataNitro.
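For context, the clipboard-reading step could look roughly like the sketch below. It is a simplified illustration, not the actual script: it assumes pywin32 is installed and that the PDF viewer puts Unicode text on the clipboard, and the function names `read_clipboard_text` and `split_clipboard_lines` are hypothetical.

```python
def read_clipboard_text():
    """Return the text currently on the Windows clipboard."""
    # Imported inside the function because pywin32 is Windows-only.
    import win32clipboard

    win32clipboard.OpenClipboard()
    try:
        return win32clipboard.GetClipboardData(win32clipboard.CF_UNICODETEXT)
    finally:
        # Always release the clipboard so other programs can use it.
        win32clipboard.CloseClipboard()


def split_clipboard_lines(text):
    """Split copied text into stripped, non-empty lines (hypothetical helper)."""
    # str.splitlines() copes with the \r\n line endings Windows uses.
    return [line.strip() for line in text.splitlines() if line.strip()]
```

After the manual Ctrl-A and Ctrl-C in the PDF viewer, something like `split_clipboard_lines(read_clipboard_text())` would hand the copied lines to the rest of the parsing logic.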
I have tried PDFMiner, but it seems like it is not being maintained, and it tends to break the text into small bits. The PDFs that I am mining contain lots of "small" tables. Copying from the clipboard seems to do a pretty decent job of keeping related things together.
Any suggestions on how I can script opening a PDF, selecting all, and copying? Basically, I am looking for a Python way to script the OS. My gut feeling is that this is not possible, but maybe somebody knows.