I have the need to read structured data (pdfs and excel spreadsheets) to a central location. From where other files can read data, to produce a constant and consistent output.
The first challenge is to read data from the input files, the second is to read recognized data into a central location feeding into output files.
Any insight to OCR in python, or alternative text and variable recognition algorithms would be highly appreciated.