0

I have the need to read structured data (pdfs and excel spreadsheets) to a central location. From where other files can read data, to produce a constant and consistent output.

The first challenge is to read data from the input files, the second is to read recognized data into a central location feeding into output files.

Any insight to OCR in python, or alternative text and variable recognition algorithms would be highly appreciated.

Erik
  • 33
  • 6
  • A good overview of what's possible here if you want to read pdf layers: https://medium.com/analytics-vidhya/python-packages-for-pdf-data-extraction-d14ec30f0ad0 If you want OCR, then pyTesseract: https://www.geeksforgeeks.org/python-reading-contents-of-pdf-using-ocr-optical-character-recognition/ – Amiga500 Feb 06 '23 at 09:08

0 Answers0