
For my image processing project, I need to integrate Hadoop (parallel processing) with Tesseract (image-to-text OCR).

Cœur
Mahesh Muni

1 Answer


You might find OSSOCR useful. It contains a module called python-tesseract for OCR processing. You could use it with Hadoop streaming.
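As a rough illustration of the Hadoop streaming idea, here is a minimal mapper sketch. It is not the python-tesseract API: to keep it self-contained it shells out to the `tesseract` command-line tool instead, and it assumes that tool is installed on every worker node, that each input line is a path to an image readable from that node, and that Python 3.7+ is available. The function names `ocr_image` and `map_lines` are made up for this example.

```python
#!/usr/bin/env python3
import subprocess
import sys


def ocr_image(path, runner=subprocess.run):
    """Run the tesseract CLI on one image; return its text on a single line.

    Assumption: the `tesseract` binary is on PATH on the worker node.
    Returns None if the tool is missing or exits with an error.
    """
    try:
        result = runner(
            ["tesseract", path, "stdout"],  # "stdout" makes tesseract print the text
            capture_output=True,
            text=True,
        )
    except OSError:
        return None
    if result.returncode != 0:
        return None
    # Collapse whitespace so each record stays on one line of streaming output.
    return " ".join(result.stdout.split())


def map_lines(lines, ocr=ocr_image):
    """Yield 'path<TAB>text' records, one per input line naming an image."""
    for line in lines:
        path = line.strip()
        if not path:
            continue
        text = ocr(path)
        if text is None:
            # Hadoop streaming collects stderr in the task logs.
            sys.stderr.write("OCR failed for %s\n" % path)
            continue
        yield "%s\t%s" % (path, text)


# Only consume stdin when input is actually piped in, as Hadoop streaming does.
if __name__ == "__main__" and not sys.stdin.isatty():
    for record in map_lines(sys.stdin):
        print(record)
```

A script like this would be handed to Hadoop streaming through its `-mapper` option, with the job input being a text file that lists one image path per line. Getting the images themselves onto the worker nodes is a separate problem that this sketch does not address.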

Tariq
  • 34,076
  • 8
  • 57
  • 79
  • Hi Tariq, thank you for your wonderful documentation. However, I am still getting some errors, such as main_dummy.cpp:7 and publictypes.h errors – Mahesh Muni Jun 25 '13 at 07:24
  • tesseract.i:13: Error: Unable to find 'publictypes.h' – Mahesh Muni Jun 26 '13 at 06:39
  • Errors in main_dummy.cpp: ProcessPagesWrapper, tesseract api, tessbaseapi, ProcessPagesFileStream; python2.6: can't open file 'tesseract.py'; ImportError: No module named tesseract – Mahesh Muni Jun 26 '13 at 06:41