For my image-processing project, I need to integrate Hadoop (for parallel processing) with Tesseract (OCR, image to text).
1 Answer
You might find OSSOCR useful. It contains a module called python-tesseract for OCR processing. You could use it with Hadoop streaming.
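
As a rough illustration of the Hadoop streaming idea, here is a minimal mapper sketch. It does not use python-tesseract; instead it shells out to the `tesseract` command-line tool, which is assumed to be installed on every worker node. It also assumes that each input line is the path of an image readable from the node's local filesystem (images stored in HDFS would have to be fetched first). The names `ocr_image`, `map_lines` and `main` are made up for this example.

```python
import os
import subprocess
import sys
import tempfile


def ocr_image(image_path):
    """Run the tesseract CLI on one local image and return the recognised text.

    Assumption: the `tesseract` binary is on PATH on every worker node.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        out_base = os.path.join(tmp_dir, "ocr_out")
        # `tesseract <image> <outputbase>` writes its result to <outputbase>.txt
        subprocess.run(
            ["tesseract", image_path, out_base],
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        with open(out_base + ".txt", encoding="utf-8") as handle:
            return handle.read()


def map_lines(lines, ocr=ocr_image):
    """Turn lines of image paths into 'path<TAB>text' records.

    Assumption: one image path per input line; blank lines are skipped.
    """
    for line in lines:
        path = line.strip()
        if not path:
            continue
        text = ocr(path)
        # Collapse all whitespace so each record stays on a single line,
        # which is what Hadoop streaming expects for key/value output.
        yield path + "\t" + " ".join(text.split())


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Streaming entry point: read paths from stdin, write records to stdout."""
    for record in map_lines(stdin):
        stdout.write(record + "\n")
```

Saved as a script (say, a hypothetical `ocr_mapper.py`) that ends with a call to `main()`, this could be passed to Hadoop streaming through its `-mapper` option, with a text file listing the image paths as the job input. Whether this is faster than python-tesseract depends on your setup; it mainly avoids having to build the Python bindings on each node.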

Tariq
-
Hi Tariq, thank you for your wonderful documentation. However, I am still getting some errors, such as one at main_dummy.cpp:7 and errors related to publictypes.h – Mahesh Muni Jun 25 '13 at 07:24
-
tesseract.i:13: Error: Unable to find 'publictypes.h' – Mahesh Muni Jun 26 '13 at 06:39
-
Errors in main_dummy.cpp: ProcessPagesWrapper, tesseract api, tessbaseapi, ProcessPagesFileStream. Also: python2.6: can't open file 'tesseract.py'; ImportError: No module named tesseract – Mahesh Muni Jun 26 '13 at 06:41