I want to do text segmentation on a printed document. I have already segmented the document down to the character level, but my approach fails on touching characters. I want to use Tesseract OCR only to segment the words. I know Tesseract can do this, but I don't know how to access that functionality without digging into Tesseract's internal code. Can anyone give me some advice? If possible, I need this in Python.
1 Answer
2
If you can call the TessBaseAPIGetComponentImages API method, you can retrieve the segmentation at various PageIteratorLevel levels (Symbol/Character, Word, Line, etc.) without performing actual OCR on the image.
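
From Python, one way to reach this call is the tesserocr package, whose PyTessBaseAPI class exposes GetComponentImages. The following is a minimal sketch, not a tested recipe: it assumes tesserocr, Pillow, and the Tesseract language data are installed, and the names segment_words and box_to_corners are made up for illustration.

```python
def box_to_corners(box):
    """Convert a tesserocr box dict (keys x, y, w, h) to (left, top, right, bottom)."""
    return (box["x"], box["y"], box["x"] + box["w"], box["y"] + box["h"])


def segment_words(image_path):
    """Return word bounding boxes from Tesseract's page segmentation.

    Hypothetical helper: it only asks for component boxes and never
    requests the recognized text.
    """
    # Imported lazily so box_to_corners can be used without tesserocr installed.
    from PIL import Image
    from tesserocr import PyTessBaseAPI, RIL

    with PyTessBaseAPI() as api:
        api.SetImage(Image.open(image_path))
        # Each component is a tuple (image, box, block_id, paragraph_id).
        # The second argument (text_only=True) skips non-text regions.
        components = api.GetComponentImages(RIL.WORD, True)
        return [box_to_corners(box) for _, box, _, _ in components]


# Example use ("page.png" is a placeholder path):
#     for corners in segment_words("page.png"):
#         print(corners)
```

Swapping RIL.WORD for RIL.SYMBOL or RIL.TEXTLINE should give character-level or line-level boxes instead. The corner tuples are in the order that Pillow's Image.crop expects, so each word can be cut out of the page directly.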

nguyenq
Can you describe how this can be done in Python, e.g. with pytesseract, textract, or pyocr? – aspiring1 Sep 09 '19 at 05:21