Using Tesseract OCR for Character Segmentation Only

Question

I want to do text segmentation on a printed document. I have already segmented the document down to the character level, but my approach fails on touching characters. I want to use Tesseract OCR only to segment the words. I know Tesseract can do this, but I don't know how to access that functionality without digging into Tesseract's internal code. Can anyone give me some advice? If possible, I need this in Python.

score 2 · Accepted Answer · answered Apr 13 '17 at 14:45

If you can call the TessBaseAPIGetComponentImages API method, you can retrieve the segmentation at various PageIteratorLevel levels (Symbol/Character, Word, Line, etc.) without performing actual OCR on the image.
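Since the question asks for Python, here is a minimal sketch of the same call through the tesserocr binding, which exposes it as PyTessBaseAPI.GetComponentImages. It assumes tesserocr and Pillow are installed; the function names segment_symbols and to_corners and the type alias Box are made up for this illustration. Without a recognition pass, symbol-level components may be coarser than true characters, so whether touching characters get split is something to check on your own images.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def to_corners(box: Dict[str, int]) -> Box:
    """Convert tesserocr's {'x', 'y', 'w', 'h'} dict into corner coordinates."""
    return (box["x"], box["y"], box["x"] + box["w"], box["y"] + box["h"])


def segment_symbols(image_path: str, lang: str = "eng") -> List[Box]:
    """Return symbol-level boxes from Tesseract's layout analysis (hypothetical helper)."""
    # Imported here so to_corners can be used without tesserocr installed.
    from PIL import Image
    from tesserocr import PyTessBaseAPI, RIL

    with PyTessBaseAPI(lang=lang) as api:
        api.SetImage(Image.open(image_path))
        # Second argument is text_only=True, which skips non-text regions.
        # Each item is (PIL image, box dict, block id, paragraph id).
        components = api.GetComponentImages(RIL.SYMBOL, True)
    return [to_corners(box) for _image, box, _block, _para in components]
```

Calling segment_symbols("page.png") on a scanned page (the file name is a placeholder) should give one box per component. Swapping RIL.SYMBOL for RIL.WORD or RIL.TEXTLINE gives word or line segmentation instead.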

nguyenq


Can you describe how this can be done in Python, for example with pytesseract, textract, or pyocr? – aspiring1 Sep 09 '19 at 05:21
