I understand that I can ask Tesseract to return text at the word, text line, paragraph, or block level.
I need to form my own cluster of words, which may be a portion of a text line or span multiple lines. Once I have this cluster, I'd like to order its words left to right, top to bottom, for readability.
I assume Tesseract has this ability, since text-line-level and paragraph-level results both come back with their words in the right order. Can I access this method from the tess4j API?
Or can someone point me to the algorithm so I can implement it on my own?
Thanks
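For reference, one common heuristic for putting an arbitrary cluster of words into reading order is to group the words into rows by vertical position and then sort each row by left edge. The sketch below is an assumption about how this could be done, not Tesseract's internal method. `ReadingOrder` and `WordBox` are hypothetical names; with tess4j, the coordinates would presumably come from the bounding box of each word returned at word level.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Reading-order sort for a cluster of OCR words (hypothetical helper, not part of tess4j). */
class ReadingOrder {

    /** Minimal stand-in for an OCR word: text plus bounding box in pixels. */
    static final class WordBox {
        final String text;
        final int x, y, width, height;

        WordBox(String text, int x, int y, int width, int height) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }

        int centerY() {
            return y + height / 2;
        }
    }

    /** Groups words into rows by vertical position, then orders each row left to right. */
    static List<List<WordBox>> toRows(List<WordBox> cluster) {
        List<WordBox> sorted = new ArrayList<>(cluster);
        sorted.sort(Comparator.comparingInt(WordBox::centerY));

        List<List<WordBox>> rows = new ArrayList<>();
        List<WordBox> current = new ArrayList<>();
        for (WordBox w : sorted) {
            if (!current.isEmpty()) {
                WordBox first = current.get(0);
                // Assumption: a word belongs to the current row if its vertical
                // center falls inside the vertical extent of the row's first word.
                boolean sameRow = w.centerY() >= first.y && w.centerY() <= first.y + first.height;
                if (!sameRow) {
                    rows.add(current);
                    current = new ArrayList<>();
                }
            }
            current.add(w);
        }
        if (!current.isEmpty()) {
            rows.add(current);
        }
        for (List<WordBox> row : rows) {
            row.sort(Comparator.comparingInt((WordBox w) -> w.x));
        }
        return rows;
    }

    /** Joins the rows into text: one row per line, words separated by single spaces. */
    static String toText(List<WordBox> cluster) {
        StringBuilder sb = new StringBuilder();
        for (List<WordBox> row : toRows(cluster)) {
            if (sb.length() > 0) {
                sb.append('\n');
            }
            for (int i = 0; i < row.size(); i++) {
                if (i > 0) {
                    sb.append(' ');
                }
                sb.append(row.get(i).text);
            }
        }
        return sb.toString();
    }
}
```

Calling `ReadingOrder.toText(cluster)` on the words of one cluster returns them as lines of text. The row test is deliberately simple and will likely need tuning for skewed scans or mixed font sizes.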
Edit: Here's an example. Suppose my image has this block of text:
John Doe          Adam Paul         Sara Johnson
Vice President    Director of IT    Head of Human Resources
jdoe@xyz.com      apaul@xyz.com     sjohnson@xyz.com
If I ask tess4j for text-line-level words, I get three lines:
John Doe Adam Paul Sara Johnson
and
Vice President Director of IT Head of Human Resources
and
jdoe@xyz.com apaul@xyz.com sjohnson@xyz.com
Instead, what I want is:
John Doe
Vice President
jdoe@xyz.com
and
Adam Paul
Director of IT
apaul@xyz.com
and
Sara Johnson
Head of Human Resources
sjohnson@xyz.com
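
The regrouping in this example could be approximated by cutting each text line at wide horizontal gaps and collecting the resulting segments by left edge. This is a rough sketch under stated assumptions, not something tess4j is known to provide: `ColumnSplitter`, `WordBox`, `minGap`, and `tolerance` are hypothetical names, and it assumes the columns are left-aligned and separated by gaps clearly wider than a normal word space.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Regroups text-line words into side-by-side columns (hypothetical helper, not part of tess4j). */
class ColumnSplitter {

    /** Minimal stand-in for an OCR word: text plus left edge and width in pixels. */
    static final class WordBox {
        final String text;
        final int x, width;

        WordBox(String text, int x, int width) {
            this.text = text;
            this.x = x;
            this.width = width;
        }

        int right() {
            return x + width;
        }
    }

    /** One column: the left edge of its first segment plus the lines collected so far. */
    static final class Column {
        final int left;
        final List<String> lines = new ArrayList<>();

        Column(int left) {
            this.left = left;
        }
    }

    /**
     * @param textLines words of each text line, listed top to bottom
     * @param minGap    assumed minimum horizontal gap (pixels) that separates two columns
     * @param tolerance assumed maximum difference (pixels) between left edges within one column
     * @return the lines of each column, columns ordered left to right
     */
    static List<List<String>> split(List<List<WordBox>> textLines, int minGap, int tolerance) {
        List<Column> columns = new ArrayList<>();
        for (List<WordBox> line : textLines) {
            List<WordBox> words = new ArrayList<>(line);
            words.sort(Comparator.comparingInt((WordBox w) -> w.x));
            int start = 0;
            for (int i = 1; i <= words.size(); i++) {
                // Close the current segment at the end of the line or at a wide gap.
                boolean atEnd = i == words.size();
                if (atEnd || words.get(i).x - words.get(i - 1).right() >= minGap) {
                    addSegment(columns, words.subList(start, i), tolerance);
                    start = i;
                }
            }
        }
        columns.sort(Comparator.comparingInt((Column c) -> c.left));
        List<List<String>> result = new ArrayList<>();
        for (Column c : columns) {
            result.add(c.lines);
        }
        return result;
    }

    /** Appends the segment's text to the column with a matching left edge, creating one if needed. */
    private static void addSegment(List<Column> columns, List<WordBox> segment, int tolerance) {
        StringBuilder sb = new StringBuilder();
        for (WordBox w : segment) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(w.text);
        }
        int left = segment.get(0).x;
        for (Column c : columns) {
            if (Math.abs(c.left - left) <= tolerance) {
                c.lines.add(sb.toString());
                return;
            }
        }
        Column created = new Column(left);
        created.lines.add(sb.toString());
        columns.add(created);
    }
}
```

Passing the three text lines to `ColumnSplitter.split(lines, minGap, tolerance)` yields one list of strings per person. Both thresholds depend on font size and scan resolution, so they would have to be chosen per document, for example as a multiple of the typical word height.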