I am using Tesseract for text recognition.
How can I detect the spacing between blocks of text and then create, e.g., a PDF or .doc file that preserves the same spacing?
Let's say the source page contains 3 columns of text (like a newspaper). How can I recognize this text while keeping the appropriate spacing between the columns and the margins to the page edges?
Can you suggest an example, a library that does this, or just an algorithm?
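
To make the question more concrete, here is a rough sketch of the kind of thing I have in mind. It assumes the word bounding boxes are already available as `(left, top, width, height)` tuples in pixels (for example, parsed from Tesseract's hOCR or TSV output). The names `find_columns` and `layout_metrics` and the `min_gap` threshold are hypothetical and not part of Tesseract; the sketch only merges horizontal extents and treats wide empty strips as gutters, so it is probably too naive for real pages.

```python
from typing import Dict, List, Tuple

# Assumed input format: (left, top, width, height) in pixels.
Box = Tuple[int, int, int, int]


def find_columns(boxes: List[Box], min_gap: int = 30) -> List[Tuple[int, int]]:
    """Merge the horizontal extents of word boxes into column spans.

    Two extents end up in the same column when the empty strip between
    them is narrower than min_gap (a hypothetical threshold to tune).
    """
    intervals = sorted((left, left + width) for left, _top, width, _height in boxes)
    columns: List[List[int]] = []
    for start, end in intervals:
        if columns and start - columns[-1][1] < min_gap:
            # Close enough to the previous span: extend it.
            columns[-1][1] = max(columns[-1][1], end)
        else:
            # Wide empty strip (or first box): start a new column.
            columns.append([start, end])
    return [(start, end) for start, end in columns]


def layout_metrics(boxes: List[Box], page_width: int, min_gap: int = 30) -> Dict[str, object]:
    """Return page margins and gutter widths derived from the column spans."""
    cols = find_columns(boxes, min_gap)
    if not cols:
        return {"columns": [], "left_margin": 0, "right_margin": 0, "gutters": []}
    return {
        "columns": cols,
        "left_margin": cols[0][0],
        "right_margin": page_width - cols[-1][1],
        "gutters": [cols[i + 1][0] - cols[i][1] for i in range(len(cols) - 1)],
    }
```

The resulting margins and gutter widths could then, I assume, be converted from pixels to points and passed to whatever writes the PDF or .doc file, but I do not know whether this is the right approach.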