I would like to create a pdf file with text recognition from a scanned image.
But I don't want the original image in the PDF file, just plain text. The text should be visible so that it can be read, but the font doesn't matter that much.
This Tesseract command does almost what I want, but the text is invisible.
tesseract -c textonly_pdf=1 test.tif test pdf
- How can I make the text visible?
- Can I create a pdf file with another command-line or python tool?
I'm running Tesseract in Ubuntu.