14

I have to convert a .pdf file containing scanned images into .txt files. Tesseract OCR only converts images to text, so I first need to extract the pages as .tif images and then convert them. Can anyone help me with this?

Aage
Ganesh Nannaware

1 Answer

22

Use ImageMagick:

convert -density 600 input.pdf output.tif

The -density value is in DPI; in my experience, 600 DPI works best.
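To go all the way from the PDF to a .txt file, the two steps can be chained. Below is a minimal sketch, not a definitive script. It assumes ImageMagick's `convert` and the `tesseract` command-line tool are installed and on your PATH, that ImageMagick can read PDFs on your system (this normally requires Ghostscript), and that your Tesseract build accepts multi-page TIFF input. The helper names `run` and `pdf_to_txt` and the `DRY_RUN` switch are made up for illustration.

```shell
#!/bin/sh
# Sketch: scanned PDF -> multi-page TIFF -> plain text.
# Assumes `convert` (ImageMagick) and `tesseract` are on PATH.

# run: hypothetical helper. Prints the command when DRY_RUN=1,
# otherwise executes it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# pdf_to_txt INPUT.pdf [DPI]
# Hypothetical helper. DPI defaults to 600, as suggested above.
pdf_to_txt() {
    pdf=$1
    dpi=${2:-600}
    base=${pdf%.pdf}   # strip the .pdf suffix to get the base name

    # Step 1: rasterize the PDF pages into a TIFF at the given density.
    run convert -density "$dpi" "$pdf" "$base.tif" || return 1

    # Step 2: OCR the TIFF. tesseract appends ".txt" to the output
    # base name on its own, so pass the name without an extension.
    run tesseract "$base.tif" "$base"
}
```

After sourcing the file, `pdf_to_txt input.pdf` should leave `input.tif` and `input.txt` next to the PDF. Setting `DRY_RUN=1` first only prints the two commands, which is handy for checking the file names before running a long conversion.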

Karol S