I have to convert a .pdf file containing scanned images into .txt files. Tesseract OCR converts only images to .txt, so I first need to extract the .tif images and then convert them. Can anyone help me with this?

Aage

Ganesh Nannaware
1 Answer
Use ImageMagick:
convert -density 600 input.pdf output.tif
Density is in DPI; in my experience, 600 DPI works best.
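To connect this back to the question, here is a minimal Python sketch that runs the convert command above and then passes the resulting TIFF to Tesseract. The function names (`build_convert_command`, `build_tesseract_command`, `pdf_to_text`) are hypothetical, and it assumes both `convert` and `tesseract` are on the PATH and that your Tesseract build accepts a multi-page TIFF as input.

```python
import subprocess
from pathlib import Path


def build_convert_command(pdf_path, tif_path, density=600):
    """Return the ImageMagick command from the answer as an argument list."""
    # -density is placed before the input file so it applies when the PDF is rasterized.
    return ["convert", "-density", str(density), str(pdf_path), str(tif_path)]


def build_tesseract_command(tif_path, output_base):
    """Return a Tesseract command; Tesseract appends .txt to output_base itself."""
    return ["tesseract", str(tif_path), str(output_base)]


def pdf_to_text(pdf_path, density=600):
    """Rasterize pdf_path to a TIFF, OCR it, and return the path of the .txt file."""
    pdf_path = Path(pdf_path)
    tif_path = pdf_path.with_suffix(".tif")
    output_base = pdf_path.with_suffix("")  # e.g. scan.pdf -> scan
    # check=True raises CalledProcessError if either tool exits with an error.
    subprocess.run(build_convert_command(pdf_path, tif_path, density), check=True)
    subprocess.run(build_tesseract_command(tif_path, output_base), check=True)
    return Path(str(output_base) + ".txt")
```

Calling `pdf_to_text("input.pdf")` would leave `input.tif` and `input.txt` next to the PDF. The command builders are kept separate from the code that runs them so the exact arguments can be inspected before anything is executed.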

Karol S
- Can the convert command be used to produce multiple output files? Please help me with its usage. – Ganesh Nannaware Apr 12 '14 at 07:28
- @GaneshNannaware Yes, it can. Put `%04d` in the name of the output file and see how it works. – Karol S Apr 12 '14 at 07:46
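As a rough illustration of the `%04d` suggestion, the sketch below builds a convert command with a numbered output name and lists the file names to expect. The helper names and the `page-%04d.tif` pattern are hypothetical, and it assumes ImageMagick's default behaviour of numbering output files from 0.

```python
def build_split_command(pdf_path, pattern="page-%04d.tif", density=600):
    """Return a convert command that writes one numbered TIFF per PDF page."""
    # ImageMagick fills in %04d itself, so the pattern is passed through unchanged.
    return ["convert", "-density", str(density), str(pdf_path), pattern]


def page_filenames(pattern, page_count):
    """Expand the printf-style pattern for page_count pages, counting from 0."""
    return [pattern % index for index in range(page_count)]
```

For a three-page PDF, `page_filenames("page-%04d.tif", 3)` gives `page-0000.tif`, `page-0001.tif` and `page-0002.tif`, each of which can then be passed to Tesseract separately.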