My suggestion is: use a Ghostscript commandline. ImageMagick uses Ghostscript in the background anyway (the technical IM term for this is: Ghostscript is a "delegate" for some of the conversions, such as PDF->TIFF).
Here is a commandline that should work well for letter-sized pages of a multi-page PDF file:
gswin32c.exe ^
-o page_%03d.tif ^
-sDEVICE=tiffg4 ^
-r720x720 ^
-g6120x7920 ^
input.pdf
The -g... parameter controls the absolute width and height of the output pages in device pixels (6120x7920 pixels at 720dpi is 8.5x11 inches, which happens to be letter size).
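The arithmetic behind the -g value can be checked with a few lines of Python. This is only a minimal sketch, not part of the Ghostscript command itself: the function name gs_geometry is made up for illustration, and it assumes the page size is given in inches.

```python
def gs_geometry(width_in, height_in, dpi):
    """Return a -g switch value: page size in inches times resolution, in device pixels."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return f"-g{width_px}x{height_px}"


# Letter is 8.5 x 11 inches; at 720dpi this gives the value used above.
print(gs_geometry(8.5, 11, 720))  # prints -g6120x7920
```

If you change -r to a different resolution, the -g value has to be recomputed the same way, otherwise the page will be cropped or padded.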
These TIFF pages...
- ...will be black+white,
- ...will have a resolution of 720dpi,
- ...will be G4-compressed and
- ...will be much smaller than your uncompressed 300dpi output from the IM commandline.
Your IM parameter of -depth 8 isn't suited to give good results from the p.o.v. of later OCR, since it will create shades of gray around letters, which don't help recognition.
Your OCR results should now be much better than before.
If your OCR can't handle TIFF G4 format (which I doubt), then you could generate other TIFF subformats with the help of Ghostscript. For example:
gswin32c.exe ^
-o page_%03d.tif ^
-sDEVICE=tiffgray ^
-r720x720 ^
-g6120x7920 ^
-sCompression=lzw ^
input.pdf
.
gswin32c.exe ^
-o page_%03d.tif ^
-sDEVICE=tiff24nc ^
-r720x720 ^
-g6120x7920 ^
-sCompression=lzw ^
input.pdf
The tiffgray device creates 8-bit gray output. The tiff24nc device creates 24-bit RGB color output (8 bits per color channel). Both types of TIFF will of course be bigger than the tiffg4 output.
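If you want to try the three devices side by side, the commandlines can be assembled from a script. The following Python sketch is an assumption-laden illustration: build_gs_command is a hypothetical helper, and it only reuses the switches, file names and executable name shown above. It prints the commands rather than running them.

```python
def build_gs_command(device, pdf="input.pdf", compression=None,
                     gs_exe="gswin32c.exe"):
    """Assemble the Ghostscript argument list for one TIFF output device."""
    args = [
        gs_exe,
        "-o", "page_%03d.tif",   # one numbered TIFF per PDF page
        f"-sDEVICE={device}",
        "-r720x720",             # resolution in dpi
        "-g6120x7920",           # letter size in device pixels at 720dpi
    ]
    if compression:
        args.append(f"-sCompression={compression}")
    args.append(pdf)
    return args


if __name__ == "__main__":
    for device, comp in [("tiffg4", None), ("tiffgray", "lzw"), ("tiff24nc", "lzw")]:
        print(" ".join(build_gs_command(device, compression=comp)))
```

To actually run one of these, the returned list could be passed to subprocess.run. Note that all three variants write to the same page_%03d.tif names, so use a different output pattern per device if you want to keep the results for comparison.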