
I have a large batch of PDFs that I can't OCR because they've each got a small field of renderable text.

I'm trying to convert them all to TIFF so I can convert them back to PDF and run OCR, but I'm having trouble invoking the programs I'd expect to do the job. I installed them without issue, but for some reason I keep getting errors saying the associated commands don't exist:

c:\Program Files\Python37\Lib\site-packages>pip install tesseract
Requirement already satisfied: tesseract in c:\program files\python37\lib\site-packages (0.1.3)

c:\Program Files\Python37\Lib\site-packages>tesseract --version
'tesseract' is not recognized as an internal or external command,
operable program or batch file.

c:\Program Files\Python37\Lib\site-packages>pip install ghostscript
Requirement already satisfied: ghostscript in c:\program files\python37\lib\site-packages (0.6)
Requirement already satisfied: setuptools in c:\program files\python37\lib\site-packages (from ghostscript) (40.8.0)

c:\Program Files\Python37\Lib\site-packages>gs --version
'gs' is not recognized as an internal or external command,
operable program or batch file.

c:\Program Files\Python37\Lib\site-packages>gswin32c --version
'gswin32c' is not recognized as an internal or external command,
operable program or batch file.

Any ideas what I'm doing wrong?

Bonus points if you've got a better way to perform the overall task.

bdb484

1 Answer


I notice you are using Windows. I would guess that you have not added the Ghostscript install directory to the PATH environment variable, so Windows does not know where to look for the executable.
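A minimal Python sketch of the same idea, assuming a hypothetical install folder (the real one depends on the Ghostscript version you installed): it prepends a directory to PATH for the current process and any programs it launches, after which `shutil.which` can find the executable. It does not change the system-wide PATH; for that you would edit the environment variables in System Properties and open a new Command Prompt.

```python
import os
import shutil

def prepend_to_path(directory):
    """Put `directory` first on PATH for this process and any child processes it starts."""
    current = os.environ.get("PATH", "")
    entries = current.split(os.pathsep) if current else []
    if directory not in entries:  # avoid duplicate entries on repeated calls
        os.environ["PATH"] = os.pathsep.join([directory] + entries)

if __name__ == "__main__":
    # Hypothetical path: substitute the bin folder of your own Ghostscript install.
    prepend_to_path(r"C:\Program Files\gs\gs9.27\bin")
    # which() reads PATH at call time, so the new entry is searched too.
    print(shutil.which("gswin64c"))
```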

It may be that Python can use the Ghostscript executable from the python37\lib\site-packages directory, but Windows won't know that unless it's been told to look there. It'll probably be in a sub-directory, unless the Python package installer uses something other than the normal Ghostscript Windows installer.

Note that on Windows the binary is not called 'gs'; it will be either gswin32, gswin64, gswin32c or gswin64c depending on whether you installed the 32 or 64-bit version of Ghostscript, and whether you want the command line (c) or windowed version.
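In Python, one way to cope with the several possible names is to try each in turn with `shutil.which`, which returns the full path of the first match on PATH, or `None`. A small sketch; the order of the candidates is my own choice (console builds first, plain `gs` last so the same code also works on non-Windows systems).

```python
import shutil

# Windows executable names from the answer, console ("c") builds first;
# plain "gs" last as the usual name on Linux/macOS.
GS_CANDIDATES = ("gswin64c", "gswin32c", "gswin64", "gswin32", "gs")

def find_ghostscript(candidates=GS_CANDIDATES):
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)  # on Windows this also tries PATHEXT suffixes such as .exe
        if path:
            return path
    return None

if __name__ == "__main__":
    print(find_ghostscript() or "Ghostscript not found on PATH")
```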

Probably the easiest way to find it is to look in the specified Python folder and see.
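If you'd rather not browse by hand, here is a short sketch that searches a folder tree for anything named `gswin*.exe`. The folders in the usage lines are assumptions: the site-packages path from the question, and `C:\Program Files\gs`, where the standard Ghostscript installer normally puts its versioned folders.

```python
from pathlib import Path

def search_for_ghostscript(root):
    """Return every gswin*.exe found anywhere under `root`, sorted by path."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.rglob("*")
        if p.is_file()
        and p.name.lower().startswith("gswin")  # case-insensitive name match
        and p.suffix.lower() == ".exe"
    )

if __name__ == "__main__":
    # Assumed locations; adjust to your own machine.
    for folder in (r"C:\Program Files\Python37\Lib\site-packages", r"C:\Program Files\gs"):
        for exe in search_for_ghostscript(folder):
            print(exe)
```

If the search turns up nothing in site-packages, that suggests the pip package did not bring its own Ghostscript binary.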

KenS
  • pip is an installer for Python packages. Neither Ghostscript nor Tesseract is a Python package. I would *guess* that what you've installed with pip are the Python packages that interface with Ghostscript and Tesseract. I would guess you also have to download the actual installers for both from https://github.com/ArtifexSoftware/ghostpdl-downloads/releases and https://github.com/UB-Mannheim/tesseract/wiki – chrisl Sep 16 '19 at 16:04
  • This is great, thank you! "Note that on Windows the binary is not called 'gs'; it will be either gswin32, gswin64, gswin32c or gswin64c depending on whether you installed the 32 or 64-bit version of Ghostscript, and whether you want the command line (c) or windowed version." – Colton Campbell Mar 02 '23 at 02:47
  • Changing gs -v to gswin32 -v worked for me – Joeboulton Jul 27 '23 at 13:16
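As for the bonus question: once the real programs are installed (as chrisl's comment describes), one option is to skip the Python wrapper packages and call both executables directly with `subprocess`. This is an outline under assumptions rather than a tested recipe. It renders each PDF into one multi-page black-and-white TIFF with Ghostscript (its TIFF devices write all pages to a single file when the output name contains no `%d`), then has Tesseract turn that TIFF into a searchable PDF via its `pdf` output config. The executable names, the 300 dpi resolution and the folder names are placeholders to adjust.

```python
import subprocess
from pathlib import Path

def gs_tiff_cmd(gs_exe, pdf, tiff, dpi=300):
    """Ghostscript command rendering every page of `pdf` into one multi-page TIFF."""
    return [
        gs_exe,
        "-dNOPAUSE", "-dBATCH", "-dSAFER",  # non-interactive, exit when done
        "-sDEVICE=tiffg4",                  # 1-bit G4 TIFF; "tiffgray" keeps greyscale instead
        f"-r{dpi}",                         # render resolution in dpi
        f"-sOutputFile={tiff}",
        str(pdf),
    ]

def tesseract_pdf_cmd(tesseract_exe, tiff, out_base):
    """Tesseract command writing a searchable PDF to out_base + '.pdf'."""
    return [tesseract_exe, str(tiff), str(out_base), "pdf"]

def ocr_pdf(pdf, out_dir, gs_exe="gswin64c", tesseract_exe="tesseract", dpi=300):
    """Run both steps for one PDF and return the path of the searchable PDF."""
    pdf, out_dir = Path(pdf), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    tiff = out_dir / f"{pdf.stem}.tif"
    out_base = out_dir / f"{pdf.stem}_ocr"
    # check=True raises CalledProcessError if either program reports failure.
    subprocess.run(gs_tiff_cmd(gs_exe, pdf, tiff, dpi), check=True)
    subprocess.run(tesseract_pdf_cmd(tesseract_exe, tiff, out_base), check=True)
    return out_dir / f"{pdf.stem}_ocr.pdf"

if __name__ == "__main__":
    # Hypothetical folders: every *.pdf in "scans" is OCRed into "ocr_output".
    for pdf in sorted(Path("scans").glob("*.pdf")):
        print(ocr_pdf(pdf, "ocr_output"))
```

Writing one multi-page TIFF keeps a one-to-one mapping between input and output PDFs. If you prefer one TIFF per page, put `%03d` in the output name and Ghostscript substitutes the page number.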