I am trying to run the following script in a Databricks Python notebook:

pip install presidio-image-redactor
pip install pytesseract
python -m spacy download en_core_web_lg

from PIL import Image
from presidio_image_redactor import ImageRedactorEngine
import pytesseract

image = Image.open("images/ImageData.PNG")

engine = ImageRedactorEngine()

redacted_image = engine.redact(image, (255, 192, 203))

Upon running the last line, I'm getting the error below:

TesseractNotFoundError: tesseract is not installed or it's not in your PATH.

Am I missing anything?

Michelle Santos

1 Answer

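pytesseract is only a Python wrapper around the Tesseract binary, so the binary has to be installed separately. You can use %sh in a separate cell to execute shell commands on the driver node. To install Tesseract, run: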

%sh apt-get -f -y install tesseract-ocr 
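
Once the install cell has finished, a quick check from a Python cell can confirm that the binary is visible on the PATH before calling the redactor again. This is only a minimal sketch: the helper name find_tesseract is made up for illustration, and it relies solely on the standard library.

```python
import shutil


def find_tesseract(binary_name="tesseract"):
    """Return the absolute path of the binary, or None if it is not on PATH."""
    return shutil.which(binary_name)


path = find_tesseract()
if path is None:
    # This is the situation in which pytesseract raises TesseractNotFoundError.
    print("tesseract is not on PATH")
else:
    print(f"tesseract found at {path}")
```

If the binary is installed but lives outside the PATH, pytesseract also lets you point at it explicitly by assigning the full path to pytesseract.pytesseract.tesseract_cmd.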

If you need to install it on all nodes of the cluster, use a cluster init script containing the same command (without %sh).
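
As a rough illustration of what such an init script could contain, the sketch below builds the script text in Python and writes it to a file. The function names, the file name install_tesseract.sh, and the temporary directory are assumptions made for this example; where the script must be stored and how it is attached to the cluster depends on your workspace setup and is not shown here.

```python
import tempfile
from pathlib import Path

# Same command as in the %sh cell above, without the %sh prefix.
INSTALL_COMMAND = "apt-get -f -y install tesseract-ocr"


def build_init_script(command=INSTALL_COMMAND):
    """Return the text of a bash script that runs the install command."""
    return f"#!/bin/bash\n{command}\n"


def write_init_script(path, command=INSTALL_COMMAND):
    """Write the script to the given path and return that path."""
    path = Path(path)
    path.write_text(build_init_script(command))
    return path


# Demonstration only: write to a temporary directory and show the result.
with tempfile.TemporaryDirectory() as tmp:
    script_path = write_init_script(Path(tmp) / "install_tesseract.sh")
    print(script_path.read_text())
```

Keeping the command in one constant means the notebook cell and the init script cannot drift apart if the package name changes.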

Alex Ott