
I am working on a project that requires running pytesseract in a Docker container, but I am unable to install Tesseract in the container. I also don't know what the file path for pytesseract should be.

My Dockerfile:

FROM python:3
ENV PYTHONUNBUFFERED=1
RUN apt-get update && apt-get install -y --no-install-recommends \
      bzip2 \
      g++ \
      git \
      graphviz \
      libgl1-mesa-glx \
      libhdf5-dev \
      openmpi-bin \
      wget \
      python3-tk && \
    rm -rf /var/lib/apt/lists/*
 



WORKDIR /usr/src/app
COPY requirements.txt ./
RUN pip install -r requirements.txt
ENV QT_X11_NO_MITSHM=1

My pytesseract code:

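import cv2
import pytesseract

# Path to the Tesseract binary (this is the Windows install location)
path_to_tesseract = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
# tesseract_cmd is an attribute of the inner pytesseract module
pytesseract.pytesseract.tesseract_cmd = path_to_tesseract

img = cv2.imread(fpath)
img = cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)
text = pytesseract.image_to_string(img)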
Christoph Rackwitz
s_h

1 Answer


I see you are also using opencv. The following system packages are required to use pytesseract together with opencv:

FROM python:3.10-slim

ENV PYTHONUNBUFFERED=1
# tesseract-ocr is required for pytesseract;
# ffmpeg, libsm6 and libxext6 are required for opencv
RUN apt-get update \
  && apt-get -y install tesseract-ocr \
  && apt-get -y install ffmpeg libsm6 libxext6

...
RUN pip install -r requirements.txt

Since you are using Docker, though, I would recommend installing opencv-python-headless instead of opencv-python. The headless package is mainly intended for environments without a display, such as Docker. It comes as a precompiled binary wheel and reduces the Docker image size. The Dockerfile then reduces to:

FROM python:3.10-slim

ENV PYTHONUNBUFFERED=1
RUN apt-get update \
  && apt-get -y install tesseract-ocr

...
RUN pip install -r requirements.txt
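
As for the file path: the Windows path from the question will not exist inside a Linux container. On Debian-based images such as python:3.10-slim, installing tesseract-ocr with apt-get normally puts the binary on PATH (typically /usr/bin/tesseract), and pytesseract's default tesseract_cmd is already 'tesseract', so in most cases the path does not need to be set at all. If you prefer to set it explicitly, here is a minimal sketch that looks the binary up instead of hard-coding it. The helper names find_tesseract and configure_pytesseract are made up for this example and are not part of pytesseract.

```python
import shutil


def find_tesseract(candidates=("tesseract",)):
    """Return the path of the first candidate executable found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def configure_pytesseract():
    """Point pytesseract at the tesseract binary found on PATH.

    Hypothetical helper: raises RuntimeError if the binary is missing,
    which usually means tesseract-ocr was not installed in the image.
    """
    tesseract_path = find_tesseract()
    if tesseract_path is None:
        raise RuntimeError(
            "tesseract not found on PATH; is tesseract-ocr installed in the image?"
        )
    # Imported lazily so find_tesseract() can be used without pytesseract installed
    import pytesseract

    pytesseract.pytesseract.tesseract_cmd = tesseract_path
    return tesseract_path
```

Calling configure_pytesseract() once at startup, before the first pytesseract.image_to_string(img) call, should be enough. If it raises, the image was most likely built without the tesseract-ocr package.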
Dhia
  • Note the legal implications of using `ffmpeg` in closed-source settings (it is licensed under LGPL). The issue also applies to `opencv-python-headless`. – mirekphd May 19 '23 at 19:09