I have a Python project that uses tesseract. It runs locally and works in PyCharm. To containerize it, I wrote a docker-compose.yml with two containers (app and t4re) as follows:

version: '3'
services:
  app:
    build: .
    image: ocr_app:latest
    depends_on:
      - tesseract
  tesseract:
    image: tesseractshadow/tesseract4re
    container_name: t4re

and my Dockerfile is as follows:

FROM python:3.6.1
# Create app directory
WORKDIR /app

# Bundle app source
COPY venv/src ./src
COPY venv/data ./data

# Install app dependencies
RUN pip install -r src/requirements.txt

CMD python src/ocr.py

and I keep getting these errors:

FileNotFoundError: [Errno 2] No such file or directory: 'tesseract'

pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your path

I am new to Docker and have read a lot of documentation, but I still cannot fix this error. I've read the answers linked below. I guess I have to link tesseract to the Python app with an environment variable, but I do not know how.

Use Tesseract 4 - Docker Container from uwsgi-nginx-flask-docker

TesseractNotFoundError: tesseract is not installed or it's not in your path

s.tafazzoli
  • To use tesseract, you should install pytesseract (I guess you already did that via requirements.txt) and you have to install tesseract-ocr via your Dockerfile – singrium Jan 20 '20 at 10:07
  • Yes, I installed pytesseract via the requirements.txt. I thought I can use tesseractshadow/tesseract4re image instead of installing it in ubuntu. – s.tafazzoli Jan 21 '20 at 10:56
  • Well, I am not sure if that would work, but the method I described works for me. – singrium Jan 21 '20 at 11:03
  • Did you find a solution to include tesseract in your Dockerfile yet? I am facing a similar issue trying to containerize a local file that needs tesseract. @s.tafazzoli – liamsuma Jul 30 '20 at 19:58
  • @liamsuma: yes, both answers here work and I used one of them for my project. – s.tafazzoli Aug 01 '20 at 03:21
  • @s.tafazzoli thanks for your response. I tried one of the solutions and wonder how you handle `tesseract cmd` in your Dockerfile. Did you change your `tesseract cmd` at all when running the container? I got the same **tesseractnotfound** message after successfully installing the tesseract lib. – liamsuma Aug 03 '20 at 20:16
  • @liamsuma I used pytesseract lib in my Python code. I did not call the tesseract directly from the cmd. My project (code & Dockerfile) is available on my github account. I hope it helps you. – s.tafazzoli Aug 05 '20 at 05:45
  • @s.tafazzoli it was actually my mistake: I had included a local path in the container. It works after I commented out the local path. Thanks for your help – liamsuma Aug 05 '20 at 13:45
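
The last comments describe a common pitfall: a hard-coded local path for the tesseract command that does not exist inside the container. Below is a minimal sketch of a safer lookup. The helper name `resolve_tesseract_cmd` and the example Windows path are hypothetical, chosen only for illustration. The idea is to use the explicit path only when it exists in the current environment, and otherwise fall back to whatever `tesseract` is on the PATH.

```python
import os
import shutil


def resolve_tesseract_cmd(local_path=None):
    """Return a usable tesseract command, or None if none is found.

    `local_path` is an optional explicit path (hypothetical example:
    a Windows install location used during local development).
    """
    # Use the explicit path only if it really exists and is executable here.
    if local_path and os.path.isfile(local_path) and os.access(local_path, os.X_OK):
        return local_path
    # Otherwise rely on the PATH, which is what works inside the container
    # once tesseract-ocr has been installed in the image.
    return shutil.which("tesseract")


if __name__ == "__main__":
    cmd = resolve_tesseract_cmd(r"C:\Program Files\Tesseract-OCR\tesseract.exe")
    print("tesseract command:", cmd)
    # With pytesseract installed, the result could then be applied like this
    # (only when cmd is not None):
    #   import pytesseract
    #   pytesseract.pytesseract.tesseract_cmd = cmd
```

With this approach the same code should run both locally and in the container without commenting lines in and out.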

2 Answers

You need to install tesseract in your Docker image before using it. By default, the python:3.6.1 image does not include tesseract. Start from an Ubuntu base image, install tesseract and Python in it, then continue with your own setup. Here is a Dockerfile for this approach:

FROM ubuntu:18.04
# Install tesseract, its development headers, and Python in a single layer
RUN apt-get update && apt-get install -y --fix-missing \
    poppler-utils tesseract-ocr libtesseract-dev libleptonica-dev \
    python3.6 python3-pip libsm6 libxext6 && \
    ldconfig

Please adjust the python version as per your requirement.

Mousam Singh

I had this issue on one of my projects that runs on Docker (an Ubuntu container).
To solve it, I had to:
- install pytesseract via requirements.txt, so your requirements.txt should contain:

pytesseract  

- install tesseract-ocr. To do that, include the following lines in your Dockerfile:

FROM ubuntu:18.04

ENV PYTHONUNBUFFERED=1
RUN apt-get update && apt-get install -y software-properties-common && add-apt-repository -y ppa:alex-p/tesseract-ocr
RUN apt-get update && apt-get install -y tesseract-ocr-all 
RUN apt-get install -y python3-pip python3-minimal libsm6 libxext6 
# To make sure that tesseract-ocr is installed, uncomment the following line.  
# RUN tesseract --version
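
The same check as `RUN tesseract --version` can also be done from Python when the app starts, which gives a clearer failure than the pytesseract traceback. This is a rough sketch only: the helper name `tesseract_version` is hypothetical, and it assumes the binary is invoked as `tesseract` from the PATH.

```python
import subprocess


def tesseract_version(cmd="tesseract"):
    """Return the first line of `<cmd> --version`, or None if it is unavailable."""
    try:
        result = subprocess.run(
            [cmd, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # some builds print the version to stderr
            universal_newlines=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # OSError covers FileNotFoundError, i.e. the binary is not in the image.
        return None
    lines = result.stdout.strip().splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    version = tesseract_version()
    if version is None:
        raise SystemExit("tesseract is not installed in this container")
    print(version)
```

If this exits with the error message, the image was built without tesseract-ocr, no matter what other containers are running alongside it.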
singrium