Problem:
I had tesseract installed on my local machine, at /usr/local/Cellar/tesseract/4.1.1/bin/tesseract. Everything worked perfectly until I containerized the app in Docker, where it fails with this error message: pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your PATH
What I've tried:
Based on the error message, this is what I've tried:
1). Added /usr/local under File Sharing in the Docker Desktop app and mounted that path from the local machine into the container. Still getting the error message (doesn't work).
2). Moved tesseract.exe from where it resides to the current local working dir. Still getting the error message (of course it doesn't work; what was I even thinking back then?).
3). Modified the Dockerfile to install tesseract with its dependencies. Here is the Dockerfile:
FROM python:3.7-alpine
RUN apk update && apk add --no-cache tesseract-ocr
WORKDIR /app
COPY ./requirements.txt ./
RUN pip3 install --upgrade pip
# install dependencies
RUN pip3 install -r requirements.txt
RUN pip3 install --upgrade PyMuPDF
# bundle app source
COPY . /app
COPY ./ChaseOCR.py /app
COPY ./BankofAmericaOCR.py /app
COPY ./WellsFargoOCR.py /app
EXPOSE 8080
CMD ["python3", "MainBankClass.py"]
The requirements.txt file also includes pytesseract and tesseract, but I am still getting the error message (doesn't work). I have been stuck on this issue for the past 2 days and am running out of options. This link and this link both don't work in my case. Any help is much appreciated. Thanks in advance.
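To narrow down where the error comes from, it may help to check from inside the container whether a tesseract binary is visible on PATH at all. The following is only a minimal diagnostic sketch using the Python standard library. The helper names find_tesseract and tesseract_version are made up for this illustration and are not part of pytesseract.

```python
import shutil
import subprocess


def find_tesseract(binary_name="tesseract"):
    """Return the full path of the binary if it is on PATH, else None."""
    return shutil.which(binary_name)


def tesseract_version(binary_path):
    """Return the first line printed by `<binary> --version`, or None if it cannot be run."""
    try:
        result = subprocess.run(
            [binary_path, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # some builds print the version to stderr
            universal_newlines=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract is NOT on PATH in this environment")
    else:
        print("tesseract found at:", path)
        print("version:", tesseract_version(path))
```

If this script is run inside the container and reports a path, the binary is installed, and the remaining suspect is whatever command pytesseract has been told to use.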
EDIT:
Thanks to Neo for the solution. I am testing it now, but it's running very slowly, so I thought it would be better to share the requirements.txt file here in case there are other issues unrelated to tesseract.
requirements.txt:
numpy
pandas
opencv-python
Pillow
Image
pytesseract
tesseract
PyMuPDF
python-levenshtein
tabula-py
Local file dir:
testdockerfile
├─ .vscode
│ └─ settings.json
├─ BankofAmericaOCR.py
├─ ChaseOCR.py
├─ Dockerfile
├─ MainBankClass.py
├─ __init__.py
├─ WellsFargoOCR.py
└─ requirements.txt
EDIT 2:
Just for future reference, in case anyone has the same issue I did and still gets TesseractNotFoundError after installing tesseract in Docker: comment out pytesseract.pytesseract.tesseract_cmd = r'/path/to/your/tesseract' if you set that path to run it locally, because that local path does not exist inside the container. After that, you also need to rebuild the image and run the new image in Docker. It should be fine.
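As an alternative to commenting the line in and out, here is a rough sketch of choosing the command at startup so the same code can run locally and in the container. It assumes an environment variable named TESSERACT_CMD, which is a name invented for this example (pytesseract does not read it on its own). The helpers resolve_tesseract_cmd and configure_pytesseract are likewise hypothetical.

```python
import os
import shutil

# Hypothetical environment variable, chosen for this sketch only.
ENV_VAR = "TESSERACT_CMD"


def resolve_tesseract_cmd(default="tesseract"):
    """Pick the tesseract command: an explicit override if that file exists, else whatever is on PATH."""
    override = os.environ.get(ENV_VAR)
    if override and os.path.isfile(override):
        return override
    # Fall back to the PATH lookup; if nothing is found, return the bare
    # name so the caller still gets the usual "not found" error.
    return shutil.which(default) or default


def configure_pytesseract():
    """Point pytesseract at the resolved command and return it."""
    import pytesseract  # imported here so resolve_tesseract_cmd works without it

    pytesseract.pytesseract.tesseract_cmd = resolve_tesseract_cmd()
    return pytesseract.pytesseract.tesseract_cmd
```

Locally, TESSERACT_CMD could be set to the Homebrew path from the question. In the container it can be left unset, so the binary installed by the Dockerfile is picked up from PATH. Calling configure_pytesseract() once at startup would replace the hard-coded assignment.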