
I had my Python project running locally, and it worked. I call Tesseract from Python with the subprocess module.

Then I deployed the project. Since I use Flask, I used the tiangolo-uwsgi-flask-nginx-docker image, but Tesseract isn't installed in it. My project no longer works because it cannot find the tesseract binary, and it cannot see the Tesseract installed on my AWS instance because that installation lives outside the Docker container.

That's why I would also like to use the Tesseract 4 Docker image, which ships with Tesseract installed.

I have both containers running:

c82b61361992        tesseractshadow/tesseract4re:latest   "/bin/bash"            6 seconds ago       Up 5 seconds                                      t4re
e122633ef81c        my_project:latest                 "/entrypoint.sh /sta   35 minutes ago      Up 35 minutes       0.0.0.0:80->80/tcp, 443/tcp   modest_perlman

But I don't know how to tell my_project to use Tesseract from the Tesseract container.

I read this post about connecting two Docker containers, but it left me even more lost. :)

I saw that the Tesseract container is supposed to be used like this:

#!/bin/bash
# Check that the t4re container is running
docker ps -f name=t4re
# Unique working directory per run: PID plus nanoseconds
TASK_TMP_DIR=TASK_$$_$(date +"%N")
echo "====== TASK $TASK_TMP_DIR started ======"
docker exec -it t4re mkdir -p ./$TASK_TMP_DIR/
# Copy the input image into the container
docker cp ./ocr-files/phototest.tif t4re:/home/work/$TASK_TMP_DIR/
# Run Tesseract inside the container, writing txt, pdf and hocr output
docker exec -it t4re /bin/bash -c "mkdir -p ./$TASK_TMP_DIR/out/; cd ./$TASK_TMP_DIR/out/; tesseract ../phototest.tif phototest -l eng --psm 1 --oem 2 txt pdf hocr"
# Copy the results back to the host, then clean up inside the container
mkdir -p ./ocr-files/output/$TASK_TMP_DIR/
docker cp t4re:/home/work/$TASK_TMP_DIR/out/ ./ocr-files/output/$TASK_TMP_DIR/
docker exec -it t4re rm -r ./$TASK_TMP_DIR/
docker exec -it t4re ls
echo "====== Result files were copied to ./ocr-files/output/$TASK_TMP_DIR/ ======"

But I have no clue how to implement this in my Python script, running from the other container.
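For reference, a direct translation of those steps into Python's subprocess module might look roughly like the sketch below. It only works where the `docker` CLI can reach the t4re container (for example on the host), which is not the case inside my_project as it stands. The function names `build_ocr_commands` and `run_ocr` are made up for illustration; the container name, the /home/work directory and the Tesseract options are taken from the script above.

```python
import os
import shlex
import subprocess
import time


def build_ocr_commands(image_path, container="t4re", workdir="/home/work", task_dir=None):
    """Build the docker commands from the shell script as argument lists.

    Hypothetical helper: nothing is executed here, so the commands can be
    inspected before running them.
    """
    if task_dir is None:
        # Same idea as TASK_$$_$(date +"%N"): PID plus a nanosecond timestamp
        task_dir = "TASK_{}_{}".format(os.getpid(), time.time_ns())
    image_name = os.path.basename(image_path)
    base = os.path.splitext(image_name)[0]
    remote = "{}/{}".format(workdir, task_dir)
    local_out = os.path.join("ocr-files", "output", task_dir)
    # Run tesseract from inside the out/ directory, as the script does
    ocr = "cd {} && tesseract ../{} {} -l eng --psm 1 --oem 2 txt pdf hocr".format(
        shlex.quote(remote + "/out"), shlex.quote(image_name), shlex.quote(base)
    )
    commands = [
        ["docker", "exec", container, "mkdir", "-p", remote + "/out"],
        ["docker", "cp", image_path, "{}:{}/".format(container, remote)],
        ["docker", "exec", container, "/bin/bash", "-c", ocr],
        ["docker", "cp", "{}:{}/out/".format(container, remote), local_out],
        ["docker", "exec", container, "rm", "-r", remote],
    ]
    return task_dir, local_out, commands


def run_ocr(image_path):
    """Run the commands in order; raises CalledProcessError if one fails."""
    _, local_out, commands = build_ocr_commands(image_path)
    os.makedirs(local_out, exist_ok=True)
    for argv in commands:
        subprocess.run(argv, check=True)
    return local_out
```

The `-it` flags are left out on purpose, since there is no terminal attached when the commands are started from a Python process.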

My Python Tesseract script is quite similar to pytesseract.py; I just changed a few lines and deleted some parts I don't need.

Maybe someone knows how to do this, or can suggest a better way to use Tesseract with the tiangolo Docker image.

Chuck Aguilar

1 Answer

EDIT (See the edit below)

I found the answer. Since it works for any two Docker containers, I'll write up a general solution that can be reused.

I have both Docker images and containers on the same instance:

CONTAINER ID        IMAGE                 COMMAND             CREATED             STATUS              PORTS                    NAMES
14524d364cff        (image)               "java -jar ..."   40 hours ago        Up 40 hours         0.0.0.0:5000->5000/tcp   api-1
3392994ae3ac        (image)               "java -jar ..."   40 hours ago        Up 40 hours         0.0.0.0:5002->5002/tcp   api-2

So far, so easy.

Then I wrote a docker-compose.yml:

version: '2'
services:
  api-1:
    image: _name-of-image_
    container_name: api-1
    ports:
      - "5000:5000"
    depends_on:
      - api-2

  api-2:
    image: _name-of-image_
    container_name: api-2
    ports:
      - "5002:5002"

Then, in the Dockerfile of api-1, for example:

...
ENV API-2HOST api-2
...

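And that's it. Docker Compose puts both services on a shared network in which the service name api-2 resolves as a hostname, so api-1 can reach api-2 under that name.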

In my particular case, I have an api-1.conf with:

accounts = {
  http = {
    host = "localhost"
    host = ${?API-2HOST}
    port = 5002
    poolBufferSize = 100
    routes = {
      authentication = "/authentication"
      login = "/login/"
      logout = "/logout"
      refreshTokens = "/refreshTokens"
    }
  }
}

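The second host line overrides "localhost" only when the API-2HOST environment variable is set. With that in place, api-1 can send requests to api-2, and that is how the two Docker containers communicate.

Hope this helps someone.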
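For a Python service such as the Flask app in the question, the same pattern could be sketched as follows. This is only an illustration, not part of my setup: the helper names `service_base_url` and `route_url` are hypothetical, while the variable name API-2HOST, the port 5002 and the "localhost" fallback mirror the config above.

```python
import os


def service_base_url(env=os.environ, var="API-2HOST", default_host="localhost", port=5002):
    """Return the base URL of the other container.

    Falls back to default_host when the variable is unset or empty,
    like the two host lines in api-1.conf.
    """
    host = env.get(var) or default_host
    return "http://{}:{}".format(host, port)


def route_url(route, env=os.environ):
    """Join a route such as "/login/" onto the base URL."""
    return service_base_url(env) + "/" + route.lstrip("/")
```

Inside the api-1 container, `route_url("/login/")` would then point at api-2, and the result can be passed to any HTTP client, for example `urllib.request.urlopen`.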

EDIT

Since this can get complicated, I created a Git project with a single Dockerfile that provides Flask, nginx, uWSGI and Tesseract together, so there's no need to use two containers.

docker-flask-nginx-uwsgi-tesseract

Chuck Aguilar