Without Docker, the scripts parse the PDF files with Tika without any problem.

When I run them in Docker, however, I get an error saying the Tika server is not running. After some reading I tried the steps below, but the error persists.

Can someone please help?

The steps I ran are listed below, followed by the running containers and the Dockerfile:

  1. docker pull apache/tika
  2. docker run -d -p 9998:9998 apache/tika
  3. cat Dockerfile (listed at the end)
  4. docker build -t docker_parser .
  5. docker run docker_parser

  6. docker ps -a


    CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS                     PORTS                    NAMES

    8ff9fd3d0a84        docker_parser       "python ./scripts/..."   2 days ago          Exited (0) 4 minutes ago                            adoring_mestorf

    fdf132926c61        apache/tika         "/bin/sh -c 'java ..."   2 days ago          Up 6 minutes               0.0.0.0:9998->9998/tcp   optimistic_ride

  7. Dockerfile:

    FROM python:3

    RUN pip3 install --upgrade pip requests
    RUN pip3 install python-docx tika numpy pandas

    RUN mkdir scripts
    RUN mkdir pdfs
    RUN mkdir output

    ADD runner.py /scripts/
    ADD header_parser.py /scripts/
    ADD keyword_parser.py /scripts/

    ADD *.pdf /pdfs/

    CMD [ "python", "./scripts/runner.py" ]

  8. Error in the code:

    sentence_parser Oops! Error Type: occured. Details: Unable to start Tika server. Error Type: at line: 156

Space X

1 Answer

It looks like you haven't set up any link between the two containers, so tika-python in the docker_parser container can't reach the Tika server on port 9998. You could either install Java in the docker_parser image and let it host the Tika server itself, or link the two containers.
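One way to confirm this from inside the docker_parser container is a plain TCP check. The sketch below is only illustrative: `can_connect` is a hypothetical helper, and it assumes the default bridge network, where `localhost` inside a container refers to that container itself rather than to the Docker host or the apache/tika container.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False


if __name__ == "__main__":
    # Inside docker_parser this is expected to report False unless a Tika
    # server is running in that same container.
    print("localhost:9998 reachable:", can_connect("localhost", 9998))
```

If the check fails for `localhost` but succeeds for the Tika container's name once both containers share a network, the missing link is the likely cause.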

If you want to keep the two images, you can either use the --link option of the Docker CLI at run time, or create a network (docker network create) and attach both containers to it (docker network connect). I normally use docker-compose to make this kind of thing easier and specify the links there.
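On the Python side, tika-python also has to be told where the server is, otherwise it looks for one locally. Here is a minimal sketch, assuming both containers are attached to the same user-defined network and the Tika container is reachable under the host name `tika` (for example because it was started with `--name tika`). That host name and the helpers `configure_tika_client` and `parse_pdf` are illustrative and not part of the original scripts. As far as I know, `TIKA_CLIENT_ONLY` and `TIKA_SERVER_ENDPOINT` are environment variables that tika-python reads when it is imported.

```python
import os

# Assumed host name of the Tika container on a shared Docker network.
DEFAULT_ENDPOINT = "http://tika:9998"


def configure_tika_client(endpoint=DEFAULT_ENDPOINT):
    """Point tika-python at a remote server; call before importing tika."""
    # Client-only mode: do not try to download or start a local Tika server.
    os.environ["TIKA_CLIENT_ONLY"] = "True"
    os.environ["TIKA_SERVER_ENDPOINT"] = endpoint
    return endpoint


def parse_pdf(path, endpoint=DEFAULT_ENDPOINT):
    """Parse one file through the remote Tika server and return its text."""
    configure_tika_client(endpoint)
    from tika import parser  # imported late so the settings above are seen

    parsed = parser.from_file(path, serverEndpoint=endpoint)
    return parsed.get("content")
```

With something like this in runner.py, the containers could be started with `docker network create parsenet`, `docker run -d --network parsenet --name tika apache/tika` and `docker run --network parsenet docker_parser`, where `parsenet` is a made-up network name.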

Dave Meikle