Without Docker, the scripts parse the PDF files with Tika without any problem.
However, when I run them in Docker, I get an error saying the Tika server is not running. After some reading I tried the steps below, but the error persists.
Can someone please help?
The steps I ran are listed below, followed by the containers that are running, the Dockerfile, and the error:
- docker pull apache/tika
- docker run -d -p 9998:9998 apache/tika
- cat Dockerfile (listed below)
- docker build -t docker_parser .
- docker run docker_parser
- docker ps -a
CONTAINER ID   IMAGE           COMMAND                  CREATED      STATUS                     PORTS                    NAMES
8ff9fd3d0a84   docker_parser   "python ./scripts/..."   2 days ago   Exited (0) 4 minutes ago                            adoring_mestorf
fdf132926c61   apache/tika     "/bin/sh -c 'java ..."   2 days ago   Up 6 minutes               0.0.0.0:9998->9998/tcp   optimistic_ride
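
The listing shows apache/tika publishing port 9998 on the host, but inside the docker_parser container `localhost` refers to that container itself, not to the host or to the Tika container. A quick way to see which address actually answers is a small reachability check run from wherever the scripts run. This is only a sketch: `tika_reachable` is a hypothetical helper, not part of the scripts above, and it relies on the Tika server answering GET requests on `/version`.

```python
import urllib.error
import urllib.request


def tika_reachable(base_url, timeout=3.0):
    """Return True if an HTTP server answers 200 at base_url + '/version'."""
    url = base_url.rstrip("/") + "/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, unknown host, timeout, or an HTTP error status.
        return False
```

Calling `tika_reachable("http://localhost:9998")` on the host and again inside the docker_parser container would show whether the server is visible from each side.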
- Dockerfile:
FROM python:3
RUN pip3 install --upgrade pip requests
RUN pip3 install python-docx tika numpy pandas
RUN mkdir scripts
RUN mkdir pdfs
RUN mkdir output
ADD runner.py /scripts/
ADD header_parser.py /scripts/
ADD keyword_parser.py /scripts/
ADD *.pdf /pdfs/
CMD [ "python", "./scripts/runner.py" ]
- Error in the code:
sentence_parser Oops! Error Type: occured. Details: Unable to start Tika server. Error Type: at line: 156
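
A possible reading of this error, stated as an assumption rather than a confirmed diagnosis: when the `tika` Python package finds no server at its default `localhost:9998`, it tries to start its own Tika server, which needs Java, and the `python:3` image does not ship Java. The package reads the `TIKA_SERVER_ENDPOINT` and `TIKA_CLIENT_ONLY` environment variables, so one option might be to point it at the already running apache/tika container. The sketch below assumes both containers share a user-defined Docker network and that the Tika container was started with `--name tika`; neither is the case in the commands above. `configure_tika_client` and `parse_pdf` are hypothetical helpers.

```python
import os

# Assumed endpoint: "tika" is a hypothetical container name on a shared
# Docker network, not something taken from the setup in this post.
ASSUMED_ENDPOINT = "http://tika:9998"


def configure_tika_client(env=None):
    """Set the tika client variables if unset and return the endpoint in use."""
    env = os.environ if env is None else env
    env.setdefault("TIKA_SERVER_ENDPOINT", ASSUMED_ENDPOINT)
    # Ask the client to act as a plain REST client instead of starting a server.
    env.setdefault("TIKA_CLIENT_ONLY", "True")
    return env["TIKA_SERVER_ENDPOINT"]


def parse_pdf(path):
    """Parse one file through an already running Tika server."""
    endpoint = configure_tika_client()
    # Imported late so the variables are in place when the package reads them;
    # this has no effect if tika was already imported elsewhere.
    from tika import parser
    return parser.from_file(path, serverEndpoint=endpoint)
```

The same two variables could instead be passed with `docker run -e`, which would leave the scripts unchanged.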