3

I have a scraper with the following Dockerfile:

# Adapted from trcook/docker-scrapy
FROM python:alpine
RUN apk --update add libxml2-dev libxslt-dev libffi-dev gcc musl-dev libgcc openssl-dev
COPY . /scraper
RUN pip install -r /scraper/requirements.txt
WORKDIR /scraper/apkmirror_scraper
CMD ["scrapy", "crawl", "apkmirror"]

The code for the scraper is located in /scraper/apkmirror_scraper, and the requirements are in /scraper/requirements.txt. I've noticed that every time I modify the code and rebuild the image, it re-runs pip install -r /scraper/requirements.txt rather than reusing the cached layer.

How can I prevent this and make it use the local cache?

(One 'theory' about this is that whereas /scraper/requirements.txt itself hasn't changed, the /scraper directory has, which makes the RUN directive have to 're-run'; in this case it might help to move requirements.txt to a different directory. I wasn't able to verify whether this 'theory' is correct from https://docs.docker.com/engine/reference/builder/#run, however).

Kurt Peek

1 Answer

9

The question "Docker how to run pip requirements.txt only if there was a change?" seems to apply to my situation. COPY . /scraper copies the whole directory, so any change to the code invalidates the build cache for that layer and for every layer after it, including the pip install, even though requirements.txt itself is unchanged. To avoid re-running pip install on every code change, the recommendation is to COPY requirements.txt and RUN pip install -r requirements.txt as separate steps before copying the rest of the code.
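As a rough sketch of that reordering, the Dockerfile from the question could look like the following. It assumes requirements.txt sits at the root of the build context, which is what the original COPY . /scraper implies; the rest is unchanged from the question.

```dockerfile
# Adapted from trcook/docker-scrapy
FROM python:alpine
RUN apk --update add libxml2-dev libxslt-dev libffi-dev gcc musl-dev libgcc openssl-dev

# Copy only the requirements file first (assumed to be at the build context root).
# This layer and the pip layer below stay cached while requirements.txt is unchanged.
COPY requirements.txt /scraper/requirements.txt
RUN pip install -r /scraper/requirements.txt

# Copy the rest of the source. Code edits invalidate the cache only from here on.
COPY . /scraper
WORKDIR /scraper/apkmirror_scraper
CMD ["scrapy", "crawl", "apkmirror"]
```

With this ordering, editing files under apkmirror_scraper should only rebuild the final COPY layer. Any change to requirements.txt still re-runs the full pip install, since the layer cache works per instruction, not per package.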

Kurt Peek
Not helpful. It is still running pip install if I add a single new library in the requirements.txt. I want it to install only the new library. – etotientz Sep 03 '20 at 10:51