4

I have a Python application whose Docker build takes about 15-20 minutes. Here is roughly what my Dockerfile looks like:

FROM ubuntu:18.04
...
COPY . /usr/local/app
RUN pip install -r /usr/local/app/requirements.txt
...
CMD ...

Now if I use skaffold, any code change triggers a rebuild, and it reinstalls all the requirements (everything from the COPY step onward is rebuilt) regardless of whether they were already installed. In docker-compose this issue would be solved using volumes. In Kubernetes, if we use volumes in the following way:

apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  containers:
  - image: test:test
    name: test-container
    volumeMounts:
    - mountPath: /usr/local/venv  # this is the directory of the python virtualenv
      name: test-volume
  volumes:
  - name: test-volume
    awsElasticBlockStore:
      volumeID: <volume-id>
      fsType: ext4

will this extra requirements install be avoided when using skaffold?


3 Answers

4

I can't speak for skaffold specifically, but the container image build can be improved. If layer caching is available, the dependencies will only be reinstalled when your requirements.txt changes. This is documented in the Dockerfile best practices under "ADD or COPY".

FROM ubuntu:18.04
...
COPY requirements.txt /usr/local/app/
RUN pip install -r /usr/local/app/requirements.txt
COPY . /usr/local/app
...
CMD ...

You may sometimes need to trigger updates if the module versions are loosely defined and, say, you want a new patch version. I've found requirements should be specific so versions don't slide underneath your application without your knowledge/testing.
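
For example, pinning exact versions in requirements.txt (the package names below are just placeholders, not taken from the question) keeps the pip install layer stable until you deliberately bump a version:

flask==1.1.1
requests==2.22.0
gunicorn==19.9.0

With pins like these, the cached layer is only invalidated when requirements.txt itself changes.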

Kaniko in-cluster builds

For kaniko builds to make use of a cache in a cluster where there is no persistent storage by default, kaniko needs either a persistent volume mounted (--cache-dir) or a container image repo (--cache-repo) with the layers available.
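
As a rough sketch (the registry name and context path are placeholders, not taken from the question), an in-cluster kaniko build that caches layers in an image repository could be invoked along these lines:

# cache layers in a dedicated repo so unchanged steps (e.g. pip install) are reused
/kaniko/executor \
  --context=dir:///workspace \
  --dockerfile=/workspace/Dockerfile \
  --destination=<registry>/test:test \
  --cache=true \
  --cache-repo=<registry>/test-cache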

Matt
3

If your goal is to speed up the dev process: instead of triggering an entirely new build and deployment every time you change a line of code, you can switch to a sync-based dev workflow, i.e. deploy once and then update the files inside the running containers as you edit code.

Skaffold supports file sync to directly update files inside the deployed containers if you change them on your local machine. However, the docs state "File sync is alpha" (https://skaffold.dev/docs/how-tos/filesync/) and I can completely agree from working with it a while ago: the sync mechanism is only one-directional (no sync from container back to local) and pretty buggy, e.g. it crashes frequently when switching git branches, installing dependencies etc., which can be pretty annoying.
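
For reference, a minimal sync rule in skaffold.yaml could look roughly like the following (the apiVersion, image name and paths here are assumptions for illustration; the exact schema depends on your skaffold version):

apiVersion: skaffold/v2beta12
kind: Config
build:
  artifacts:
  - image: test:test
    sync:
      manual:
      # copy edited .py files straight into the running container
      - src: '**/*.py'
        dest: /usr/local/app

Files matching src are copied to dest inside the container instead of triggering a full rebuild and redeploy.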

If you want a more stable alternative for sync-based Kubernetes development which is very easy to get started with, take a look at DevSpace: https://github.com/devspace-cloud/devspace

I am one of the maintainers of DevSpace and started the project because Skaffold was much too slow for our team and it did not have a file sync back then.

Lukas Gentele
  • Yes indeed, `tilt` is another tool I came across that will do the same. I am currently running some tests and POCs to try it out: https://docs.tilt.dev/live_update_tutorial.html – Rajdeep Mukherjee Aug 29 '19 at 05:15
  • Yes, tilt is pretty good as well. Especially if you want to work with minikube etc. in a local cluster. – Lukas Gentele Aug 29 '19 at 20:08
1

@Matt's answer is a great best practice (+1) - skaffold in and of itself won't solve the underlying layer cache invalidation issue, which results in having to re-install the requirements on every build.

For additional performance, you can cache all the Python packages in a volume mounted in your pod, for example:

apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  containers:
  - image: test:test
    name: test-container
    volumeMounts:
    - mountPath: /usr/local/venv
      name: test-volume
    - mountPath: /root/.cache/pip
      name: pip-cache
  volumes:
  - name: test-volume
    awsElasticBlockStore:
      volumeID: <volume-id>
      fsType: ext4
  - name: pip-cache
    awsElasticBlockStore:
      volumeID: <volume-id>
      fsType: ext4

That way, if the build cache is ever invalidated and you have to re-install your requirements.txt, you'll save some time by fetching the packages from the cache instead of downloading them all again.
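
As a rough illustration (assuming you re-install dependencies from inside the running pod into the mounted virtualenv, which is an assumption about the workflow rather than something from the question), pip picks up the mounted cache automatically since /root/.cache/pip is its default cache location when running as root, or you can point it there explicitly:

# re-install inside the pod, reusing previously downloaded wheels from the mounted cache
kubectl exec -it test -- /usr/local/venv/bin/pip install \
  --cache-dir /root/.cache/pip \
  -r /usr/local/app/requirements.txt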

If you're building with kaniko you can also cache base images to a persistent disk using the kaniko-warmer, for example:

...
volumeMounts:
...
- mountPath: /cache
  name: kaniko-warmer
volumes:
...
- name: kaniko-warmer
  awsElasticBlockStore:
    volumeID: <volume-id>
    fsType: ext4

Running the kaniko-warmer inside the pod: docker run --rm -it -v /cache:/cache --entrypoint /kaniko/warmer gcr.io/kaniko-project/warmer --cache-dir=/cache --image=python:3.7-slim --image=nginx:1.17.3. Your skaffold.yaml might look something like:

apiVersion: skaffold/v1beta13
kind: Config
build:
  artifacts:
  - image: test:test
    kaniko:
      buildContext:
        localDir: {}
      cache:
        hostPath: /cache
  cluster:
    namespace: jx
    dockerConfig:
      secretName: jenkins-docker-cfg
  tagPolicy:
    envTemplate:
      template: '{{.DOCKER_REGISTRY}}/{{.IMAGE_NAME}}'
deploy:
  kubectl: {}
masseyb
  • If you are doing in-cluster docker builds, assuming kaniko, the same could be done with the [`--cache`](https://github.com/GoogleContainerTools/kaniko#--cache) and [`--cache-dir`](https://github.com/GoogleContainerTools/kaniko#--cache-dir) options – Matt Aug 29 '19 at 23:09
  • Can you use `--cache-dir` for `python` packages? Wasn't aware of that; I use the `kaniko-warmer` to cache base images in a `--cache-dir` (also backed by a persistent disk), will have to test it out. Thanks. – masseyb Aug 30 '19 at 04:47
  • Sorry I worded my comment badly. You don't use `cache-dir` specifically for python packages but using the same `mountPath` setup for a kaniko cache dir, kaniko can cache container image layers. Once you have the Dockerfile setup from [my answer](https://stackoverflow.com/a/57702674/1318694) the initial `requirements.txt` and `pip install` layers can be pulled from local disk cache when the contents haven't changed. – Matt Aug 30 '19 at 05:02
  • Actually with a multistage build you could have a cached image with the pip cache in it. Then the final image can stay clean, just copying the file paths you want. – Matt Aug 30 '19 at 05:14
  • Ah, yeah, I use `kaniko` / `kaniko-warmer` (`jenkins x` with `tekton` pipelines) for my builds - updated my answer. Didn't think about the multi-stage build but that is a good point, e.g. build an image with just the dependencies following best practices, cache the image for extending them with the `kaniko-warmer`, if you need to rebuild them then cache the dependencies (i.e. pip-cache). Seems legit. – masseyb Aug 30 '19 at 07:03
  • Latest version(s) of `nexus` supports proxying / caching `apt` repositories (as well as many other things i.e. `npm`, `pypi`, `docker`, ...). `squid` can be used as a pull through cache proxy as well. Same for the `docker` registry to avoid having to pull images from the internetz all the time as per the [doc](https://docs.docker.com/registry/recipes/mirror/). – masseyb Aug 30 '19 at 07:08
  • @masseyb I tried your solution with kaniko, but `skaffold dev` is failing with the following message: `listing files: undefined artifact type: {DockerArtifact: BazelArtifact: JibMavenArtifact: JibGradleArtifact: KanikoArtifact:0xc00030bb90 CustomArtifact:}` Not sure what is going on! – Rajdeep Mukherjee Sep 06 '19 at 13:28