
In GitLab CI, the `.gitlab-ci.yml` file has an option called `before_script` to execute commands before any of the actual script runs. The `.gitlab-ci.yml` examples illustrate installing ancillary programs here. However, what I've noticed is that these changes are not cached in Docker when using a Docker executor. I had naively assumed that after running these commands, Docker would cache the image, so that for the next run or test, Docker would just load the cached image produced after `before_script`. This would drastically speed up builds.

As an example, my .gitlab-ci.yml looks a little like:

image: ubuntu

before_script:
    - apt-get update -qq && apt-get install -yqq make ...

build:
    script:
        - cd project && make

A possible solution is to go to the runner machine, create a Docker image that can build my software without any further installation, and then reference it in the `image` section of the YAML file. The downside of this is that whenever I want to add a dependency, I need to log in to the runner machine and update the image before builds will succeed. It would be much nicer if I just had to add the dependency to the end of `apt-get install` and have Docker / GitLab CI handle the appropriate caching.
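A related detail, in case anyone tries the prebuilt-image route: whether the Docker executor will use an image that only exists locally on the runner machine depends on the runner's pull policy. In the runner's `config.toml` that is controlled roughly like this (the image name here is hypothetical):

```toml
[[runners]]
  executor = "docker"
  [runners.docker]
    # Hypothetical prebuilt image with the build dependencies baked in
    image = "my-project-ci"
    # Allow a locally built image to be used instead of always pulling
    pull_policy = "if-not-present"
```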

There is also a `cache` option in `.gitlab-ci.yml`, which I tried setting to `untracked: true`. I thought this would cache everything that wasn't a byproduct of my project, but it didn't seem to have any effect.
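For what it's worth, my understanding is that `cache` only preserves paths inside the project directory between jobs, which would explain why packages installed system-wide by `apt-get` never show up in it. A sketch of the kind of thing it can cache (the path is illustrative):

```yaml
cache:
  untracked: true
  paths:
    - vendor/   # must be inside the project directory; system files are not cacheable
```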

Is there any way to get the behavior I desire?

Erik
  • I wish there was an option like "image:dockerfile" or "image:build", either inline or as a file reference, similar to how docker-compose allows customized images. With such support in the runner, we could even forget about docker-in-docker if the only thing we need is a reproducible build environment. – Daniel Alder Dec 03 '19 at 01:08

2 Answers


You can add a stage that builds the image in the first place. If nothing in the image has changed, that stage finishes very quickly, in under a second, because Docker's layer cache kicks in.

You can then use that image in the following stages, speeding up the whole process.

This is an example of a .gitlab-ci.yml:

stages:
  - build_test_image
  - test

build_test:
  stage: build_test_image
  script:
    - docker login -u gitlab-ci-token -p $CI_BUILD_TOKEN $CI_REGISTRY
    - docker build -t $CI_REGISTRY_IMAGE:test -f dockerfiles/test/Dockerfile .
    - docker push $CI_REGISTRY_IMAGE:test
  tags:
    - docker_build

test_syntax:
  image: $CI_REGISTRY_IMAGE:test
  stage: test
  script:
    - pip install flake8
    - flake8 --ignore=E501,E265 app/
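The `dockerfiles/test/Dockerfile` referenced above is not shown in the answer; a minimal hypothetical sketch that would support the jobs above might look like:

```dockerfile
# Hypothetical dockerfiles/test/Dockerfile (not shown in the original answer)
FROM python:3
# Bake common tooling into the image so test jobs start ready to run
RUN pip install --no-cache-dir flake8
```

Once a dependency like flake8 is baked into the image, the `pip install` line in the job script becomes redundant and can be dropped.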

Look at the `docker_build` tag. That tag forces the stage to run on the runner that has that tag. The executor for that runner is `shell`, and it's used only to build Docker images, so the host where the runner lives needs Docker Engine installed. I found this solution suits my needs better than Docker-in-Docker and other approaches.

Also, I'm using a private registry, which is why I'm using the `$CI_REGISTRY*` variables. You can use Docker Hub without specifying the registry, though you would then need to handle authenticating against Docker Hub.

charli
  • Is there any documentation for this functionality? – Envek Feb 07 '17 at 10:57
  • If I've added my own runner to the hosted GitLab instance, should I add `docker_build` tag to it or GitLab handles it internally and implicitly? – Envek Feb 07 '17 at 11:31
  • You should add it explicitly; the tag `docker_build` is just a convenient name that I chose, but it can be anything. It's not documented, it's just a way to do it that I figured out. – charli Feb 07 '17 at 13:59
  • I'm editing the answer to clarify the usage of the tag. – charli Feb 07 '17 at 14:00
  • Thank you for your reply. Actually there are many voices against the shell executor, and I've done image caching manually as in this post: https://gitlab.com/gitlab-org/gitlab-ce/issues/17861#note_21746810 – Envek Feb 07 '17 at 17:47

The way that I handle this is to keep custom images on Docker Hub for each of our projects and reference them from `.gitlab-ci.yml`. If I need a new dependency, I edit the Dockerfile used to create the initial image, rebuild the image, tag it with a specific tag, and push it to Docker Hub.

echo "RUN apt-get install -y gcc" >> Dockerfile
ID=$(docker build -q .)
docker tag $ID ACCOUNT/gitlab_ci_image:gcc
docker push ACCOUNT/gitlab_ci_image:gcc

Then I update the .gitlab-ci.yml file to point to that specific version of the image.

image: ACCOUNT/gitlab_ci_image:gcc

build:
    script:
        - cd project && make

This allows me to have different dependencies depending on which commit I am attempting to test (as the `.gitlab-ci.yml` file within that commit tells the runner which image to use). It also avoids installing the dependencies every time a test is run on a particular runner, as the runner re-uses the same image as long as it doesn't change.

The other nice thing is that with the images hosted on Docker Hub, if a runner needs a specific tag that it doesn't have locally, it will grab the correct one automatically. So you can have 10 runners and only maintain a single image, and that maintenance can be done from your own workstation or any machine.

I personally think this is a much better solution than attempting to cache anything within a runner's image. This is particularly true when you create a new branch to test your code against a newer version of a dependency: with caching, you would have trouble keeping separate testing environments for your stable and dev branches. Also, in my opinion, tests should run in as clean an environment as possible, and this setup accomplishes that.

Suever
  • I had thought of this, and there are some upsides, but it seems like it wouldn't be that hard to run each line of `before_script` as a RUN command and then have docker do the caching at that level. – Erik Jan 15 '16 at 22:32
  • Yea I think it is definitely possible, but my best guess for the rationale behind it would be near the end of my answer, because if you had different `before_script` directives in different commits things could get a bit messy. Also `before_script` could be used to do all sorts of things apart from installing packages. You could always post on their github page if you're curious. They are really good at responding. What I have posted has served our group well though. – Suever Jan 15 '16 at 22:35
  • I am going to work with something like you describe for the time being. – Erik Jan 15 '16 at 22:54
  • But is it a private registry or the public Docker Hub? – 9ilsdx 9rvj 0lo May 15 '17 at 11:41