
I'm using Gitlab CI 8.0 with gitlab-ci-multi-runner 0.6.0. I have a .gitlab-ci.yml file similar to the following:

before_script:
  - npm install

server_tests:
  script: mocha

client_tests:
  script: karma start karma.conf.js

This works, but it means the dependencies are installed independently before each test job. For a large project with many dependencies, this adds considerable overhead.

In Jenkins I would use one job to install dependencies then TAR them up and create a build artefact which is then copied to downstream jobs. Would something similar work with Gitlab CI? Is there a recommended approach?

Tamlyn

7 Answers


Update: I now recommend using artifacts with a short expire_in. This is superior to cache because the artifact only has to be written once per pipeline, whereas the cache is updated after every job. Also, the cache is per-runner, so if you run your jobs in parallel on multiple runners it's not guaranteed to be populated, unlike artifacts, which are stored centrally.
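A minimal sketch of that approach (job names and the expiry value are illustrative):

```yaml
build_deps:
  stage: build
  script:
    - npm install
  artifacts:
    paths:
      - node_modules/
    expire_in: 30 minutes

server_tests:
  stage: test
  script: mocha
```

Jobs in later stages automatically download the artifact, so node_modules/ is in place before their scripts run.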


Gitlab CI 8.2 adds runner caching which lets you reuse files between builds. However, I've found this to be very slow.

Instead I've implemented my own caching system using a bit of shell scripting:

before_script:
  # unique hash of required dependencies (the bash array keeps only the
  # hash, the first word of md5sum's output)
  - PACKAGE_HASH=($(md5sum package.json))
  # path to cache file
  - DEPS_CACHE=/tmp/dependencies_${PACKAGE_HASH}.tar.gz
  # if the cache file exists, extract it; otherwise install and create it
  - if [ -f "$DEPS_CACHE" ];
    then
      tar zxf "$DEPS_CACHE";
    else
      npm install --quiet;
      tar zcf - ./node_modules > "$DEPS_CACHE";
    fi

This will run before every job in your .gitlab-ci.yml and only install your dependencies if package.json has changed or the cache file is missing (e.g. first run, or file was manually deleted). Note that if you have several runners on different servers, they will each have their own cache file.

You may want to clear out the cache file on a regular basis in order to get the latest dependencies. We do this with the following cron entry:

@daily               find /tmp/dependencies_* -mtime +1 -type f -delete
Tamlyn
  • I'm using a different approach: a ln -s command in before_script linking node_modules to a backup directory, and an rm node_modules in after_script. This is much faster than a GitLab artifact or a zip. Plus, using GitLab environments' on_stop, you can now delete the backup directory when the branch is deleted. – BlouBlou Jan 25 '17 at 07:12
  • How does this work if you bump the node version from 6 to 8 for example? I'm guessing this will fail. If you have engines set accordingly in package.json, it will however work. – basickarl Jun 28 '17 at 10:03
  • Zipping the node_modules folder is still faster than using artifacts or caching. Artifacts uploads the entire contents to GitLab, and caching is still slow. GitLab Runner sometimes seems to hang on "Reinitialized existing Git repository", at times taking up to 30 seconds on a small project. – manit Jun 30 '21 at 07:26

EDIT: This solution was recommended in 2016. In 2021, you might consider the caching docs instead.

A better approach these days is to make use of artifacts.

In the following example, the node_modules/ directory is immediately available to the lint job once the build stage has completed successfully.

build:
  stage: build
  script:
    - npm install -q
    - npm run build
  artifacts:
    paths:
      - node_modules/
    expire_in: 1 week

lint:
  stage: test
  script:
    - npm run lint
brendo

From docs:

  • cache: Use for temporary storage for project dependencies. Not useful for keeping intermediate build results, like jar or apk files. Cache was designed to be used to speed up invocations of subsequent runs of a given job, by keeping things like dependencies (e.g., npm packages, Go vendor packages, etc.) so they don’t have to be re-fetched from the public internet. While the cache can be abused to pass intermediate build results between stages, there may be cases where artifacts are a better fit.

  • artifacts: Use for stage results that will be passed between stages. Artifacts were designed to upload some compiled/generated bits of the build, and they can be fetched by any number of concurrent Runners. They are guaranteed to be available and are there to pass data between jobs. They are also exposed to be downloaded from the UI. Artifacts can only exist in directories relative to the build directory and specifying paths which don’t comply to this rule trigger an unintuitive and illogical error message (an enhancement is discussed at https://gitlab.com/gitlab-org/gitlab-ce/issues/15530 ). Artifacts need to be uploaded to the GitLab instance (not only the GitLab runner) before the next stage job(s) can start, so you need to evaluate carefully whether your bandwidth allows you to profit from parallelization with stages and shared artifacts before investing time in changes to the setup.

So, I use cache. When I don't need to update the cache (e.g. the build folder in a test job), I use policy: pull (see here).
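For example, a minimal sketch (job names are illustrative): the install job writes the cache, and the test job only pulls it:

```yaml
install:
  stage: build
  script: npm install
  cache:
    key: modules
    paths:
      - node_modules/
    policy: pull-push

test:
  stage: test
  script: npm test
  cache:
    key: modules
    paths:
      - node_modules/
    policy: pull
```

The pull-only job skips the cache upload step at the end, which saves time when the job doesn't change the cached files.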

Alynva

I prefer to use cache because it removes the files when the pipeline has finished.

Example

image: node

stages:
  - install
  - test
  - compile

cache:
  key: modules
  paths:
    - node_modules/

install:modules:
  stage: install
  cache:
    key: modules
    paths:
      - node_modules/
  after_script:
    - node -v && npm -v
  script:
    - npm i

test:
  stage: test
  cache:
    key: modules
    paths:
      - node_modules/
    policy: pull
  before_script:
    - node -v && npm -v
  script:
    - npm run test

compile:
  stage: compile
  cache:
    key: modules
    paths:
      - node_modules/
    policy: pull
  script:
    - npm run build
Alex Montoya

I solved the problem with a symbolic link to a folder outside the working directory. The solution looks like this:

# .gitlab-ci.yml
before_script:
  - New-Item -ItemType SymbolicLink -Path ".\node_modules" -Target "C:\GitLab-Runner\cache\node_modules"
  - yarn

after_script:
  - (Get-Item ".\node_modules").Delete()

I know this is a rather dirty solution, but it saves a lot of time in the build process and extends the storage life.

coustou

I think it's not recommended, because all jobs of the same stage can be executed in parallel.

  1. First, all jobs of build are executed in parallel.
  2. If all jobs of build succeed, the test jobs are executed in parallel.
  3. If all jobs of test succeed, the deploy jobs are executed in parallel.
  4. If all jobs of deploy succeed, the commit is marked as success.
  5. If any of the previous jobs fails, the commit is marked as failed and no jobs of further stages are executed.
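The stage ordering above can be sketched with a minimal (hypothetical) pipeline, where job names are illustrative:

```yaml
stages:
  - build
  - test
  - deploy

# jobs in the same stage run in parallel;
# a stage starts only after all jobs of the previous stage succeed
install_deps:
  stage: build
  script: npm install

server_tests:
  stage: test
  script: mocha

client_tests:
  stage: test
  script: karma start karma.conf.js

deploy_app:
  stage: deploy
  script: npm run deploy
```

Here server_tests and client_tests run side by side, but only after install_deps has succeeded.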

I have read that here:

http://doc.gitlab.com/ci/yaml/README.html

Andres Rojano Ruiz
  • Yes, but couldn't you have one `build` stage job that installs the dependencies, then any number of `test` stage jobs that use those same files? – Tamlyn Nov 05 '15 at 10:46
  • In that case, I suppose you can do it, but I don't know if you will run into problems with the previously installed dependencies. An option could be to define a bash script and run it in your test (- sh script.sh), and then you can manage the installations inside that script. – Andres Rojano Ruiz Nov 05 '15 at 10:59

GitLab introduced caching to avoid redownloading dependencies for each job.

The following Node.js example is inspired from the caching documentation.

image: node:latest

# Cache modules in between jobs
cache:
  key: $CI_COMMIT_REF_SLUG
  paths:
    - .npm/

before_script:
  - npm ci --cache .npm --prefer-offline

server_tests:
  script: mocha

client_tests:
  script: karma start karma.conf.js

Note that the example uses npm ci. This command is like npm install, but designed to be used in automated environments. You can read more about npm ci in the documentation and the command line arguments you can pass.

For further information, check Caching in GitLab CI/CD and the cache keyword reference.

Christophe Weis