I've been battling this problem for hours and can't seem to find a solution. I have a self-hosted gitlab-runner running on an Amazon Linux 2 EC2 instance. I installed git, docker and gitlab-runner (and registered the runner successfully). Here's my .gitlab-ci.yml file:

stages:
  - install
  - lint
  - build-nodejs-app
  - test
  - build-docker-image

install_dependencies:
  image: node:15.6.0-alpine
  stage: install
  script:
    - npm install

format:
  image: node:15.6.0-alpine
  stage: lint
  script:
    - npm install --global prettier
    - prettier --write .

lint:
  image: node:15.6.0-alpine
  stage: lint
  script:
    - npm run lint

build:
  image: node:15.6.0-alpine
  stage: build-nodejs-app
  script:
    - npm install
    - npm run build
  artifacts:
    paths:
      - build/

test_index_file:
  image: node:15.6.0-alpine
  stage: test
  script:
    - test -f build/index.html

unit_tests:
  image: node:15.6.0-alpine
  stage: test
  script:
    - npm install
    - npm run test

build-docker-image-aws:
  image: docker:stable
  services:
    - docker:dind
  variables:
    DOCKER_HOST: tcp://docker:2375
    DOCKER_TLS_CERTDIR: ""
  stage: build-docker-image
  before_script:
    - mkdir -p ~/.aws
    - echo $AWS_ACCESS_KEY_ID > ~/.aws/credentials
    - echo $AWS_SECRET_ACCESS_KEY >> ~/.aws/credentials
  script:
    - docker info
    - docker login -u $DOCKERHUB_USERNAME -p $DOCKERHUB_PASSWORD
    - cp -R build/ app/
    - docker build -t $DOCKER_IMAGE_NAME .
    - docker push $DOCKER_IMAGE_NAME
    - docker run --rm -v ~/.aws:/root/.aws amazon/aws-cli ecs update-service --cluster $ECS_CLUSTER --service $ECS_SERVICE --force-new-deployment
  dependencies:
    - build

I'm trying to build a Node.js app, copy the build artifact into a Docker image, push that image, and then deploy it to AWS with Terraform (I'll integrate the Terraform part later). After toiling to get the right config for the gitlab-ci file, this is one brick wall I can't seem to get past.

This is the error I get:

$ echo $AWS_ACCESS_KEY_ID > ~/.aws/credentials
$ echo $AWS_SECRET_ACCESS_KEY >> ~/.aws/credentials
$ docker info
Client:
 Debug Mode: false
Server:
ERROR: Cannot connect to the Docker daemon at [MASKED]. Is the docker daemon running?
errors pretty printing info
Cleaning up project directory and file based variables
00:01
ERROR: Job failed: exit code 1

I've added both ec2-user and gitlab-runner to docker group and successfully ran docker run hello-world on both.

sudo service docker status says it's running but sudo service --status-all gives this output:

● cfn-hup.service - SYSV: Runs user-specified actions when a
   Loaded: loaded (/etc/rc.d/init.d/cfn-hup; bad; vendor preset: disabled)
   Active: inactive (dead)
     Docs: man:systemd-sysv-generator(8)
netconsole module not loaded
Configured devices:
lo eth0
Currently active devices:
lo eth0 docker0

sudo systemctl status docker.socket also reports 'active'.

Here's my /etc/gitlab-runner/config.toml


[[runners]]
  name = "My Runner"
  url = "https://gitlab.com/"
  id = 0
  token = REDACTED
  token_obtained_at = 0001-01-01T00:00:00Z
  token_expires_at = 0001-01-01T00:00:00Z
  executor = "docker"
  [runners.cache]
    MaxUploadedArchiveSize = 0
  [runners.docker]
    tls_verify = false
    image = "ubuntu:latest"
    privileged = true
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/var/run/docker.sock:/var/run/docker.sock","/opt/gitlab-runner/cache:/cache:rw"]
    shm_size = 0

[[runners]]
  name = "Runner on AWS EC2"
  url = "https://gitlab.com/"

Feel like I'm going in circles at this point. I'd appreciate any suggestions.

codestein
  • Hmm. In your job, can you even reach `docker`, e.g. does `ping docker` work? If so, does connecting with netcat or similar work? – declension Jun 01 '23 at 08:12
  • @declension I tried each in the last job. `docker version` checks out, but `nc -zv docker 2375` returned nc: bad address 'docker'. I get the same response when I run `nc -zv /var/lib/docker 2375`. – codestein Jun 01 '23 at 09:17
  • You're mounting the socket. You shouldn't be trying to connect to a remote daemon. Why is your Docker daemon address masked? It should just be the Unix socket. If you are setting `DOCKER_HOST` somewhere, you should remove it. – sytech Jun 01 '23 at 09:29
  • @sytech so ideally, the last line in the docker job should look like this? `docker run --rm -v /var/run/docker.sock:/var/run/docker.sock amazon/aws-cli ecs update-service --cluster $ECS_CLUSTER --service $ECS_SERVICE --force-new-deployment` – codestein Jun 01 '23 at 16:46
  • @sytech also, should I be hosting the runner on my windows laptop instead of an EC2 instance? Or is it better to use shared runners? (thought those were pretty limited) Sorry, I'm pretty new to this. Still learning. – codestein Jun 01 '23 at 17:24
  • The mount is specified in your runner config, already there. The job script can simply call `docker` as you would normally; no need to do anything special in the job itself. The problem you're having is that the job seems to be trying to use TCP to connect to a docker daemon, which it shouldn't do unless `DOCKER_HOST` has been set. Just unset the `DOCKER_HOST` variable in the job and it should use the socket that has already been mounted. If you're mounting the socket in the runner config, you also should not use the `docker:dind` service. – sytech Jun 01 '23 at 20:05
  • Alternatively, just remove the docker socket from the volume mounts in the config. Then you can use the `docker:dind` service as your daemon. You should also verify `docker` commands work as expected on the host and you're running the gitlab-runner as a user with permission to run docker commands. – sytech Jun 01 '23 at 20:10
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/253924/discussion-between-codestein-and-sytech). – codestein Jun 01 '23 at 20:28
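
The two setups sytech describes in the comments above are mutually exclusive. A minimal sketch of each follows, assuming the variables and stage names from the question's `.gitlab-ci.yml`. The job names `build-docker-image-socket` and `build-docker-image-dind` are hypothetical and exist only to tell the variants apart; a real pipeline would keep just one of them. The matching `config.toml` changes are noted in comments, since they live on the runner host rather than in the pipeline file.

```yaml
# Option A: socket binding.
# Runner side: keep "/var/run/docker.sock:/var/run/docker.sock" in `volumes`.
# Job side: no docker:dind service and no DOCKER_HOST, so the docker CLI
# falls back to the mounted Unix socket and talks to the host's daemon.
build-docker-image-socket:
  image: docker:stable
  stage: build-docker-image
  script:
    - docker info
    - docker build -t $DOCKER_IMAGE_NAME .
  dependencies:
    - build

# Option B: Docker-in-Docker over plain TCP (TLS disabled).
# Runner side: remove the docker.sock entry from `volumes` and keep
# `privileged = true`, which the dind service needs.
# Job side: the daemon runs in the `docker` service container on port 2375.
build-docker-image-dind:
  image: docker:stable
  services:
    - docker:dind
  variables:
    DOCKER_HOST: tcp://docker:2375
    DOCKER_TLS_CERTDIR: ""
  stage: build-docker-image
  script:
    - docker info
    - docker build -t $DOCKER_IMAGE_NAME .
  dependencies:
    - build
```

Mixing the two (socket mounted and `DOCKER_HOST` pointing at a TCP address) is the combination the comments identify as the likely cause of the error.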

1 Answer

I'm not sure this is the only (or root) problem, but with all newer versions of Docker you want to connect over TLS rather than disabling it. The DinD docs have some helpful info on the flags, etc.

In fact, the GitLab CI FAQ lists it as a reason for that error.
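
As a rough sketch of what a TLS-enabled DinD job can look like with the Docker executor, assuming a pinned tag such as `docker:20.10.16` (any recent pinned tag should behave similarly) and the job name from the question. The runner-side settings in the comments are assumptions about `config.toml`, not something the pipeline file can set itself.

```yaml
# Runner side (config.toml, assumed):
#   privileged = true
#   volumes = ["/certs/client", "/cache"]   # no /var/run/docker.sock mount
build-docker-image-aws:
  image: docker:20.10.16
  services:
    - docker:20.10.16-dind
  variables:
    # Tells the dind service where to generate its certificates. With this
    # set and DOCKER_HOST left unset, the docker image's entrypoint is
    # expected to point the client at the TLS port (2376) on its own.
    DOCKER_TLS_CERTDIR: "/certs"
  stage: build-docker-image
  script:
    - docker info
```

Note that this differs from the question's job in two ways: `DOCKER_TLS_CERTDIR` is no longer empty, and the explicit `DOCKER_HOST: tcp://docker:2375` is gone, since 2375 is the non-TLS port.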

declension
  • Tried this and I still arrive at the same error. In other words, I specified `docker:20.10.16` and included the TLS cert in the variables. Still the same error. – codestein Jun 01 '23 at 20:52
  • As per the @sytech thread, I think you'll need to remove the direct mounting of the socket too; that is deprecated and slightly dodgy unless you really need it. – declension Jun 02 '23 at 12:15
  • Also (but later) why not install `aws-cli` instead of running that from DIND, to make life simpler for _that_ bit? – declension Jun 02 '23 at 12:43
  • I'm using an Amazon Linux 2 machine. I thought those already come with `aws-cli`? I could be wrong. – codestein Jun 03 '23 at 02:49
  • On the _host_ yes, but remember all Gitlab jobs are running in Docker containers. – declension Jun 06 '23 at 08:20
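
One way to act on the `aws-cli` suggestion in the comments above is to run the deployment in its own job that uses the `amazon/aws-cli` image directly, instead of `docker run` from inside the build job. This is only a sketch: the job name `deploy-ecs` and the `deploy` stage are hypothetical (the stage would have to be added to `stages:`), and it assumes `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` and a region variable such as `AWS_DEFAULT_REGION` are defined as CI/CD variables, which the AWS CLI reads from the environment.

```yaml
deploy-ecs:
  image:
    name: amazon/aws-cli
    # The image's default entrypoint is the `aws` binary itself; clear it
    # so the runner can start a shell to execute the script lines.
    entrypoint: [""]
  stage: deploy  # hypothetical stage, placed after build-docker-image
  script:
    # Credentials come from the CI/CD variables in the job environment,
    # so no ~/.aws/credentials file has to be written.
    - aws ecs update-service --cluster $ECS_CLUSTER --service $ECS_SERVICE --force-new-deployment
```

With this split, the Docker build job no longer needs the `before_script` that writes `~/.aws/credentials`.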