
I have a single-host Docker Swarm application with every service set to global mode (so that there is only one replica of each service). For some reason, after updating the swarm, some of the services show 2/2 replicas: the old container isn't being stopped after the new one starts. What I have found is that this happens when the mysql container is replaced (it is the only service with order: stop-first in its update config). The services that end up with too many replicas depend on the DB and fail on deploy until the DB is ready, but by that point there are two replicas of each of them, the old and the new. To fix this I have to run the deploy again.
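
The duplicated tasks can be seen with, for example (the service name test_core assumes the stack is named test, as in the deploy command below):

docker service ls
docker service ps test_core --no-trunc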

My environment is deployed by a CI/CD pipeline that runs these steps in order (spelled out as commands after the list):

  1. docker-compose -f build-images.yml build
  2. docker-compose -f build-images.yml push (to a private Docker registry, also on the same host and swarm)
  3. docker image prune -a
  4. docker stack deploy -c test-swarm.yml test
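
Spelled out as commands, the pipeline is roughly this (the -f on prune is my assumption for a non-interactive CI run; everything else is as listed):

docker-compose -f build-images.yml build
docker-compose -f build-images.yml push
docker image prune -a -f
docker stack deploy -c test-swarm.yml test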

Now I actually have two problems:

Firstly, mysql is updated most of the time even though nothing has changed in the code. A new image is built (which is understandable, since I ran image prune -a), then for some reason it is pushed to the registry as a new layer, and then the old mysql container is replaced with an identical one. Because of this, almost every time I change any other service, the replica problem described above appears.
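
Whether the rebuild really produces a different image can be checked by comparing the image ID before and after a build of unchanged code, for example:

docker image inspect --format '{{.Id}}' registry.address/db:latest

If the ID differs between such builds, every deploy sees a "new" image and replaces the container.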

Secondly, while the DB is being updated, the old replica of a dependent service keeps running even after the new one is created and running, which results in too many replicas (and the old version still gets all the traffic, such as API calls).
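
While an update is in flight, its progress on an affected service can be watched with (stack name test assumed again):

docker service inspect --format '{{json .UpdateStatus}}' test_core

which prints null if the service has never been updated.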

Here is the part of my test-swarm.yml with the DB and one of the services that gets duplicated:

services:
  #BACKEND
  db:
    image: registry.address/db:latest
    user: "${UID}:${GID}"
    deploy:
      mode: global
      update_config:
        failure_action: pause
        order: stop-first
    healthcheck:
      test: [ "CMD-SHELL", "mysqladmin --defaults-file=/home/.my.cnf -u root status || exit 1" ]
      interval: 60s
      timeout: 5s
      retries: 3
      start_period: 30s
    ports:
      - 3319:3306
    env_file:
      - prod-env/db.env
    volumes:
      - db:/var/lib/mysql
    networks:
      - test-backend

  core:
    image: registry.address/core:latest
    user: "${UID}:${GID}"
    deploy:
      mode: global
      update_config:
        failure_action: pause
        order: start-first
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://localhost/api/admin/status || exit 1"]
      interval: 60s
      timeout: 5s
      retries: 5
      start_period: 30s
    depends_on:
      - db
    networks:
      - test-backend
      - test-api
    environment:
      - ASPNETCORE_ENVIRONMENT=Docker
    volumes:
      - app-data:/src/app/files
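
After a deploy, it can be verified that these update settings actually made it into the running services (again assuming the stack is named test):

docker service inspect --format '{{json .Spec.UpdateConfig}}' test_db
docker service inspect --format '{{json .Spec.UpdateConfig}}' test_core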

And here is the part of build-images.yml with these services:

services:
  db:
    image: registry.address/db:latest
    build:
      context: .
      dockerfile: db-prod.Dockerfile
      args:
        UID: ${UID}
        GID: ${GID}

  core:
    image: registry.address/core:latest
    build:
      context: .
      dockerfile: Core/Dockerfile
      args:
        UID: ${UID}
        GID: ${GID}
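
The digest the push step actually produced can be listed afterwards with:

docker image ls --digests registry.address/db
docker image ls --digests registry.address/core

If the digest changes after rebuilding unchanged code, the "new layer" being pushed each time is expected.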

The DB Dockerfile:

FROM mysql:latest
ARG UID
ARG GID
# client config used by the healthcheck's mysqladmin call
COPY ./init/.my.cnf /home/
RUN chown $UID:$GID /home/.my.cnf
# init script picked up by the mysql image's entrypoint on first start
COPY ./init/01-databases.sql /docker-entrypoint-initdb.d/
USER $UID:$GID
RUN chmod 600 /home/.my.cnf
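
The layer-by-layer history of the built image, useful for spotting which build step changes between runs, can be dumped with:

docker history registry.address/db:latest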