I have a single-host Docker Swarm application with all services set to global mode (in order to have exactly one replica of each service). For some reason, after updating the stack, some of the services show 2/2 replicas. It looks like the old container isn't stopped after the new one starts.
What I have found is that it happens when the mysql container is being replaced (it's the only service with order: stop-first in its update_config). The services that end up with too many replicas depend on the DB, and during the deploy they keep failing until the DB is ready (but for some reason at that point there are two replicas of each of them - the old one and the new one). To fix this I have to run the deploy again.
My environment is deployed by CI/CD, which runs these steps in order (sketched as a script after the list):
- docker-compose -f build-images.yml build
- docker-compose -f build-images.yml push (to a private Docker registry, also running on the same host and swarm)
- docker image prune -a
- docker stack deploy -c test-swarm.yml test
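Written out as a plain script, the whole deploy step looks roughly like this (simplified; the real CI job just runs these commands in sequence):

#!/bin/sh
set -e

# Build all images defined in build-images.yml
docker-compose -f build-images.yml build

# Push the built images to the private registry referenced by their image: tags
docker-compose -f build-images.yml push

# Remove unused local images
docker image prune -a

# Create/update the stack from the swarm compose file
docker stack deploy -c test-swarm.yml test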
Now I actually have 2 problems:
Firstly, mysql gets updated most of the time even though nothing has changed in the code. A new image is built (which is understandable since I ran image prune -a), then for some reason it is pushed to the registry as a new layer, and then it replaces the old mysql container with an identical one. Because of this, almost every time I change any other service, the replica problem described above appears.
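A rough way to check whether the rebuilt db image is really identical is to compare the local image ID with the image the running service was created from (just a sketch; test_db assumes the stack is named test as in the deploy command above):

# ID of the image that was just built and pushed
docker image inspect --format '{{.Id}}' registry.address/db:latest

# Image (tag@digest) the currently running db service was created from
docker service inspect --format '{{.Spec.TaskTemplate.ContainerSpec.Image}}' test_db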
Secondly, while the DB is being updated, the old replica of a dependent container stays around even after the new one is created and running, which results in too many replicas (and the old version keeps getting all the traffic, e.g. API calls).
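This is what I look at to see the duplicates (again a sketch; the service name assumes the stack is named test):

# Replica counts per service - this is where a global service shows 2/2
docker service ls

# Task list for one of the duplicated services - both the old and the new task are in state Running
docker service ps --no-trunc test_core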
Here is the part of my test-swarm.yml with the DB and one of the services that gets duplicated:
services:

  #BACKEND
  db:
    image: registry.address/db:latest
    user: "${UID}:${GID}"
    deploy:
      mode: global
      update_config:
        failure_action: pause
        order: stop-first
    healthcheck:
      test: [ "CMD-SHELL", "mysqladmin --defaults-file=/home/.my.cnf -u root status || exit 1" ]
      interval: 60s
      timeout: 5s
      retries: 3
      start_period: 30s
    ports:
      - 3319:3306
    env_file:
      - prod-env/db.env
    volumes:
      - db:/var/lib/mysql
    networks:
      - test-backend

  core:
    image: registry.address/core:latest
    user: "${UID}:${GID}"
    deploy:
      mode: global
      update_config:
        failure_action: pause
        order: start-first
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://localhost/api/admin/status || exit 1"]
      interval: 60s
      timeout: 5s
      retries: 5
      start_period: 30s
    depends_on:
      - db
    networks:
      - test-backend
      - test-api
    environment:
      - ASPNETCORE_ENVIRONMENT=Docker
    volumes:
      - app-data:/src/app/files
And here is the part of build-images.yml with these services:
services:

  db:
    image: registry.address/db:latest
    build:
      context: .
      dockerfile: db-prod.Dockerfile
      args:
        UID: ${UID}
        GID: ${GID}

  core:
    image: registry.address/core:latest
    build:
      context: .
      dockerfile: Core/Dockerfile
      args:
        UID: ${UID}
        GID: ${GID}
The DB Dockerfile (db-prod.Dockerfile):
FROM mysql:latest
ARG UID
ARG GID
COPY ./init/.my.cnf /home/
RUN chown $UID:$GID /home/.my.cnf
COPY ./init/01-databases.sql /docker-entrypoint-initdb.d/
USER $UID:$GID
RUN chmod 600 /home/.my.cnf
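To narrow down which step produces the layer that gets pushed on every build, I can compare the layer digests of two consecutive builds, roughly like this:

# Layer digests of the current db image
docker image inspect --format '{{json .RootFS.Layers}}' registry.address/db:latest

# Per-instruction history, to map each layer back to a Dockerfile step
docker history --no-trunc registry.address/db:latest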