cAdvisor - sync between nodes

Question

I have a Docker swarm running our business stack, defined in a docker-compose.yml, on two servers (nodes). The compose file starts cAdvisor on each of the two nodes like this:

  cadvisor:
    image: gcr.io/google-containers/cadvisor:latest
    command: "--logtostderr --housekeeping_interval=30s"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /:/rootfs:ro
      - /var/run:/var/run
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk:/dev/disk/:ro
    ports:
      - "9338:8080"
    deploy:
      mode: global
      resources:
        limits:
          memory: 128M
        reservations:
          memory: 64M

On a third server I run a separate Docker instance that is not part of the swarm on nodes 1 and 2. This server runs Prometheus and Grafana. Prometheus is configured to scrape only node1:9338 to get the cAdvisor information.

Occasionally, when scraping node1:9338, not all containers running on nodes 1 and 2 show up in the cAdvisor statistics.

I was assuming that cAdvisor syncs its information across the swarm, so that I could configure Prometheus to use only node1:9338 as the entry point into the swarm for scraping.

Or do I also have to add node2:9338 to my Prometheus configuration to always get the information from all nodes? If so, how does this scale, given that I would need to add each new node to the Prometheus config?

Running Prometheus together with the business stack in one swarm is not an option.

Edit: Today, when opening the cAdvisor metrics URLs http://node1:9338/metrics and http://node2:9338/metrics, I noticed strange behaviour: both URLs showed the same information, covering the containers running on node1. The information for the containers running on node2 was missing when requesting http://node2:9338/metrics.

Could it be that Docker's internal load balancing routes the request for http://node2:9338/metrics to the cAdvisor on node1, so that the metrics of node1 are shown even though node2 was requested?

score 1 · Answer 1 · answered Jun 30 '20 at 08:01

cAdvisor looks at the container information provided by Linux on that machine; it knows nothing of Swarm. You'll want to have Prometheus scraping all your machines.
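
A minimal sketch of what scraping every machine could look like in prometheus.yml, assuming the host names and the published port 9338 from the question. The job name `cadvisor` is an arbitrary choice, not something taken from the original setup:

```yaml
# prometheus.yml (fragment): one target per swarm node
scrape_configs:
  - job_name: cadvisor        # assumed name, pick any label you like
    static_configs:
      - targets:
          - node1:9338        # cAdvisor published on node 1
          - node2:9338        # cAdvisor published on node 2
```

Each target then appears as its own `instance` label value in Prometheus, so the containers of the two nodes can be told apart in queries.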

brian-brazil

So cAdvisor isn't syncing between the nodes by itself, so that each node has the information of the whole swarm? How does this scale then? Do I have to update the Prometheus config each time a new node joins the swarm? – mr.simonski Jun 30 '20 at 11:42
Today I manually opened the cAdvisor /metrics URL on node1 and node2, and I was very surprised to see the metrics of the containers of node1 on both nodes. So the cAdvisor running on node2 was aware of the containers running on node1. Somehow there must be a connection between them. – mr.simonski Jul 13 '20 at 15:01

score 0 · Accepted Answer · answered Jul 15 '20 at 08:25

Indeed, the problem was Docker's internal load balancing in swarm mode.

As I wrote in my initial post, we added cAdvisor to our docker-compose file and deployed the stack via

docker stack deploy --prune --with-registry-auth -c docker-compose.yml MY_STACK

Configuring cAdvisor with

    deploy:
      mode: global

leads to one instance per node, but requesting a certain node via http://node2:9338/metrics doesn't mean you get the result of the cAdvisor running on that node. Docker's internal load balancing (the ingress routing mesh, which swarm uses for published ports by default) might route your request to the cAdvisor on node1 instead, so you can't reliably scrape the real cAdvisor results from node2.

The solution that worked for me was to explicitly tell Docker to use mode: host in the ports section of cAdvisor in my docker-compose file. My final config looks like this:

  cadvisor:
    image: gcr.io/google-containers/cadvisor:latest
    command: "--logtostderr --housekeeping_interval=30s"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /:/rootfs:ro
      - /var/run:/var/run
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk:/dev/disk/:ro
    ports:
      - target: 8080
        published: 9338
        protocol: tcp
        mode: host
    deploy:
      mode: global
      resources:
        limits:
          cpus: "1"
          memory: 128M
        reservations:
          memory: 64M

Please note the changed ports section.
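
On the scaling question: with host-mode ports each node answers only for itself, so Prometheus needs every node as a target. One way to avoid editing prometheus.yml for each new node is Prometheus' file-based service discovery. The following is only a sketch under assumptions: the job name and the path /etc/prometheus/targets/cadvisor.yml are hypothetical and were not part of my setup.

```yaml
# prometheus.yml (fragment): read the cAdvisor targets from a separate file
scrape_configs:
  - job_name: cadvisor                              # assumed name
    file_sd_configs:
      - files:
          - /etc/prometheus/targets/cadvisor.yml    # hypothetical path
        refresh_interval: 1m                        # how often the file is re-read

# Contents of the hypothetical targets file cadvisor.yml:
#
# - targets:
#     - node1:9338
#     - node2:9338
```

Adding a node then means appending one line to the targets file. Prometheus picks up changes to that file without a restart.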
