I'm quite new to Netdata and also Docker Swarm. I ran Netdata for a while on single hosts but now trying to stream Netdata from workers to a manager node in a swarm environment where the manager also should act as a central Netdata instance. I'm aiming to only monitor the data from the manager.
Here's my compose file for the stack:
version: '3.2'
services:
netdata-client:
image: titpetric/netdata
hostname: "{{.Node.Hostname}}"
cap_add:
- SYS_PTRACE
security_opt:
- apparmor:unconfined
environment:
- NETDATA_STREAM_DESTINATION=control:19999
- NETDATA_STREAM_API_KEY=1x214ch15h3at1289y
- PGID=999
volumes:
- /proc:/host/proc:ro
- /sys:/host/sys:ro
- /var/run/docker.sock:/var/run/docker.sock
networks:
- netdata
deploy:
mode: global
placement:
constraints: [node.role == worker]
netdata-central:
image: titpetric/netdata
hostname: control
cap_add:
- SYS_PTRACE
security_opt:
- apparmor:unconfined
environment:
- NETDATA_API_KEY_ENABLE_1x214ch15h3at1289y=1
ports:
- '19999:19999'
volumes:
- /proc:/host/proc:ro
- /sys:/host/sys:ro
- /var/run/docker.sock:/var/run/docker.sock
networks:
- netdata
deploy:
mode: replicated
replicas: 1
placement:
constraints: [node.role == manager]
networks:
netdata:
driver: overlay
attachable: true
Netdata on the manager works fine and the container runs on the one worker node I'm testing on. According to log output it seems to run well and gathers names from the docker containers running as it does in a local environment.
Problem is that it can't connect to the netdata-central service running on the manager.
This is the error message:
2019-01-04 08:35:28: netdata INFO : STREAM_SENDER[7] : STREAM 7 [send to control:19999]: connecting...,
2019-01-04 08:35:28: netdata ERROR : STREAM_SENDER[7] : Cannot resolve host 'control', port '19999': Name or service not known,
not sure why it can't resolve the hostname, thought it should work that way on the overlay network. Maybe there's a better way to connect and not rely on the hostname?
Any help is appreciated.
EDIT: as this question might come up - the firewall (ufw) on the control host is inactive, also I think the error message clearly points to a problem with name resolution.