I'm running an ELK stack on a t4g.medium instance (ARM, 4 GB RAM) on AWS. When using the official Kibana image I see weird behaviour: after approximately 4 hours of running, the CPU spikes (50-60%) and the EC2 instance becomes unreachable until restarted. One of the two status checks also fails. Once restarted, it runs for another 4 or so hours and then the same thing happens again. The instance is not under heavy load, and it goes down in the middle of the night when there is no load at all. I'm 99.9% sure it's Kibana causing the issue, as gagara/kibana-oss-arm64:7.6.2 has run for months without issue. It's not an ARM issue or specific to Kibana 7.13 either, as I've encountered the same thing on x86 with older versions of Kibana. My config is:
version: '3.8'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.13.0
    configs:
      - source: elastic_config
        target: /usr/share/elasticsearch/config/elasticsearch.yml
    environment:
      ES_JAVA_OPTS: "-Xmx2g -Xms2g"
    networks:
      - internal
    volumes:
      - /mnt/data/elasticsearch:/usr/share/elasticsearch/data
    deploy:
      mode: replicated
      replicas: 1
  logstash:
    image: docker.elastic.co/logstash/logstash:7.13.0
    ports:
      - "5044:5044"
      - "9600:9600"
    configs:
      - source: logstash_config
        target: /usr/share/logstash/config/logstash.yml
      - source: logstash_pipeline
        target: /usr/share/logstash/pipeline/logstash.conf
    environment:
      LS_JAVA_OPTS: "-Xmx1g -Xms1g"
    networks:
      - internal
    deploy:
      mode: replicated
      replicas: 1
  kibana:
    image: docker.elastic.co/kibana/kibana:7.13.0
    configs:
      - source: kibana_config
        target: /usr/share/kibana/config/kibana.yml
    environment:
      NODE_OPTIONS: "--max-old-space-size=300"
    networks:
      - internal
    deploy:
      mode: replicated
      replicas: 1
      labels:
        - "traefik.enable=true"
  load-balancer:
    image: traefik:v2.2.8
    ports:
      - 5601:443
    configs:
      - source: traefik_config
        target: /etc/traefik/traefik.toml
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      restart_policy:
        condition: any
      mode: replicated
      replicas: 1
    networks:
      - internal
configs:
  elastic_config:
    file: ./config/elasticsearch.yml
  logstash_config:
    file: ./config/logstash/logstash.yml
  logstash_pipeline:
    file: ./config/logstash/pipeline/pipeline.conf
  kibana_config:
    file: ./config/kibana.yml
  traefik_config:
    file: ./config/traefik.toml
networks:
  internal:
    driver: overlay
I've also disabled a pile of features in kibana.yml to see if that would help:
server.name: kibana
server.host: "0.0.0.0"
elasticsearch.hosts: ["http://elasticsearch:9200"]
xpack.monitoring.ui.enabled: false
xpack.graph.enabled: false
xpack.infra.enabled: false
xpack.canvas.enabled: false
xpack.ml.enabled: false
xpack.uptime.enabled: false
xpack.maps.enabled: false
xpack.apm.enabled: false
timelion.enabled: false
Has anyone encountered similar problems with a single node ELK stack running on Docker?
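One thing that may be worth ruling out is plain memory exhaustion on the host. The heap caps in the compose file alone add up to roughly 3.3 GB (2 GB for Elasticsearch, 1 GB for Logstash, 300 MB of Node old space for Kibana) out of the instance's 4 GB. That is before counting JVM and Node memory outside the heap, Traefik, Docker itself and the OS. If something slowly grows, the whole instance can stall instead of a single container dying. A per-service memory limit makes the kernel OOM-kill the offending container, and Swarm then restarts it under the default restart policy, so the host stays reachable. The sketch below only covers the Kibana service. The `700M` figure is an assumed, untested value chosen for illustration, not a recommendation from Elastic.

```yaml
# Sketch only: merge into the existing kibana service in the compose file above.
# The 700M value is an assumption for illustration, not a tested recommendation.
services:
  kibana:
    deploy:
      resources:
        limits:
          # Hard cap on the container's memory. If Kibana grows past this,
          # the kernel OOM-kills the container and Swarm restarts the task,
          # instead of the whole EC2 instance running out of memory.
          memory: 700M
```

If Kibana really is the culprit, you would expect the symptom to change from "instance unreachable" to the Kibana task restarting roughly every few hours. You can check for that with `docker service ps <stack>_kibana`, where `<stack>` is whatever name the stack was deployed under.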