I have an Elasticsearch cluster (6.3) running on Kubernetes (GKE) with the following manifest file:
---
# Source: elasticsearch/templates/manifests.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: elasticsearch-configmap
labels:
app.kubernetes.io/name: "elasticsearch"
app.kubernetes.io/component: elasticsearch-server
data:
elasticsearch.yml: |
cluster.name: "${CLUSTER_NAME}"
node.name: "${NODE_NAME}"
path.data: /usr/share/elasticsearch/data
path.repo: ["${BACKUP_REPO_PATH}"]
network.host: 0.0.0.0
discovery.zen.minimum_master_nodes: 1
discovery.zen.ping.unicast.hosts: ${DISCOVERY_SERVICE}
log4j2.properties: |
status = error
appender.console.type = Console
appender.console.name = console
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = [%d{ISO8601}][%-5p][%-25c{1.}] %marker%m%n
rootLogger.level = info
rootLogger.appenderRef.console.ref = console
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: elasticsearch
labels: &ElasticsearchDeploymentLabels
app.kubernetes.io/name: "elasticsearch"
app.kubernetes.io/component: elasticsearch-server
spec:
selector:
matchLabels: *ElasticsearchDeploymentLabels
serviceName: elasticsearch-svc
replicas: 2
updateStrategy:
# The procedure for updating the Elasticsearch cluster is described at
# https://www.elastic.co/guide/en/elasticsearch/reference/current/rolling-upgrades.html
type: OnDelete
template:
metadata:
labels: *ElasticsearchDeploymentLabels
spec:
terminationGracePeriodSeconds: 180
initContainers:
# This init container sets the appropriate limits for mmap counts on the hosting node.
# https://www.elastic.co/guide/en/elasticsearch/reference/current/vm-max-map-count.html
- name: set-max-map-count
image: marketplace.gcr.io/google/elasticsearch/ubuntu16_04@...
imagePullPolicy: IfNotPresent
securityContext:
privileged: true
command:
- /bin/bash
- -c
- 'if [[ "$(sysctl vm.max_map_count --values)" -lt 262144 ]]; then sysctl -w vm.max_map_count=262144; fi'
containers:
- name: elasticsearch
image: eu.gcr.io/projectId/elasticsearch6.3@sha256:...
imagePullPolicy: Always
env:
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: CLUSTER_NAME
value: "elasticsearch-cluster"
- name: DISCOVERY_SERVICE
value: "elasticsearch-svc"
- name: BACKUP_REPO_PATH
value: ""
ports:
- name: prometheus
containerPort: 9114
protocol: TCP
- name: http
containerPort: 9200
- name: tcp-transport
containerPort: 9300
volumeMounts:
- name: configmap
mountPath: /etc/elasticsearch/elasticsearch.yml
subPath: elasticsearch.yml
- name: configmap
mountPath: /etc/elasticsearch/log4j2.properties
subPath: log4j2.properties
- name: elasticsearch-pvc
mountPath: /usr/share/elasticsearch/data
readinessProbe:
httpGet:
path: /_cluster/health?local=true
port: 9200
initialDelaySeconds: 5
livenessProbe:
exec:
command:
- /usr/bin/pgrep
- -x
- "java"
initialDelaySeconds: 5
resources:
requests:
memory: "2Gi"
- name: prometheus-to-sd
image: marketplace.gcr.io/google/elasticsearch/prometheus-to-sd@sha256:8e3679a6e059d1806daae335ab08b304fd1d8d35cdff457baded7306b5af9ba5
ports:
- name: profiler
containerPort: 6060
command:
- /monitor
- --stackdriver-prefix=custom.googleapis.com
- --source=elasticsearch:http://localhost:9114/metrics
- --pod-id=$(POD_NAME)
- --namespace-id=$(POD_NAMESPACE)
- --monitored-resource-types=k8s
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
volumes:
- name: configmap
configMap:
name: "elasticsearch-configmap"
volumeClaimTemplates:
- metadata:
name: elasticsearch-pvc
labels:
app.kubernetes.io/name: "elasticsearch"
app.kubernetes.io/component: elasticsearch-server
spec:
accessModes: ["ReadWriteOnce"]
storageClassName: standard
resources:
requests:
storage: 50Gi
---
apiVersion: v1
kind: Service
metadata:
name: elasticsearch-prometheus-svc
labels:
app.kubernetes.io/name: elasticsearch
app.kubernetes.io/component: elasticsearch-server
spec:
clusterIP: None
ports:
- name: prometheus-port
port: 9114
protocol: TCP
selector:
app.kubernetes.io/name: elasticsearch
app.kubernetes.io/component: elasticsearch-server
---
apiVersion: v1
kind: Service
metadata:
name: elasticsearch-svc-internal
labels:
app.kubernetes.io/name: "elasticsearch"
app.kubernetes.io/component: elasticsearch-server
spec:
ports:
- name: http
port: 9200
- name: tcp-transport
port: 9300
selector:
app.kubernetes.io/name: "elasticsearch"
app.kubernetes.io/component: elasticsearch-server
type: ClusterIP
---
apiVersion: v1
kind: Service
metadata:
name: ilb-service-elastic
annotations:
cloud.google.com/load-balancer-type: "Internal"
labels:
app: elasticsearch-svc
spec:
type: LoadBalancer
loadBalancerIP: some-ip-address
selector:
app.kubernetes.io/component: elasticsearch-server
app.kubernetes.io/name: elasticsearch
ports:
- port: 9200
protocol: TCP
This manifest was written from the template that used to be available on the GCP marketplace.
I'm encountering the following issue: the cluster is supposed to have 2 nodes, and indeed 2 pods are running. However
- a call to ip:9200/_nodes returns just one node
- there still seems to be a second node running that receives traffic (at least, read traffic), as visible in the logs. Those requests typically fail because the requested entities don't exist on that node (just on the master node).
I can't wrap my head around the fact that the node at the same time isn't visible to the master node, and receives read traffic from the load balanced pointing to the stateful set.
Am I missing something subtle ?