Current setup
Hello. I'm using the Docker Registry helm chart deployed with S3 storage. I would like to change the way the liveness/readiness probes work, because after one day of use I have depleted the AWS free tier monthly quota of LIST requests, which is 2,000 requests/month. Right now, the probes on the registry pod look like this:
Liveness: http-get http://:5000/ delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:5000/ delay=0s timeout=1s period=10s #success=1 #failure=3
These requests are obviously GET requests. However, as per this answer, these requests are tagged as LIST by AWS.
These are the custom values (chart_values.yaml) I used for the Docker Registry helm chart installation:
storage: s3

secrets:
  htpasswd: "..."
  s3:
    accessKey: "..."
    secretKey: "..."

s3:
  region: "..."
  secure: true
  bucket: "..."
Pushing and pulling images works as expected.
Question (see the latest edit for the rephrased question)
What should I do to stop the probes from querying S3? Shouldn't the liveness/readiness checks relate only to the pod itself, without touching S3?
I know I can edit the deployment config to change the periodSeconds of the probes to, say, 600s. But I don't think that is the optimal solution. I'm aware that liveness commands exist, but I'm not sure whether that is possible with the default registry Docker image.
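For reference, stretching the interval would be a one-off patch along these lines. This is just a sketch: the deployment name registry-docker-registry (inferred from the service env variables in the pod logs further down) and the container index 0 are assumptions about my setup.

kubectl -n docker-registry patch deployment registry-docker-registry --type=json -p='[
  {"op": "replace", "path": "/spec/template/spec/containers/0/livenessProbe/periodSeconds", "value": 600},
  {"op": "replace", "path": "/spec/template/spec/containers/0/readinessProbe/periodSeconds", "value": 600}
]'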
The last thing I thought of was that if the registry Docker image had Prometheus metrics enabled, I would be able to point the probes at the :5001/metrics path. But I'm not really sure how to do that.
EDIT:
To enable the Prometheus metrics, I removed my previous helm installation of the Docker registry, then downloaded the stable docker-registry helm chart via helm pull stable/docker-registry --untar.
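For the record, the sequence was roughly the following (assuming the previous release used the same name and namespace as the install command further down):

helm uninstall registry -n docker-registry
helm pull stable/docker-registry --untar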
Then I edited the templates/deployment.yaml file:
spec:
  containers:
    ports:
      - containerPort: 5000
      - containerPort: 5001   # Added this line
    livenessProbe:
      initialDelaySeconds: 1  # Added
      httpGet:
        path: /metrics        # Edited
        port: 5001            # Edited
    readinessProbe:
      initialDelaySeconds: 10 # Added
      httpGet:
        path: /metrics        # Edited
        port: 5001            # Edited
    env:
      # Added these env variables
      - name: REGISTRY_HTTP_DEBUG_ADDR
        value: "localhost:5001"
      - name: REGISTRY_HTTP_DEBUG_PROMETHEUS_ENABLED
        value: "true"
      - name: REGISTRY_HTTP_DEBUG_PROMETHEUS_PATH
        value: /metrics
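These environment variables follow the registry's standard convention of overriding config.yml entries with REGISTRY_-prefixed, underscore-separated paths, so for example:

REGISTRY_HTTP_DEBUG_ADDR                 ->  http.debug.addr
REGISTRY_HTTP_DEBUG_PROMETHEUS_ENABLED   ->  http.debug.prometheus.enabled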
And the templates/service.yaml file:
ports:
  - port: {{ .Values.service.port }}
    protocol: TCP
    name: {{ .Values.service.name }}
    targetPort: 5000
  # Added these lines below
  - port: 5001
    protocol: TCP
    name: {{ .Values.service.name }}-prometheus
    targetPort: 5001
Lint and install:
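The lint step the heading refers to (the values flag works with lint as well):

helm lint ./docker-registry-chart/ -f chart_values.yaml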
helm install registry ./docker-registry-chart/ -f chart_values.yaml -n docker-registry
However, the registry pod is never ready with this configuration (kubectl get shows 0/1 for the pod). This is because the readiness probe fails: the 5001 containerPort doesn't seem to get exposed, so the probe cannot reach the metrics server.
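For debugging, these standard commands show what the kubelet sees (the pod name placeholder is mine):

# Probe failures appear as events at the bottom of the output
kubectl -n docker-registry describe pod <registry_pod>
# A pod that never becomes Ready is excluded from its service's endpoints
kubectl -n docker-registry get endpoints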
I can confirm that the metrics server in the Docker container starts up properly. Here are the registry pod logs that show that the debug (metrics) server is up:
time="2020-04-10T14:36:26Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_DOCKER_REGISTRY_PORT"
time="2020-04-10T14:36:26Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_DOCKER_REGISTRY_PORT_5000_TCP"
time="2020-04-10T14:36:26Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_DOCKER_REGISTRY_PORT_5000_TCP_ADDR"
time="2020-04-10T14:36:26Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_DOCKER_REGISTRY_PORT_5000_TCP_PORT"
time="2020-04-10T14:36:26Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_DOCKER_REGISTRY_PORT_5000_TCP_PROTO"
time="2020-04-10T14:36:26Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_DOCKER_REGISTRY_PORT_5001_TCP"
time="2020-04-10T14:36:26Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_DOCKER_REGISTRY_PORT_5001_TCP_ADDR"
time="2020-04-10T14:36:26Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_DOCKER_REGISTRY_PORT_5001_TCP_PORT"
time="2020-04-10T14:36:26Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_DOCKER_REGISTRY_PORT_5001_TCP_PROTO"
time="2020-04-10T14:36:26Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_DOCKER_REGISTRY_SERVICE_HOST"
time="2020-04-10T14:36:26Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_DOCKER_REGISTRY_SERVICE_PORT"
time="2020-04-10T14:36:26Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_DOCKER_REGISTRY_SERVICE_PORT_REGISTRY"
time="2020-04-10T14:36:26Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_DOCKER_REGISTRY_SERVICE_PORT_REGISTRY_PROMETHEUS"
time="2020-04-10T14:36:26.172115809Z" level=info msg="debug server listening localhost:5001"
time="2020-04-10T14:36:26.188154917Z" level=info msg="redis not configured" go.version=go1.11.2 instance.id=fc945824-3600-4343-8a18-75a20b07f695 service=registry version=v2.7.1
time="2020-04-10T14:36:26.194453749Z" level=info msg="Starting upload purge in 29m0s" go.version=go1.11.2 instance.id=fc945824-3600-4343-8a18-75a20b07f695 service=registry version=v2.7.1
time="2020-04-10T14:36:26.211140816Z" level=info msg="using inmemory blob descriptor cache" go.version=go1.11.2 instance.id=fc945824-3600-4343-8a18-75a20b07f695 service=registry version=v2.7.1
time="2020-04-10T14:36:26.211497166Z" level=info msg="providing prometheus metrics on /metrics"
time="2020-04-10T14:36:26.211894294Z" level=info msg="listening on [::]:5000" go.version=go1.11.2 instance.id=fc945824-3600-4343-8a18-75a20b07f695 service=registry version=v2.7.1
I can even exec into the Docker container and curl localhost:5001/metrics, which results in a 200 with the appropriate Prometheus data.
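For completeness, the same check from outside a container shell (curl happens to exist in the image):

kubectl -n docker-registry exec -it <registry_pod> -- curl -s http://localhost:5001/metrics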
But I'm still not sure how to expose port 5001 on the container. I believe this would allow me to use the metrics with the probes, as @mdaniel mentions in his answer.
EDIT 2:
kubectl port-forward <registry_pod> 5001

Port-forwarding the registry pod works, and I can curl localhost:5001/metrics to get the Prometheus metrics data. The curl is executed from within the cluster.
I'm wondering if there is something wrong with my templates/service.yaml file?
EDIT 3: I have figured out what the problem was. The inaccessible service on port 5001 was caused by improperly setting REGISTRY_HTTP_DEBUG_ADDR to localhost:5001. The value should be :5001. With localhost:5001 the debug server binds only to the pod's loopback interface, which is why kubectl port-forward (which tunnels to loopback) could reach it while the kubelet's probes (which target the pod IP) could not; :5001 binds to all interfaces.
Finally, to translate this into how your templates/deployment.yaml should look:
spec:
  containers:
    ports:
      - containerPort: 5000
      - containerPort: 5001   # Added this line
    livenessProbe:
      initialDelaySeconds: 1  # Added
      httpGet:
        path: /metrics        # Edited
        port: 5001            # Edited
    readinessProbe:
      initialDelaySeconds: 10 # Added
      httpGet:
        path: /metrics        # Edited
        port: 5001            # Edited
    env:
      # Added these env variables
      - name: REGISTRY_HTTP_DEBUG_ADDR
        value: ":5001"        # Make sure the host part is left empty!
      - name: REGISTRY_HTTP_DEBUG_PROMETHEUS_ENABLED
        value: "true"
      - name: REGISTRY_HTTP_DEBUG_PROMETHEUS_PATH
        value: /metrics
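After redeploying, the endpoint can be verified against the pod IP from inside the cluster, which is what the kubelet's probes effectively do. A throwaway pod works for this sketch (curlimages/curl is an arbitrary image that ships curl; substitute the real pod IP from the first command):

kubectl -n docker-registry get pod -o wide
kubectl -n docker-registry run probe-test --rm -it --restart=Never \
  --image=curlimages/curl -- curl -s http://<pod_ip>:5001/metrics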
Potentially you could also supply the environment variables through the chart_values.yaml file with the configData section (configData.http.debug.addr etc.).
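I haven't tested that variant, but since the chart renders configData into the registry's config.yml, it should look roughly like this in chart_values.yaml:

configData:
  http:
    debug:
      addr: :5001
      prometheus:
        enabled: true
        path: /metrics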
Either way, I have decided to post the "answer" as an edit as opposed to a regular SO answer. The original question is still unanswered.
To rephrase the original question:
- Shouldn't the liveness/readiness checks relate only to the pod itself, without accessing S3? The S3 health check should be customizable with the storagedriver setting on the registry container. To me it seems like the registry is a separate entity, almost unrelated to S3. Essentially we want to health-check that entity, not the data storage, which has its own separate health check...
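For what it's worth, the registry config does expose exactly such a knob: the storage driver health check under health.storagedriver, which by default probes the backend every 10s (on S3 that means recurring LIST calls). I haven't verified how it interacts with the probes, but relaxing it should look roughly like this in the registry config, or via the matching REGISTRY_HEALTH_STORAGEDRIVER_* environment variables:

health:
  storagedriver:
    enabled: true
    interval: 12h  # default is 10s; a long interval keeps S3 LIST calls rare
    threshold: 3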