Current setup
Hello. I'm using the Docker Registry helm chart deployed with S3 storage. I would like to change the way the liveness/readiness probes work, because after one day of use I have depleted the AWS free tier monthly quota of LIST requests, which is 2,000 requests/month. Right now, the probes on the registry pod look like this:
Liveness: http-get http://:5000/ delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:5000/ delay=0s timeout=1s period=10s #success=1 #failure=3
These requests are obviously GET requests. However, as per this answer, these requests are tagged as LIST by AWS.
These are the custom values (chart_values.yaml) I used for the Docker Registry helm chart installation:
storage: s3

secrets:
  htpasswd: "..."
  s3:
    accessKey: "..."
    secretKey: "..."

s3:
  region: "..."
  secure: true
  bucket: "..."
Pushing and pulling images works as expected.
Question (see the latest edit for the rephrased question)
What should I do to stop the probes from querying S3? Shouldn't the liveness/readiness checks relate only to the pod itself, without touching S3?
I know I can edit the deployment config to change the periodSeconds of the probes to, say, 600s. But I don't think that is the optimal solution. I'm aware that liveness commands exist, but I'm not sure whether that is possible with the default registry Docker image.
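For reference, stretching the interval would be a one-off patch along these lines. This is just a sketch: the deployment name registry-docker-registry (inferred from the service env variables in the pod logs further down) and the container index 0 are assumptions about my setup.

kubectl -n docker-registry patch deployment registry-docker-registry --type=json -p='[
  {"op": "replace", "path": "/spec/template/spec/containers/0/livenessProbe/periodSeconds", "value": 600},
  {"op": "replace", "path": "/spec/template/spec/containers/0/readinessProbe/periodSeconds", "value": 600}
]'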
The last thing I thought of was that if the registry Docker image had Prometheus metrics enabled, I would be able to point the probes at the :5001/metrics path. But I'm not really sure how to do that.
EDIT:
To enable the Prometheus metrics, I removed my previous helm installation of the Docker registry, then downloaded the stable docker-registry helm chart via helm pull stable/docker-registry --untar.
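For the record, the sequence was roughly the following (assuming the previous release used the same name and namespace as the install command further down):

helm uninstall registry -n docker-registry
helm pull stable/docker-registry --untar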
Then I edited the templates/deployment.yaml file:
spec:
  containers:
    ports:
      - containerPort: 5000
      - containerPort: 5001   # Added this line
    livenessProbe:
      initialDelaySeconds: 1  # Added
      httpGet:
        path: /metrics        # Edited
        port: 5001            # Edited
    readinessProbe:
      initialDelaySeconds: 10 # Added
      httpGet:
        path: /metrics        # Edited
        port: 5001            # Edited
    env:
      # Added these env variables
      - name: REGISTRY_HTTP_DEBUG_ADDR
        value: "localhost:5001"
      - name: REGISTRY_HTTP_DEBUG_PROMETHEUS_ENABLED
        value: "true"
      - name: REGISTRY_HTTP_DEBUG_PROMETHEUS_PATH
        value: /metrics
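These environment variables follow the registry's standard convention of overriding config.yml entries with REGISTRY_-prefixed, underscore-separated paths, so for example:

REGISTRY_HTTP_DEBUG_ADDR                 ->  http.debug.addr
REGISTRY_HTTP_DEBUG_PROMETHEUS_ENABLED   ->  http.debug.prometheus.enabled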
And the templates/service.yaml file:
ports:
  - port: {{ .Values.service.port }}
    protocol: TCP
    name: {{ .Values.service.name }}
    targetPort: 5000
  # Added these lines below
  - port: 5001
    protocol: TCP
    name: {{ .Values.service.name }}-prometheus
    targetPort: 5001
Lint and install:
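The lint step the heading refers to (the values flag works with lint as well):

helm lint ./docker-registry-chart/ -f chart_values.yaml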
helm install registry ./docker-registry-chart/ -f chart_values.yaml -n docker-registry
However, the registry pod is never ready with this configuration (kubectl get shows 0/1 for the pod). This is because the readiness probe fails: the 5001 containerPort doesn't seem to get exposed, so the probe cannot reach the metrics server.
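For debugging, these standard commands show what the kubelet sees (the pod name placeholder is mine):

# Probe failures appear as events at the bottom of the output
kubectl -n docker-registry describe pod <registry_pod>
# A pod that never becomes Ready is excluded from its service's endpoints
kubectl -n docker-registry get endpoints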
I can confirm that the metrics server in the Docker container starts up properly. Here are the registry pod logs that show that the debug (metrics) server is up:
time="2020-04-10T14:36:26Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_DOCKER_REGISTRY_PORT"
time="2020-04-10T14:36:26Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_DOCKER_REGISTRY_PORT_5000_TCP"
time="2020-04-10T14:36:26Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_DOCKER_REGISTRY_PORT_5000_TCP_ADDR"
time="2020-04-10T14:36:26Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_DOCKER_REGISTRY_PORT_5000_TCP_PORT"
time="2020-04-10T14:36:26Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_DOCKER_REGISTRY_PORT_5000_TCP_PROTO"
time="2020-04-10T14:36:26Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_DOCKER_REGISTRY_PORT_5001_TCP"
time="2020-04-10T14:36:26Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_DOCKER_REGISTRY_PORT_5001_TCP_ADDR"
time="2020-04-10T14:36:26Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_DOCKER_REGISTRY_PORT_5001_TCP_PORT"
time="2020-04-10T14:36:26Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_DOCKER_REGISTRY_PORT_5001_TCP_PROTO"
time="2020-04-10T14:36:26Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_DOCKER_REGISTRY_SERVICE_HOST"
time="2020-04-10T14:36:26Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_DOCKER_REGISTRY_SERVICE_PORT"
time="2020-04-10T14:36:26Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_DOCKER_REGISTRY_SERVICE_PORT_REGISTRY"
time="2020-04-10T14:36:26Z" level=warning msg="Ignoring unrecognized environment variable REGISTRY_DOCKER_REGISTRY_SERVICE_PORT_REGISTRY_PROMETHEUS"
time="2020-04-10T14:36:26.172115809Z" level=info msg="debug server listening localhost:5001"
time="2020-04-10T14:36:26.188154917Z" level=info msg="redis not configured" go.version=go1.11.2 instance.id=fc945824-3600-4343-8a18-75a20b07f695 service=registry version=v2.7.1
time="2020-04-10T14:36:26.194453749Z" level=info msg="Starting upload purge in 29m0s" go.version=go1.11.2 instance.id=fc945824-3600-4343-8a18-75a20b07f695 service=registry version=v2.7.1
time="2020-04-10T14:36:26.211140816Z" level=info msg="using inmemory blob descriptor cache" go.version=go1.11.2 instance.id=fc945824-3600-4343-8a18-75a20b07f695 service=registry version=v2.7.1
time="2020-04-10T14:36:26.211497166Z" level=info msg="providing prometheus metrics on /metrics"
time="2020-04-10T14:36:26.211894294Z" level=info msg="listening on [::]:5000" go.version=go1.11.2 instance.id=fc945824-3600-4343-8a18-75a20b07f695 service=registry version=v2.7.1
I can even exec into the Docker container and curl localhost:5001/metrics, which results in a 200 with the appropriate Prometheus data.
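For completeness, the same check from outside a container shell (curl happens to exist in the image):

kubectl -n docker-registry exec -it <registry_pod> -- curl -s http://localhost:5001/metrics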
But I'm still not sure how to expose port 5001 on the container. I believe this would allow me to use the metrics with the probes, as @mdaniel mentions in his answer.
EDIT 2:
kubectl port-forward <registry_pod> 5001

Port-forwarding the registry pod works, and I can curl localhost:5001/metrics to get the Prometheus metrics data. The curl is executed from within the cluster.
I'm wondering if there is something wrong with my templates/service.yaml file?
EDIT 3: I have figured out what the problem was. The inaccessible service on port 5001 was caused by improperly setting REGISTRY_HTTP_DEBUG_ADDR to localhost:5001. The value should be :5001. With localhost:5001 the debug server binds only to the pod's loopback interface, which is why kubectl port-forward (which tunnels to loopback) could reach it while the kubelet's probes (which target the pod IP) could not; :5001 binds to all interfaces.
Finally, to translate this into how your templates/deployment.yaml should look:
spec:
  containers:
    ports:
      - containerPort: 5000
      - containerPort: 5001   # Added this line
    livenessProbe:
      initialDelaySeconds: 1  # Added
      httpGet:
        path: /metrics        # Edited
        port: 5001            # Edited
    readinessProbe:
      initialDelaySeconds: 10 # Added
      httpGet:
        path: /metrics        # Edited
        port: 5001            # Edited
    env:
      # Added these env variables
      - name: REGISTRY_HTTP_DEBUG_ADDR
        value: ":5001"        # Make sure the host part is left empty!
      - name: REGISTRY_HTTP_DEBUG_PROMETHEUS_ENABLED
        value: "true"
      - name: REGISTRY_HTTP_DEBUG_PROMETHEUS_PATH
        value: /metrics
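After redeploying, the endpoint can be verified against the pod IP from inside the cluster, which is what the kubelet's probes effectively do. A throwaway pod works for this sketch (curlimages/curl is an arbitrary image that ships curl; substitute the real pod IP from the first command):

kubectl -n docker-registry get pod -o wide
kubectl -n docker-registry run probe-test --rm -it --restart=Never \
  --image=curlimages/curl -- curl -s http://<pod_ip>:5001/metrics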
Potentially you could also supply the environment variables through the chart_values.yaml file with the configData section (configData.http.debug.addr etc.).
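I haven't tested that variant, but since the chart renders configData into the registry's config.yml, it should look roughly like this in chart_values.yaml:

configData:
  http:
    debug:
      addr: :5001
      prometheus:
        enabled: true
        path: /metrics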
Either way, I have decided to post the "answer" as an edit as opposed to a regular SO answer. The original question is still unanswered.
To rephrase the original question:
- Shouldn't the liveness/readiness checks relate only to the pod itself, without accessing S3? The S3 health check should be customizable with the storagedriver setting on the registry container. To me it seems like the registry is a separate entity, almost unrelated to S3. Essentially we want to health-check that entity, not the data storage, which has its own separate health check...
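For what it's worth, the registry config does expose exactly such a knob: the storage driver health check under health.storagedriver, which by default probes the backend every 10s (on S3 that means recurring LIST calls). I haven't verified how it interacts with the probes, but relaxing it should look roughly like this in the registry config, or via the matching REGISTRY_HEALTH_STORAGEDRIVER_* environment variables:

health:
  storagedriver:
    enabled: true
    interval: 12h  # default is 10s; a long interval keeps S3 LIST calls rare
    threshold: 3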